To evaluate the true impact of marketing campaigns, marketers rely on attribution measurement to understand customers’ journeys and decisions. Accurate attribution of campaigns’ influence on conversions allows marketers to optimize marketing spend and strategy to maximize return on investment (ROI); without it, they risk suboptimal investments and lost revenue. In this post, we discuss how we combined bottom-up and top-down modeling in a hybrid approach to build accurate data-driven attribution that improves campaign performance and ROI.

Context

There are two general approaches to measuring attribution: rule-based attribution and data-driven attribution.

Traditional marketing attribution uses rule-based attribution (RBA) approaches (e.g., first-touch, last-touch, time decay), which assign conversion credit based on simple, predetermined rules. RBA methods are easy to understand and implement and can incorporate domain-specific logic. However, RBA rests on limited assumptions and is more relevant for B2C customers. RBA methods also overlook the stages of the customer journey, often overvaluing bottom-of-funnel touchpoints and undervaluing early top-of-funnel touchpoints, creating a biased channel-level mix and limited visibility for B2B marketers.

A more comprehensive suite of approaches falls under data-driven attribution (DDA) modeling. DDA models use machine learning and statistical techniques to allocate conversion credit. Methodologies such as Multi-Touch Attribution (MTA) and Marketing Mix Modeling (MMM) consider a broader range of factors and offer a more balanced view of the customer journey, from initial awareness to final conversion. MMM takes a top-down approach, modeling channel-level touchpoints while accounting for seasonal and macroeconomic factors. MTA takes a bottom-up approach, modeling member-level touchpoints and capturing the touchpoint journey.

At LinkedIn, we leverage the complementary value of both MMM and MTA approaches and have developed a unified system bridging the two methodologies in our attribution stack. We have successfully deployed the system for our internal marketing (i.e., marketing for LinkedIn’s products), and will leverage this methodology for advertisers on the LinkedIn Marketing Solutions platform.

How does this model work?

Attention-based modeling

Figure 1: High-level diagram showing the end-to-end components of our attribution modeling framework

Overview

We considered different algorithmic approaches when designing our modeled attribution platform. Our platform needed to account for both member-level features and specific touchpoint sequences. Transformer-based attention models are well suited for this task since they retain rich information about the buyer journey and also enable us to incorporate additional features about the member, their company, and campaign information. 

Overall, the model is structured as a binary classification task to predict whether a buyer journey will result in a conversion. We built a model P(C | EM, EC, S) where:

  • C indicates the binary conversion outcome
  • EM is a member representation vector of length DM sourced from an external model
  • EC is a (member’s) company representation vector of length DC sourced from an external model
  • S is a sequence of marketing touchpoints of length T

From this trained model, we output the aggregated attention weight matrix over the touchpoint sequence S. The attention weights are normalized to sum to 1.0, and each normalized weight is interpreted as the percent contribution of that touchpoint to the final conversion event. The weights are then calibrated using controlled marketing experiments and marketing mix models to provide our marketers with the final attribution values.
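As a minimal sketch (not our production code), the normalization step can be illustrated as follows; the optional calibration multiplier is a hypothetical stand-in for the experiment- and MMM-based adjustment described above:

```python
import numpy as np

def attribution_from_attention(raw_weights, calibration=None):
    """Normalize raw attention weights so they sum to 1.0; each value is
    then read as that touchpoint's percent contribution to the conversion.
    `calibration` is an optional per-touchpoint multiplier (hypothetical)."""
    w = np.asarray(raw_weights, dtype=float)
    if calibration is not None:
        w = w * np.asarray(calibration, dtype=float)
    return w / w.sum()

# e.g., aggregated attention over three touchpoints in a converting path
weights = attribution_from_attention([0.2, 0.5, 0.3])
```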

Path construction

The touchpoint sequence S is a time series of touchpoints pt ∈ {p0, …, pK+1}, indexed by time t, where K is the number of distinct possible touches/channels and 0 is reserved for padding. The sequence S is limited to the last N touches, so paths with a count of T > N are truncated. Each touchpoint is represented by a vector ET. Touchpoints are context-specific and could represent marketing activities or other steps in a defined journey.
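A toy sketch of the truncation and padding logic; the function name and the left-padding choice are our own simplifications, not the production ETL:

```python
def build_path(touches, max_len):
    """Keep the most recent `max_len` touches; left-pad shorter paths
    with 0, the reserved padding id."""
    recent = touches[-max_len:]
    return [0] * (max_len - len(recent)) + list(recent)
```

For example, `build_path([3, 1, 4, 1, 5], 3)` keeps only the last three touches, while `build_path([7, 2], 4)` pads the front of the path with zeros.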

Each day, our Spark data ETL flows process member events into marketing paths and augment the necessary features for modeling. We store these paths where they serve a dual purpose as model inputs and for use by marketing teams in post-modeling reporting. For model inference, we only define paths with respect to conversion events. The conversion event is the unique path identifier and anchor for all preceding touchpoints. During model training, we also generate a sample of non-converting paths with respect to the current date to provide negative labels to our model. We utilize immutable conversion and touchpoint identifiers that allow downstream partner teams to easily connect attribution results to other data for further analysis and reporting.
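The conversion-anchored path construction might look like the following simplified sketch; the field layout and the 90-day lookback window are hypothetical:

```python
from datetime import date, timedelta

def paths_from_conversions(touches, conversions, lookback_days=90):
    """Anchor a path on each conversion event and collect the member's
    touchpoints from the preceding lookback window, sorted by time.
    Field names and the lookback window are hypothetical."""
    paths = {}
    for conv_id, member, conv_date in conversions:
        window_start = conv_date - timedelta(days=lookback_days)
        paths[conv_id] = sorted(
            (ts, tp) for m, tp, ts in touches
            if m == member and window_start <= ts <= conv_date
        )
    return paths

touches = [("m1", "email", date(2024, 1, 5)),
           ("m1", "search", date(2024, 1, 20)),
           ("m1", "display", date(2023, 1, 1))]  # falls outside the window
paths = paths_from_conversions(touches, [("c1", "m1", date(2024, 2, 1))])
```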

Positional representations

Attention models do not inherently model the order of touches in a path, so additional encodings are needed to capture the relative position of each touch. Sinusoidal positional encodings are commonly used for temporal ordering but, on their own, are not sufficient for our use case: marketing engagements occur in irregular patterns, not at fixed intervals. For example, the time between an impression and a click may be a few seconds, while the time between two impressions could be days. We capture these irregular differences by incorporating a positional representation ED for the discrete daily intervals between touchpoints. These positional representations are learnable parameters of the model and allow it to capture the seasonal effects frequently observed in our data.

Further, since multiple activities can still occur on the same day, we also add sinusoidal positional encodings to preserve intra-day ordering. The combination of positional representations and sinusoidal positional encodings allows the model to differentiate both relative and absolute positions. We adopt a small modification to the original positional encodings, using Time Absolute Position Encodings (tAPE) as proposed by Foumani et al., which scales the encodings to improve performance in lower-dimensional representations.
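For illustration, here is a sinusoidal encoding with a tAPE-style scaling factor; the exact scaling (multiplying the angle by d_model / seq_len) is our reading of Foumani et al. and should be treated as illustrative rather than a faithful reimplementation:

```python
import numpy as np

def tape_encoding(seq_len, d_model):
    """Sinusoidal positional encoding with a tAPE-style scaling of the
    angle by d_model / seq_len, which the tAPE authors report improves
    performance at lower embedding dimensions. Assumes d_model is even."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angle = pos / (10000 ** (i / d_model)) * (d_model / seq_len)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)  # even dimensions
    pe[:, 1::2] = np.cos(angle)  # odd dimensions
    return pe

pe = tape_encoding(6, 8)
```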

Touchpoint representations

We represent each touchpoint with a feature vector ET via a learned representation lookup for each discrete (touchpointType, action) event. An additional feature vector, EMCID, captures aspects of the marketing campaign such as its metadata and content. We obtain these by extracting a text description of the campaign and generating an embedded representation using an internal LLM fine-tuned on a LinkedIn corpus. This approach allows flexibility in combining structured and unstructured data consistently across domains. We bring these concepts together as concat(EMCID, ET) + ED + tAPE, where EMCID is the campaign representation.
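The combination step can be sketched as follows, with hypothetical dimensions and random stand-ins for the learned and LLM-derived vectors:

```python
import numpy as np

# Hypothetical dimensions: campaign embedding (d_c) and touch-type embedding (d_t)
d_c, d_t = 8, 4
d = d_c + d_t  # combined touchpoint dimension

e_mcid = np.random.rand(d_c)  # campaign text embedding from the LLM (stand-in)
e_t = np.random.rand(d_t)     # learned (touchpointType, action) embedding
e_d = np.random.rand(d)       # learned daily-interval positional representation
tape = np.random.rand(d)      # sinusoidal/tAPE encoding for intra-day ordering

# concat(EMCID, ET) + ED + tAPE
x = np.concatenate([e_mcid, e_t]) + e_d + tape
```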

Entity representations

Within the model, we capture features related to the member and their company. A similar encoding process is used for members & companies where features such as platform actions, titles, skills, and company relationships are used to generate derived representations. We use these representations in the model to help control for differences in baseline interactions. For example, we may have (member, product) pairs with higher affinities, increasing the likelihood of conversion outcomes.

Architecture

These elements are implemented in a single neural network architecture as shown in Figure 1 above. The positional representations are combined with the sequential touchpoint data generated by members. These sequences are fed through a self-attention module. We concatenate member and company representations and feed these through a dense layer to create a representation of the acting member. The member’s representation and the output of the attention layers are combined and fed through a classification head for the learning task. We train the model using binary cross-entropy loss on the conversion label of the path. During model inference, we take the combined weight matrix from the attention layers and normalize the scores to sum to 1.0, forming the final attribution weights.
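A compact NumPy sketch of this forward pass; the single attention head, random stand-in weights, and mean-pooling are our own simplifications, not necessarily the production design:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def forward(seq, member_vec, Wq, Wk, Wv, Wm, w_out):
    """Single-head self-attention over the touchpoint sequence, a dense
    layer over the concatenated member/company vector, and a sigmoid
    classification head. Returns (conversion probability, per-touchpoint
    attribution weights)."""
    q, k, v = seq @ Wq, seq @ Wk, seq @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # (T, T) attention matrix
    ctx = (attn @ v).mean(axis=0)                   # pooled sequence representation
    member = np.tanh(member_vec @ Wm)               # member+company dense layer
    logit = np.concatenate([ctx, member]) @ w_out
    p_conv = 1 / (1 + np.exp(-logit))
    # per-touchpoint attribution: column-averaged attention, renormalized to 1.0
    col_avg = attn.mean(axis=0)
    return p_conv, col_avg / col_avg.sum()

rng = np.random.default_rng(0)
T, d, dm = 4, 8, 6  # sequence length, touch dim, member dim (hypothetical)
p, attr = forward(rng.normal(size=(T, d)), rng.normal(size=dm),
                  rng.normal(size=(d, d)), rng.normal(size=(d, d)),
                  rng.normal(size=(d, d)), rng.normal(size=(dm, d)),
                  rng.normal(size=2 * d))
```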

Paid media impression imputation

LinkedIn’s outbound marketing channels fall into two groups by data granularity: owned channels and paid channels. Owned channels include outbound email and LinkedIn’s own advertising platform, which we use to market our products & services to members. Within our owned channels, we can collect user-level events such as impressions, clicks, sends, or opens. For paid media channels, due to privacy restrictions, we only collect proxy click events from page-land events on LinkedIn microsites. Not only do we fail to link some of these click events to member identities, but we are critically missing all user-level impression data for paid media channels. Furthermore, these page-land events are biased toward click-oriented channels such as Paid Search. This lack of member-level impression data for paid media channels can bias attribution modeling.

While member-level data is not available for paid media channels, we can still obtain daily aggregate campaign reports. We leverage this aggregated data to mitigate bias by imputing paid media impressions as shown in Figure 2 below. We probabilistically distribute paid media impressions among members on a daily basis at the campaign level. This is performed in two steps. First, we attach a certain number of impressions to existing click events, because clicks necessarily have prior impressions. Second, we look at how impressions are distributed across paths without any reference to clicks for each paid media channel. In both cases, we use owned channels as proxies to get prior information on these distributions. 
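A simplified sketch of this imputation; the one-impression-per-click floor and the prior weights below are hypothetical stand-ins for the owned-channel priors:

```python
import numpy as np

def impute_impressions(total_impressions, members, clicked, prior, seed=0):
    """Distribute a campaign-day aggregate impression count across members.
    Step 1: every member with a click gets at least one impression,
    since a click implies a prior impression. Step 2: the remaining
    impressions are assigned probabilistically using `prior` weights
    (here hypothetical; in practice derived from owned-channel data)."""
    rng = np.random.default_rng(seed)
    counts = np.array([1 if c else 0 for c in clicked])
    remaining = total_impressions - counts.sum()
    p = np.asarray(prior, dtype=float)
    counts += rng.multinomial(remaining, p / p.sum())
    return dict(zip(members, counts))

imputed = impute_impressions(100, ["a", "b", "c"], [True, False, False],
                             prior=[0.5, 0.3, 0.2])
```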

Downsampling

Our owned channels have significantly more touchpoints compared to paid media in member paths due to signal bias. The paths that go into attribution modeling are of fixed length, therefore LinkedIn touchpoints may saturate the paths and reduce the representation of paid media touchpoints in converting paths. To make the representation of channels more balanced across paths, we group impressions from certain channels in a path as a single touchpoint within a session, as illustrated in the figure below. We retain the downsampling links for post-model processing to reallocate credit across the grouped events.

Figure 2. We impute missing paid media impressions into paths probabilistically. We also group impressions from owned channels.
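The session grouping can be sketched as follows; the 30-minute session gap and the merge-only-with-the-previous-session rule are our own simplifications:

```python
def group_impressions(events, gap_seconds=1800):
    """Collapse consecutive same-channel impressions into one session-level
    touchpoint, keeping links back to the grouped events so credit can be
    reallocated after modeling. Events must be sorted by timestamp; the
    30-minute gap is a hypothetical session boundary."""
    sessions = []
    for channel, etype, ts in events:
        last = sessions[-1] if sessions else None
        if (etype == "impression" and last and last["channel"] == channel
                and last["type"] == "impression"
                and ts - last["end"] <= gap_seconds):
            last["end"] = ts
            last["events"].append(ts)  # downsampling link for post-processing
        else:
            sessions.append({"channel": channel, "type": etype,
                             "end": ts, "events": [ts]})
    return sessions

evts = [("display", "impression", 0), ("display", "impression", 600),
        ("email", "click", 900), ("display", "impression", 5000)]
sessions = group_impressions(evts)
```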

Post-modeling calibration

In contrast to the modeled multi-touch attribution (MTA) described above, which uses member-level data from performance-marketing (online) channels (display, search, video, etc.), Marketing Mix Modeling (MMM) uses data aggregated weekly or daily from macroeconomic factors and offline channels (TV, direct mail, etc.), in addition to performance-marketing channels. Because performance-marketing channels are modeled separately in MMM and MTA, the results can be inconsistent: no constraint forces these top-down (MMM) and bottom-up (MTA) results to match.

To avoid inconsistent results, we incorporate an additional post-modeling calibration step in which attribution estimates are aligned with MMM models. The calibration adjusts for external information, such as macroeconomic factors, that affects incremental marketing effects. This retains the MTA-level allocation across campaigns while scaling the final output to the total marketing contribution. In particular, we scale MTA outputs at the channel level such that the total number of conversions attributed by MTA equals that attributed by MMM for each channel in each quarter. To do this, we multiply the attribution of each touchpoint in a channel by the ratio of conversions attributed by MMM to those attributed by MTA in that channel. This does not change the relative relationship between the attributions that MTA predicts for touchpoints from campaigns within that channel group.
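The channel-level scaling amounts to a per-channel ratio applied to every touchpoint's credit; the numbers below are made up for illustration:

```python
def calibrate_to_mmm(touchpoint_attr, mta_totals, mmm_totals):
    """Scale each touchpoint's MTA attribution by the per-channel ratio
    MMM / MTA so channel totals match MMM, while preserving the relative
    allocation across campaigns within each channel."""
    return [
        (tp, channel, credit * mmm_totals[channel] / mta_totals[channel])
        for tp, channel, credit in touchpoint_attr
    ]

# (touchpoint id, channel, MTA-attributed conversions) -- illustrative values
attr = [("t1", "search", 0.6), ("t2", "search", 0.2), ("t3", "display", 0.4)]
out = calibrate_to_mmm(attr,
                       mta_totals={"search": 0.8, "display": 0.4},
                       mmm_totals={"search": 0.4, "display": 0.8})
```

After calibration, the search touchpoints are halved (0.3 and 0.1, summing to the MMM total of 0.4) while their 3:1 ratio is preserved.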

Offline and online evaluation

We define channel lift as the modeled % change in conversion probability when all touchpoints from a channel (or a group of campaigns) are removed from paths. Channel lift is the metric most comparable to A/B holdout test results, which empirically measure the incrementality of a given marketing channel.

We model channel lift by creating a new counterfactual path, in which we remove the touchpoints from the corresponding channel. Figure 3 below shows a schematic for lift calculations for the Search channel in a path with four touchpoints: [Email, LinkedIn, Search, Search]. We compute the conversion probability of the following two paths:

  1. Original path: [Email, LinkedIn, Search, Search]. This models a conversion path of a member in the treatment group in the incrementality test.
  2. Counterfactual path: [Email, LinkedIn]. This models a conversion path of a member in the control (or “holdout”) group in the incrementality test.
Figure 3: Path lift computation

We calculate lift as the ratio of the conversion probability of the original path to that of the counterfactual path: P(original path) / P(counterfactual path) - 1. If the original path had an 80% conversion probability and the counterfactual path had a 50% conversion probability, the estimated lift for the Search channel would be 0.80/0.50 - 1 = 0.6 (60%). We use a floor of 0.0 in our lift calculations. Similarly, we model the lift of a group of campaigns by creating modified paths that omit all touchpoints from those campaigns and performing the same calculation.
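The lift calculation itself is straightforward; a sketch using the worked example above:

```python
def channel_lift(p_original, p_counterfactual):
    """Lift = P(original path) / P(counterfactual path) - 1, floored at 0.0."""
    return max(p_original / p_counterfactual - 1.0, 0.0)
```

With the example probabilities, `channel_lift(0.80, 0.50)` yields a 60% lift, and a counterfactual that converts more often than the original is floored to 0.0 rather than reported as negative.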

Leaving one or more touchpoints out of a path, as our current approach does, is not an optimal way to measure contributions and lift. The complicating issue is that each event in a sequence is not independent of prior events. As a simplified example, the click of an advertisement cannot occur before an individual sees it, so a real-world sequence containing only click events could never be generated. Additional biases arise if we consider how a user may interact with a website: we generally do not expect a user to jump around a site at random, and we expect some actions to follow others. We can generalize this to a Markov transition matrix in which, for any pair of touchpoints, some transition probabilities are exactly 0, others near 0, and the rest some value > 0.
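A toy illustration of such a transition matrix, with a plausibility check that rejects candidate counterfactual paths using a zero-probability transition (all probabilities below are made up):

```python
import numpy as np

# Toy transition matrix over touch types [impression, click, convert].
# Some entries are structurally zero (e.g., nothing follows a conversion
# in this toy example), so counterfactual paths should never use them.
P = np.array([
    [0.6, 0.3, 0.1],  # impression -> {impression, click, convert}
    [0.5, 0.2, 0.3],  # click -> ...
    [0.0, 0.0, 1.0],  # convert is absorbing here
])

IDX = {"impression": 0, "click": 1, "convert": 2}

def is_plausible(path, P):
    """True if every consecutive transition in the path has
    nonzero probability under P."""
    return all(P[IDX[a], IDX[b]] > 0 for a, b in zip(path, path[1:]))
```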

In future work, we plan to investigate additional methodologies for creating synthetic counterfactuals by using data across paths for similar members. For example, suppose two similar members have similar paths up to a point in time. In that case, we can truncate one member’s path up to the touchpoint being measured and then substitute their outcome with the other observed counterfactual. 

Application

Our marketing AI team developed this attribution platform in partnership with our internal LinkedIn marketing teams. LinkedIn marketing had historically relied on last-click rule-based attribution (RBA), where full credit for a conversion was given to the last click event. This over-indexed credit towards low-funnel channels that convert demand, such as Search or Email. The ideal state for a business to grow is with a full-funnel investment portfolio; however, last-click attribution understates the value of upper- and mid-funnel channels, depriving marketers of the ability to see their performance or optimize it.

Our improved multi-touch, data-driven attribution methodology, along with the inclusion of probabilistic touchpoints for Paid Media, will allow marketers to get visibility into our full-funnel performance and optimize our budget allocation to maximize our ROI. While the system is still under quality testing, the business is already seeing promising results that point to the value Modeled Attribution will bring versus last-click.

As an example, the business observed the performance of both models for upper- and mid-funnel campaigns, which are found in Video Ads, Digital Display, and Social Media. When comparing the two models for non-search channels, Modeled Attribution was able to recognize and deliver credit where Last Click remained flat. This is due to the model's ability to stitch impressions from these campaigns into the user journey, which RBA models cannot do. Initial results show a 150x increase in credit under Modeled Attribution, which paces well with Marketing’s spending increase during this time frame (Figure 4).

Figure 4: Performance of upper and mid funnel campaigns over the measurement period demonstrating how modeled attribution better captures the effects of increased spend

Overall, the business is estimated to deliver a 5% lift in marketing-driven revenue due to in-quarter optimizations enabled by the use of Modeled Attribution for weekly performance reporting in FY25.

Discussions with other teams at LinkedIn have shown that similar attribution cases exist across product and business domains. Broadly speaking, given an observed outcome, how do we fractionally attribute impact for that outcome across multiple touchpoints? To support other use cases, we have made our attribution library internally available for other teams to use. We look forward to the expanded collaboration as it enables further cross-domain improvements and learnings.

Conclusion

At LinkedIn, the transition from rule-based to data-driven marketing attribution has provided us with valuable new insights. The hybrid approach of leveraging both bottom-up and top-down modeling – incorporating signals from MMM, holdout experiments, and multi-touch, customer-centric attribution modeling – allows us to adapt to data availability limitations while improving the overall quality of our results. While this approach is being adopted by our internal LinkedIn marketing team, we are also enhancing the LMS measurement platform to deliver comprehensive funnel insights for LinkedIn advertisers by leveraging this data-driven attribution algorithm. Our team has learned a lot as we’ve built and tested this platform with our partners. Future directions of research include: how can we improve the causal robustness of the model; how can we improve the simulated effect of a campaign to align more closely with experimental testing; and how do we more closely bring together MTA, MMM, and experimental signals under one model.

Acknowledgments

Thank you to Anu Bedi, Zaid Nasir, Paul Lui, Chinmay Kothari, Thi Phuong Lan Nguyen, Laura Wen, Iris Kim, Marco Delgado Mora, Jenee Shah, Deepak Venkateshappa, Licurgo de Almeida, Flora Chen, Shruti Sharma, Lisa Qian, Saylee Raskar, Yajun Wang, Mark Dietz, Sean Peng, Sayali Sonawane, Saket Kumar, David Shan, Bangchuan Liu, Rashmi Jain, Manoj Thakur, Aarthi Jayaram, Suresh Rayasam Venkatasubbaiah