Solved – Multi Channel Attribution Modelling: Using A Simple Probabilistic Model

machine learning, marketing, multivariate analysis, predictive-models, probability

Please see Chapter 3.2 from this article:
http://www.turn.com.akadns.net/sites/default/files/whitepapers/TURN_Tech_WP_Data-driven_Multi-touch_Attribution_Models.pdf

It describes a method for calculating the influence of online marketing channels/campaigns (for example: "Display" as a channel for display ads, "SEA" as a channel for search engine ads, etc.) on a binary criterion (for example: 1 = positive user = user buys a product in the online store vs. 0 = negative user = user does not buy a product in the online store). The paper presents the method as simpler than running a logistic regression.

With the first equation, you calculate for every channel the share of positive users (= users who had contact with this channel AND bought the product) among all users who had contact with this channel:

$$P(y \mid x_i) = \frac{N_{\text{positive}}(x_i)}{N_{\text{positive}}(x_i) + N_{\text{negative}}(x_i)}$$

Example: 1000 users had contact with the channel "Display" and 400 of them bought the product. Hence, the probability for this channel is 0.4 (= 400/1000 = 400/(400 + 600)).
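To make the arithmetic concrete, here is a minimal Python sketch of the first-order calculation. The function name `conversion_probability` and its arguments are my own illustrative choices and do not come from the paper.

```python
def conversion_probability(n_positive, n_negative):
    """Empirical share of buyers among all users who had contact with a channel."""
    total = n_positive + n_negative
    if total == 0:
        raise ValueError("no users had contact with this channel")
    return n_positive / total


# The "Display" example: 400 buyers and 600 non-buyers out of 1000 users.
p_display = conversion_probability(400, 600)
print(p_display)
```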

To account for overlap between channels, you use the second equation, which includes the second-order interaction term:

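$$P(y \mid x_i, x_j) = \frac{N_{\text{positive}}(x_i, x_j)}{N_{\text{positive}}(x_i, x_j) + N_{\text{negative}}(x_i, x_j)}, \quad i \neq j$$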
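As I understand it, the pairwise probability is the same counting exercise, restricted to users who had contact with both channels. A small sketch, assuming each user is stored as a hypothetical `(set_of_channels, converted)` tuple; that data layout and the toy data are my assumptions, not the paper's:

```python
from collections import Counter
from itertools import combinations


def pairwise_probabilities(users):
    """Share of buyers among users who had contact with both channels of a pair."""
    positives, totals = Counter(), Counter()
    for channels, converted in users:
        # Sort so that each pair always has the same key, e.g. ("Display", "SEA").
        for pair in combinations(sorted(channels), 2):
            totals[pair] += 1
            if converted:
                positives[pair] += 1
    return {pair: positives[pair] / totals[pair] for pair in totals}


# Toy data, invented for illustration only.
toy_users = [
    ({"Display", "SEA"}, True),
    ({"Display", "SEA"}, False),
    ({"Display"}, True),
    ({"SEA", "Email"}, False),
]
pair_probs = pairwise_probabilities(toy_users)
print(pair_probs)
```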

What I do not understand, and where I need your help, is the following. To calculate the contribution of each channel to the buy probability (= C(xi); this is the important part, because every marketer wants to know which online marketing channel/campaign "works" and converts an internet user into a buyer of the product), the following equation is used:

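$$C(x_i) = P(y \mid x_i) + \frac{1}{2 N_{j \neq i}} \sum_{j \neq i} \left\{ P(y \mid x_i, x_j) - P(y \mid x_i) - P(y \mid x_j) \right\}$$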

What does "The contribution of channel i is then computed at each positive user level" mean, and what does "for a particular user" mean?
With equations 1 and 2 we calculated terms on an aggregated level, and suddenly, in equation 3, they talk about calculation on the "user level".
What does this mean? And what is "N"? If I have 3 channels in total, is N = 2 (3 channels - 1 = 2 channels)? This "user level" part is confusing. I thought I would calculate the terms from equations 1 and 2 for a specific channel (for example, the channel "Display"), plug them into equation 3, and get C(display) = [my result]. But is it really that easy?

Best Answer

The idea of this model is that you "train" it in an aggregate manner, i.e. on a big chunk of data, and then you APPLY it on a user level to get your result. So first, you obtain the probabilities calculated on your training set (which can be the same users you plan to apply the model to, but doesn't have to be). A "positive user" is simply a user who converted/generated revenue. The particular user matters because the path/ad channels each user was exposed to can differ. N is therefore the number of channels that user was exposed to. After you redistribute the revenue from each user across the channels that user saw, you sum over all users to get the total value of each channel.
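A rough Python sketch of that train-then-apply flow, under several assumptions of mine: users are hypothetical `(set_of_channels, converted)` tuples, the per-user contribution follows the paper's third equation as I read it (first-order probability plus the averaged pairwise interaction over the other channels that user saw), and raw contributions are summed without any further normalization. The function names and toy data are illustrative only.

```python
from collections import Counter, defaultdict
from itertools import combinations


def fit(users):
    """Aggregate step: empirical single-channel and pairwise conversion probabilities."""
    pos1, tot1, pos2, tot2 = Counter(), Counter(), Counter(), Counter()
    for channels, converted in users:
        for channel in channels:
            tot1[channel] += 1
            pos1[channel] += int(converted)
        for pair in combinations(sorted(channels), 2):
            tot2[pair] += 1
            pos2[pair] += int(converted)
    p1 = {c: pos1[c] / tot1[c] for c in tot1}
    p2 = {pair: pos2[pair] / tot2[pair] for pair in tot2}
    return p1, p2


def user_contributions(channels, p1, p2):
    """User-level step: contribution of each channel this particular user saw."""
    contributions = {}
    for i in channels:
        others = [j for j in channels if j != i]  # the N - 1 other channels
        value = p1[i]
        if others:
            # Assumes every pair was seen during fitting; otherwise this raises KeyError.
            interaction = sum(
                p2[tuple(sorted((i, j)))] - p1[i] - p1[j] for j in others
            )
            value += interaction / (2 * len(others))
        contributions[i] = value
    return contributions


def attribute(users):
    """Sum the user-level contributions over all positive (converted) users."""
    p1, p2 = fit(users)
    totals = defaultdict(float)
    for channels, converted in users:
        if converted:
            for channel, value in user_contributions(channels, p1, p2).items():
                totals[channel] += value
    return dict(totals)


# Toy data, invented for illustration only.
toy_users = [
    ({"Display", "SEA"}, True),
    ({"Display", "SEA"}, False),
    ({"Display"}, True),
    ({"SEA"}, False),
]
channel_totals = attribute(toy_users)
print(channel_totals)
```

If you want to split an actual revenue amount per user, you could rescale each user's contributions so they sum to one before multiplying by that user's revenue; that rescaling is an extra design choice and is not shown here.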
