Solved – Predictive Model for Attribution Model

machine learning, marketing, predictive-models

There is a video on YouTube of a talk about using GBM (gradient boosting machines) to build a marketing attribution model at Netflix. Essentially, this is a binary classification problem: a visitor to the site either converts or does not (1 or 0), and there is assorted information about the visitor (e.g. which other ads they clicked on and how long ago, whether they got an email from the company, etc.). I believe the goal is to use the model to say, of all the conversions we got, how many should be attributed to each of the different things that happened to the customer. That way the company can determine, for example, whether sending an email really matters, or how important Ad 123 was for conversion.

My question is how the fitted model would be used (this was skipped entirely in the video).

Here is one approach I was considering. Does it make sense, or are there better ones?

Define 'touch-points' as channels, ad types, etc.: whatever you want to attribute conversions to that has a cost. Other variables act as controls in the model to adjust for other circumstances, such as the day of the week.

1.) Run the data through the fitted model and record the mean predicted probability of conversion over all visitors, given the observed predictors: P(conversion=1 | X). This is the baseline.
2.) For each touch-point variable x_i, set its value to zero for all visitors (along with any interaction terms that involve it) and run this modified data through the model. Again, record the mean predicted probability of conversion: P(conversion=1 | X; x_i=0).
3.) Calculate the net effect for touch-point i: net_effect_i = P(conversion=1 | X) - P(conversion=1 | X; x_i=0).
4.) Once this is done for all touch-points, normalize the net effects: norm_net_effect_i = net_effect_i / sum(net_effect_j over all touch-points j).
5.) Multiply each normalized net effect by the total number of observed conversions to get the number of conversions attributed to each touch-point.
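The baseline, zero-out, net-effect, normalize and scale steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the method used in the talk: each visitor is a dict of numeric features, `predict_proba` stands in for whatever fitted model you have (here a hypothetical hand-written logistic function `toy_model`), and interaction terms are computed inside the model from the raw features, so zeroing a touch-point also zeroes its interactions. The feature names `email`, `ad_123` and `weekend` are made up for illustration.

```python
import math
from typing import Callable, Dict, Sequence

Visitor = Dict[str, float]


def mean_probability(predict_proba: Callable[[Visitor], float],
                     visitors: Sequence[Visitor]) -> float:
    """Mean predicted P(conversion=1) over all visitors."""
    return sum(predict_proba(v) for v in visitors) / len(visitors)


def attribute_conversions(predict_proba: Callable[[Visitor], float],
                          visitors: Sequence[Visitor],
                          touchpoints: Sequence[str],
                          total_conversions: float) -> Dict[str, float]:
    # Baseline: mean probability with every predictor as observed.
    baseline = mean_probability(predict_proba, visitors)

    net_effect = {}
    for tp in touchpoints:
        # Counterfactual: this touch-point set to zero for every visitor.
        counterfactual = [{**v, tp: 0.0} for v in visitors]
        net_effect[tp] = baseline - mean_probability(predict_proba, counterfactual)

    # Normalize the net effects to shares, then scale by observed conversions.
    total_effect = sum(net_effect.values())
    if total_effect == 0:
        raise ValueError("net effects sum to zero; shares are undefined")
    return {tp: total_conversions * e / total_effect for tp, e in net_effect.items()}


def toy_model(v: Visitor) -> float:
    """Hypothetical fitted model: logistic with an email x ad_123 interaction."""
    z = (-3.0 + 1.0 * v["email"] + 0.5 * v["ad_123"]
         + 0.8 * v["email"] * v["ad_123"] - 0.2 * v["weekend"])
    return 1.0 / (1.0 + math.exp(-z))


visitors = [
    {"email": 1, "ad_123": 1, "weekend": 0},
    {"email": 1, "ad_123": 0, "weekend": 1},
    {"email": 0, "ad_123": 1, "weekend": 0},
    {"email": 0, "ad_123": 0, "weekend": 1},
]
# "weekend" is a control, so it is not in the list of touch-points.
result = attribute_conversions(toy_model, visitors, ["email", "ad_123"], total_conversions=120)
print(result)
```

The normalization only yields sensible shares when every net effect is positive. If a touch-point's net effect is negative (the model predicts more conversions without it), the normalized shares can exceed 1 or go below 0, and you would need to decide how to handle that case.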

Best Answer

I think the way you are thinking about it is correct, but I would add a bit more about how I think the company would use this information.

1.) Maximize your return on investment. You alluded to this concept in your question, but you missed a key ingredient: the cost of each touch-point. Consider a simple model with two advertising strategies, A and B, which cost 100 dollars and 500 dollars respectively. If you acquire 30 customers from A and 60 customers from B, each paying a 10-dollar subscription fee, strategy A yields a net return of 30 × 10 - 100 = 200 dollars, while B yields 60 × 10 - 500 = 100 dollars. Thus, it would make more sense to shift resources from B to A, even though B gets you more customers.

2.) Better coordination between services. In point 1, we implicitly assumed that the touch-points act independently. However, it could very well be that customers convert when A occurs before B but not when B occurs before A. You would want to account for such effects in your return on investment as well. Finding interactions and orderings between marketing strategies that matter could reveal interesting, non-trivial ways to increase the chances of acquiring new customers.

3.) Gaining a better understanding of your customers' psychology to create new marketing techniques. These models are not just numbers; they are representations of complex thought processes across a wide range of individuals. Understanding why your customers purchased your product may be the biggest gain from this model.
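The return arithmetic in point 1, and one way of exposing the ordering effects from point 2 to a model, can be sketched in Python. The numbers are the illustrative ones from point 1; `roi` and `order_features` are hypothetical helper names, and the `X_before_Y` flags are just one possible encoding that a tree-based model such as a GBM could split on, not an established scheme.

```python
from itertools import permutations
from typing import Dict, Sequence


def roi(customers: int, fee: float, cost: float) -> float:
    """Net return: revenue from acquired customers minus the strategy's cost."""
    return customers * fee - cost


returns = {
    "A": roi(customers=30, fee=10, cost=100),   # 300 - 100
    "B": roi(customers=60, fee=10, cost=500),   # 600 - 500
}
# Rank strategies by net return rather than by raw customer count.
ranking = sorted(returns, key=returns.get, reverse=True)


def order_features(path: Sequence[str], channels: Sequence[str]) -> Dict[str, int]:
    """Binary flags: did channel X first appear before channel Y in this path?

    `path` is one visitor's touch-points in time order, e.g. ["A", "email", "B"].
    """
    first_seen: Dict[str, int] = {}
    for position, channel in enumerate(path):
        first_seen.setdefault(channel, position)
    features = {}
    for x, y in permutations(channels, 2):
        features[f"{x}_before_{y}"] = int(
            x in first_seen and y in first_seen and first_seen[x] < first_seen[y]
        )
    return features


print(returns, ranking)
print(order_features(["B", "email", "A"], ["A", "B"]))
```

Flags like these, added as predictors alongside the plain touch-point indicators, let the fitted model assign a different effect to "A then B" than to "B then A".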

I'm not really sure if this constitutes an answer, but this is how I would leverage the data if I were in their position. I'm sure there are many other ways as well.

Related Solutions

Solved – Kappa for Predictive Model

It might be useful to consider Cohen's $\kappa$ in the context of inter-rater agreement. Suppose you have two raters individually assigning the same set of objects to the same categories. You can then measure overall agreement by dividing the sum of the diagonal of the confusion matrix by the total sum. But this does not take into account that the two raters will also, to some extent, agree by chance. $\kappa$ is intended as a chance-corrected measure, conditional on the baseline frequencies with which the raters use the categories (the marginal sums).

The expected frequency of each cell under the assumption of independence, given the marginal sums, is then calculated just as in the $\chi^2$ test; this is equivalent to Witten & Frank's description (see mbq's answer). For chance agreement, we only need the diagonal cells. In R:

# generate the given data
> lvls <- factor(1:3, labels=letters[1:3])
> rtr1 <- rep(lvls, c(100, 60, 40))
> rtr2 <- rep(rep(lvls, nlevels(lvls)), c(88,10,2, 14,40,6, 18,10,12))
> cTab <- table(rtr1, rtr2)
> addmargins(cTab)
     rtr2
rtr1    a   b   c Sum
  a    88  10   2 100
  b    14  40   6  60
  c    18  10  12  40
  Sum 120  60  20 200

> library(irr)       # for kappa2()
> kappa2(cbind(rtr1, rtr2))
 Cohen's Kappa for 2 Raters (Weights: unweighted)
 Subjects = 200 
   Raters = 2 
    Kappa = 0.492 
        z = 9.46 
  p-value = 0 

# observed frequency of agreement (diagonal cells)
> fObs <- sum(diag(cTab)) / sum(cTab)

# frequency of agreement expected by chance (like chi^2)
> fExp <- sum(rowSums(cTab) * colSums(cTab)) / sum(cTab)^2
> (fObs-fExp) / (1-fExp)    # Cohen's kappa
[1] 0.4915254
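Written out, the `fObs` and `fExp` lines compute the standard formula below, where $n_{kk}$ are the diagonal cells, $n_{k\cdot}$ and $n_{\cdot k}$ are the row and column sums, and $n$ is the total. Plugging in the table by hand reproduces the value from the transcript:

$$
\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad
p_o = \frac{1}{n}\sum_k n_{kk} = \frac{88 + 40 + 12}{200} = 0.70, \qquad
p_e = \frac{1}{n^2}\sum_k n_{k\cdot}\, n_{\cdot k} = \frac{100\cdot 120 + 60\cdot 60 + 40\cdot 20}{200^2} = 0.41
$$

$$
\kappa = \frac{0.70 - 0.41}{1 - 0.41} = \frac{0.29}{0.59} \approx 0.4915
$$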

Note that $\kappa$ is not universally accepted as a good measure of agreement; see, e.g., here, or here, or the literature cited in the Wikipedia article.

Survival Model – Using Time-Varying Predictors for Predicting Churn

Thank you for the clarification, B_Miner. I don't do a lot of forecasting myself, so take what follows with a pinch of salt. Here is what I would do as at least a first cut at the data.

First, formulate and estimate a model that explains your time-varying covariates (TVCs). Do all of the cross-validation, error checking, etc., to make sure you have a decent model for the data.
Second, formulate and estimate a survival model (of whatever flavor). Again, do all of the cross-validation, error checking, etc., to make sure this model is reasonable as well.
Third, settle on a method of using the forecasts from the TVCs model as the basis of forecasting risks of churn and whatever else you want. Once again, verify that the predictions are reasonable using your sample.

Once you have a model that you think is reasonable, I would suggest bootstrapping the data as a way to incorporate the error in the first TVC model into the second model. Basically, apply steps 1-3 N times, each time taking a bootstrap sample from the data and producing a set of forecasts. When you have a reasonable number of forecasts, summarize them in any way you think is appropriate for your task; e.g., provide mean risk of churn for each individual or covariate profile of interest as well as 95% confidence intervals.
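A minimal Python sketch of that bootstrap, assuming the three steps are wrapped in functions that are refit on every resample. All concrete pieces are hypothetical stand-ins chosen to keep the example small: synthetic customer rows, a stage-1 model that is just the average monthly change in usage, and a stage-2 exponential survival model with a separate constant hazard for low- and high-usage customers. The 95% interval uses the 2.5th and 97.5th percentiles of the bootstrap draws.

```python
import math
import random
import statistics

# Synthetic rows (hypothetical): (usage now, monthly usage change, months observed, churned 0/1)
gen = random.Random(42)
rows = []
for _ in range(200):
    usage = gen.uniform(0, 10)
    slope = gen.gauss(-0.2, 0.5)
    months = gen.randint(1, 24)
    churned = int(gen.random() < (0.4 if usage < 5 else 0.15))
    rows.append((usage, slope, months, churned))


def fit_tvc(sample):
    """Stage 1: average monthly change in usage (a deliberately tiny TVC model)."""
    return statistics.mean(r[1] for r in sample)


def fit_survival(sample, threshold=5.0):
    """Stage 2: exponential hazard (events / exposure) for low- and high-usage groups."""
    hazards = {}
    for group in ("low", "high"):
        g = [r for r in sample if (r[0] < threshold) == (group == "low")]
        events = sum(r[3] for r in g)
        exposure = sum(r[2] for r in g)
        hazards[group] = events / exposure if exposure else 0.0
    return hazards


def forecast_risk(tvc_slope, hazards, usage_now=6.0, horizon=6, threshold=5.0):
    """Stage 3: forecast usage, then P(churn within horizon) = 1 - exp(-hazard * horizon)."""
    usage_future = usage_now + tvc_slope * horizon
    h = hazards["low"] if usage_future < threshold else hazards["high"]
    return 1.0 - math.exp(-h * horizon)


def two_stage_bootstrap(rows, n_boot=500, seed=0):
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]  # resample customers with replacement
        # Refit BOTH stages on the resample, then forecast.
        draws.append(forecast_risk(fit_tvc(sample), fit_survival(sample)))
    cuts = statistics.quantiles(draws, n=40, method="inclusive")  # cut points in 2.5% steps
    return statistics.mean(draws), cuts[0], cuts[-1]


mean_risk, lower, upper = two_stage_bootstrap(rows)
print(f"mean risk {mean_risk:.3f}, 95% interval [{lower:.3f}, {upper:.3f}]")
```

Because both models are refit inside the loop, the spread of the draws reflects estimation error in the TVC forecast as well as in the hazard estimates; fitting stage 1 once and bootstrapping only stage 2 would understate it.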
