Solved – Testing for useful variables in a “net lift model”

data-mining, logistic, predictive-models

I am often involved in modeling the net lift, aka uplift, aka incremental response of direct marketing campaigns. In a nutshell, this approach aims to model, and thus select for marketing, those individuals who require a promotion in order to take the desired action (e.g. order a product). Given prospects A and B, if A will buy only if we send a direct mail letter and B will buy anyway, we target A. Think of "swing voters" in political elections – you put resources against the persuadable undecided.

I am looking for affirmation of, or suggestions on, the following approach to help understand which variables are important. Here is an example with a binary predictor variable. Let's say that 20,000 direct mail letters were sent (10,000 to those with predictor1 = 0 and 10,000 to those with predictor1 = 1), and 2,000 control customers were selected and held out so that they did not receive the letter. In this case, the difference in response rate (treated group – control) is 1.5% for predictor1 = 0 and -0.3% for predictor1 = 1. We would conclude that those with predictor1 = 0 are the better candidates to send a letter to, since the letter actually decreased the point estimate of response for those with predictor1 = 1.
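
To make the arithmetic explicit, here is a quick sketch of that calculation, using the cell counts from the first glm() example further down:

# Point-estimate uplift (treated response rate minus control response rate) per predictor1 level,
# using the cell counts from the first glm() example below
responded    <- c(250, 122, 10, 15)     # treated/p=0, treated/p=1, control/p=0, control/p=1
notresponded <- c(9750, 9879, 990, 985)
rate <- responded / (responded + notresponded)
rate[1] - rate[3]   # predictor1 = 0: about +1.5 percentage points
rate[2] - rate[4]   # predictor1 = 1: about -0.3 percentage points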

The difference in the odds ratio is 1.413 (treated – control).

[Image: treated vs. control response counts and rates by predictor1, first scenario]

Here is an example where predictor 1 is not useful.

[Image: treated vs. control response counts and rates by predictor1, second scenario]

Is it accurate to test the value of a given predictor on this difference in response rates (for any type of predictor, not just a simple binary one) using a standard logistic regression? For example, the first scenario shown above would result in a significant treated*predictor1 interaction term:

# Scenario 1: aggregated response counts, treated vs. control by predictor level
dat <- data.frame(treated = c(1, 1, 0, 0), predictor = c(0, 1, 0, 1),
                  responded = c(250, 122, 10, 15), notresponded = c(9750, 9879, 990, 985))
mod <- glm(cbind(responded, notresponded) ~ predictor * treated, data = dat, family = binomial)
summary(mod)

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)        -4.5951     0.3178 -14.458  < 2e-16 ***
predictor           0.4105     0.4107   1.000  0.31754    
treated             0.9316     0.3242   2.873  0.00406 ** 
predictor:treated  -1.1411     0.4255  -2.682  0.00733 ** 

While the second scenario does not:

# Scenario 2: same structure, but the treated response counts no longer differ much by predictor level
dat <- data.frame(treated = c(1, 1, 0, 0), predictor = c(0, 1, 0, 1),
                  responded = c(130, 122, 10, 15), notresponded = c(9870, 9879, 990, 985))
mod <- glm(cbind(responded, notresponded) ~ predictor * treated, data = dat, family = binomial)
summary(mod)


Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)        -4.5951     0.3178 -14.458   <2e-16 ***
predictor           0.4105     0.4107   1.000    0.318    
treated             0.2654     0.3299   0.805    0.421    
predictor:treated  -0.4750     0.4299  -1.105    0.269  

Question:

Issues with multiple testing aside, does this not represent a viable method for testing variables for net lift modeling?

Should I exclude the main effects and only have the interaction term? This is a question of principle and speaks to the interpretation of an interaction, given that I am looking for "significant" variables that could be used to select customers to market to in order to maximize the difference between treated and control.

Can one also use this framework to test for deeper interactions (treated*predictor1*predictor2) to see if we need to really look at predictor1 and predictor2 combinations?
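
In case it clarifies the question, here is a rough sketch of what I have in mind for the deeper test; the predictor2 column and every cell count below are purely hypothetical, just to show the mechanics:

# Hypothetical cell counts for two binary predictors, purely for illustration
dat2 <- expand.grid(treated = c(1, 0), predictor1 = c(0, 1), predictor2 = c(0, 1))
dat2$responded    <- c(120, 10, 60, 8, 130, 12, 55, 7)        # placeholder counts
dat2$notresponded <- c(4880, 490, 4940, 492, 4870, 488, 4945, 493)
mod3 <- glm(cbind(responded, notresponded) ~ predictor1 * predictor2 * treated,
            data = dat2, family = binomial)
summary(mod3)   # the predictor1:predictor2:treated row tests the combined effect on lift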

Best Answer

I don't think it is a good idea to drop the main effects, given that uplift models try to capture the second-order incremental effect on top of the main effects (the response variable itself is not incremental, since a particular subject cannot be in both test and control at the same time).

The True Lift Model - A Novel Data Mining Approach to Response Modeling in Database Marketing by Victor Lo includes a simple example of using logistic regression, and it includes both the main effects and the interaction terms.
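
To make that concrete, here is a rough sketch of how the question's interaction model can be turned into an uplift score by scoring each profile twice, once as treated and once as control; this is my paraphrase of the general idea, not code from Lo's paper:

# Refit the question's first-scenario model, then score uplift from it
dat <- data.frame(treated = c(1, 1, 0, 0), predictor = c(0, 1, 0, 1),
                  responded = c(250, 122, 10, 15), notresponded = c(9750, 9879, 990, 985))
mod <- glm(cbind(responded, notresponded) ~ predictor * treated, data = dat, family = binomial)

# predict each predictor profile under both treatment assignments
profiles  <- data.frame(predictor = c(0, 1))
p_treated <- predict(mod, newdata = transform(profiles, treated = 1), type = "response")
p_control <- predict(mod, newdata = transform(profiles, treated = 0), type = "response")
p_treated - p_control   # estimated uplift per predictor level (about +1.5% and -0.3%)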

The step-wise variable selection that you are implying is possible, but I believe there are better methods based on regularization (elastic net, etc.). As for including higher-order interactions: why not? They are just additional variables.
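
For instance, a regularized fit could look something like the sketch below; the customer-level data are simulated and the variable names (x1, x2) are placeholders, purely to illustrate the mechanics with glmnet:

library(glmnet)

set.seed(1)
n  <- 5000
df <- data.frame(treated = rbinom(n, 1, 0.8),
                 x1      = rbinom(n, 1, 0.5),
                 x2      = rnorm(n))
# response probability with a treatment effect that depends on x1 (placeholder data-generating story)
p  <- plogis(-3.5 + 0.4 * df$x1 + df$treated * (0.9 - 1.1 * df$x1))
df$responded <- rbinom(n, 1, p)

# main effects plus all treatment interactions, penalized logistic regression (elastic net)
X   <- model.matrix(~ (x1 + x2) * treated, data = df)[, -1]
fit <- cv.glmnet(X, df$responded, family = "binomial", alpha = 0.5)
coef(fit, s = "lambda.1se")   # interaction terms that survive the penalty are candidate uplift variables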

Real-World Uplift Modelling with Significance-Based Uplift Trees by Radcliffe and Surry states that variable selection is very important, so I guess one should be careful when using higher-order interactions.

If I were to try this, I would first build a good-quality main-effects model (based on the control dataset only, using regularization, interactions, etc.), then throw in the test/control interaction terms and re-fit the model on the full training dataset (using the same tuning hyper-parameters).
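
Continuing the simulated glmnet sketch above, that two-stage idea might look roughly like this (again only a sketch, not a recipe from the cited papers):

# 1) main-effects model, tuned on the control customers only
ctrl    <- subset(df, treated == 0)
cv_main <- cv.glmnet(model.matrix(~ x1 + x2, data = ctrl)[, -1], ctrl$responded,
                     family = "binomial", alpha = 0.5)
lam <- cv_main$lambda.1se          # freeze the tuning choice made on control data

# 2) add the treatment interaction terms and refit on the full training data at that penalty
mm   <- function(d) model.matrix(~ (x1 + x2) * treated, data = d)[, -1]
fit2 <- glmnet(mm(df), df$responded, family = "binomial", alpha = 0.5, lambda = lam)

# score uplift as P(respond | treated) - P(respond | control) and rank customers by it
uplift <- predict(fit2, mm(transform(df, treated = 1)), type = "response") -
          predict(fit2, mm(transform(df, treated = 0)), type = "response")
head(uplift)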