Solved – Interpreting interaction term on highly correlated variables

interaction, interpretation, marketing, multicollinearity, regression

Somebody at a meeting today made the following comments about a Marketing Mix Model (linear regression) we run every year:

  • We should account for the high collinearity of the two Marketing variables we include in our model
  • We should always include the interaction of these two Marketing variables in the model, to better interpret their synergistic effect

I feel there's something wrong about these two statements being put together.
Let me simplify the problem:

Let's denote $X_1$ and $X_2$ as two Marketing variables, and we want to estimate their contribution to total sales $Y$.
We then run a linear model $$Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \varepsilon$$
We find that $X_1$ and $X_2$ are highly correlated with one another. In theory, this collinearity should inflate the standard errors, destabilizing the coefficient estimates and shrinking the individual t-statistics.
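For concreteness, here is a minimal Python sketch with simulated data (the variable names, correlation level, and coefficients are illustrative assumptions, not the real model):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Simulated stand-ins for the two Marketing variables.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)   # x2 nearly duplicates x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()
print(fit.summary())  # note the wide confidence intervals on both slopes

# A VIF above ~10 is a common rule of thumb for problematic collinearity.
for idx, name in [(1, "x1"), (2, "x2")]:
    print(name, variance_inflation_factor(X, idx))
```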

Now if we instead fit the model $$ Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \beta_3X_1 X_2 + \varepsilon $$
In the extreme case of perfect collinearity ($X_2 = cX_1$), the interaction term $X_1 X_2$ is just $cX_1^2$, i.e. a quadratic relationship to the dependent variable $Y$. So there could be situations where the coefficients for $X_1$ and $X_2$ individually are not significant but $X_1 X_2$ is (where significant means a large t-statistic, i.e. a small p-value). Please correct me if this is not true.
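A small simulation sketching this scenario (the data-generating process is an assumption chosen to make the point visible):

```python
import numpy as np
import statsmodels.api as sm

# Near-perfect collinearity; the true signal is (almost) quadratic in x1.
rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
y = 1.0 + 0.5 * x1 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2, x1 * x2]))
fit = sm.OLS(y, X).fit()
print(fit.tvalues)  # tiny t-stats on x1 and x2, a large one on x1*x2
```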

Now my question is: does this last model really make sense at all, from an interpretation standpoint?

Even without perfect collinearity, I don't know if we can infer that the interaction is really measuring the combined effect of both variables.
That is, when I execute both marketing tactics together, I'm not simply getting the synergistic effect; I may actually be capturing the quadratic effect of just one of the variables. In that sense I can't really interpret the interaction term.
Also, I'm not quite sure how $X_1 X_2$ provides new information to the model, given that the variance it explains should be nearly identical to that of $X_1^2$.
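A quick illustration of that near-identical information, again with simulated data (the correlation level is an assumption):

```python
import numpy as np

# When x2 is nearly a copy of x1, all three second-order terms are
# nearly the same column of data.
rng = np.random.default_rng(2)
x1 = rng.normal(size=500)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=500)

second_order = np.column_stack([x1**2, x1 * x2, x2**2])
print(np.corrcoef(second_order, rowvar=False).round(3))
# Pairwise correlations near 1: the data cannot tell "synergy" (x1*x2)
# apart from a quadratic effect of a single variable (x1² or x2²).
```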

Best Answer

I agree that in the case of perfect collinearity the interaction is just the square (up to a constant), and it is possible to have main effects that are not significant alongside a significant interaction.

If you really had perfect collinearity, one approach is to add a small amount of random noise to one of the variables; alternatively, you could combine the two variables, if that makes sense in your context.
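A minimal sketch of the combining option, under the same simulated setup as above (all names and numbers are illustrative):

```python
import numpy as np
import statsmodels.api as sm

# Replace two near-duplicate variables with a single combined index:
# standardize, then average; for two standardized variables this is
# essentially their first principal component.
rng = np.random.default_rng(3)
n = 500
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

z1 = (x1 - x1.mean()) / x1.std()
z2 = (x2 - x2.mean()) / x2.std()
combined = (z1 + z2) / 2

fit = sm.OLS(y, sm.add_constant(combined)).fit()
print(fit.summary())  # one stable coefficient for the combined tactic
```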

Even without perfect collinearity, I don't know if we can infer that the interaction is really measuring the combined effect of both variables.

It is; that's exactly what it does.
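To spell out why: in the interaction model the marginal effect of $X_1$ on expected sales is
$$\frac{\partial\,\mathbb{E}[Y]}{\partial X_1} = \beta_1 + \beta_3 X_2,$$
so $\beta_3$ is precisely how much the effect of one tactic changes per unit of the other, which is the synergy. The practical difficulty in your setting is not the meaning of $\beta_3$ but that, with $X_2 \approx X_1$, the data cannot estimate it separately from a quadratic effect.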
