Solved – the difference between a confounder, collinearity, and interaction term

confounding, interaction, multicollinearity, multiple regression, regression

These terms kind of confuse me because they all seem to imply a certain correlation.

Confounder: influences dependent and independent variable

Collinearity: to me just means correlation between independent variables

Interaction term: joint effect of independent variables (but doesn't this require correlation between those variables?)

Best Answer

Your understanding of confounding and collinearity is correct. Note that in many contexts collinearity really refers to "perfect collinearity" where one variable is a linear combination of one or more other variables, but in some contexts it just refers to "high correlation" between variables.
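To make the "perfect collinearity" case concrete, here is a minimal R sketch (simulated data, not from the question) showing what `lm()` does when one predictor is an exact linear function of another:

```r
# Perfect collinearity: x2 is an exact linear combination of x1,
# so the model cannot separate their effects. By default, lm()
# drops the redundant predictor and reports its coefficient as NA.
set.seed(1)
x1  <- rnorm(50)
x2  <- 2 * x1              # exactly collinear with x1
y   <- x1 + rnorm(50)
fit <- lm(y ~ x1 + x2)
coef(fit)                  # coefficient for x2 is NA
```

With merely *high* (not perfect) correlation, both coefficients are estimated, but their standard errors inflate.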

Of course, in order for confounding to occur, there has to be a degree of correlation, though I would avoid saying "collinearity" due to the above.
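To illustrate that point, a small simulation (hypothetical data) where a confounder Z drives both X and Y, while X has no direct effect on Y at all:

```r
# Confounding: Z influences both the independent variable X and the
# dependent variable Y. X itself has no effect on Y, but omitting Z
# from the regression makes X appear strongly predictive.
set.seed(2)
n <- 1e4
z <- rnorm(n)
x <- z + rnorm(n)          # Z influences X
y <- z + rnorm(n)          # Z influences Y; X does not appear here

coef(lm(y ~ x))["x"]       # biased estimate, far from the true 0
coef(lm(y ~ x + z))["x"]   # adjusting for Z recovers approximately 0
```

Note the correlation between X and Y here is induced entirely by Z, which is why confounding requires some degree of correlation.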

However:

interaction term: joint effect of independent variables (but doesn't this require correlation between those variables?)

A "joint effect" is a good way to understand it, but it in no way requires correlation between the variables. Consider, for example, an orthogonal factorial design, where the factors are uncorrelated by construction yet interactions can still be present.

As another example, we can show this with a simple simulation where X1 and X2 are uncorrelated yet a meaningful interaction exists:

> set.seed(1)
> N <- 100
> X1 <- rnorm(N)
> X2 <- rnorm(N)
> cor(X1, X2)
[1] -0.0009943199   # X1 and X2 are uncorrelated
> 
> Y <- X1 * X2 + rnorm(N)
> summary(lm(Y ~ X1 * X2))

Call:
lm(formula = Y ~ X1 * X2)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.92554 -0.43139  0.00249  0.65651  2.60188 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.03107    0.10439   0.298    0.767    
X1          -0.03352    0.12064  -0.278    0.782    
X2          -0.02822    0.10970  -0.257    0.798    
X1:X2        0.76032    0.14847   5.121 1.57e-06 ***