Solved – Interaction effect in a multiple regression vs split sample

categorical data, interaction, multiple regression

Until now I thought I understood interaction effects. I always interpreted an interaction as the change in slope when some dummy equals 1. Perhaps I am wrong.

I have a model to which I add a dummy $D$ ($D=1$ for European countries, $0$ for the rest) and its interaction with $x$:

$y=\alpha + D\beta_1 +x\beta_2 + xD\beta_3 +\nu $

I ran the regression using just the observations where $D=1$:

if $D=1$ regress $y=\alpha + x\gamma +\xi $

Unsurprisingly, I find that $\gamma=\beta_2+\beta_3=-0.25$
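A quick simulation illustrates why this is expected when $x$ is the only regressor: the pooled model gives each group its own intercept ($\alpha$ vs. $\alpha+\beta_1$) and its own slope ($\beta_2$ vs. $\beta_2+\beta_3$), so it is equivalent to fitting each group separately. This is a minimal NumPy sketch on made-up data; the true coefficient values and the helper `ols` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
D = (rng.random(n) < 0.4).astype(float)   # hypothetical dummy: 1 = "Europe"
x = rng.normal(size=n)
# Simulated outcome: slope of x is 0.5 when D=0 and 0.5 - 0.75 = -0.25 when D=1
y = 1.0 + 0.3 * D + 0.5 * x - 0.75 * x * D + rng.normal(scale=0.5, size=n)

def ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
# Pooled model: y = alpha + D*beta1 + x*beta2 + x*D*beta3
a, b1, b2, b3 = ols(np.column_stack([ones, D, x, x * D]), y)

# Split sample: y = alpha + x*gamma, using only observations with D == 1
m = D == 1
a_s, gamma = ols(np.column_stack([ones[m], x[m]]), y[m])

print(b2 + b3, gamma)  # identical up to floating-point error
```

The same holds for the intercepts: `a + b1` equals `a_s`.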

The confusion arises, however, with a multiple regression that has more than one covariate.

My model is now:

$y=\alpha + D\beta_1 +x\beta_2 + xD\beta_3 + \Omega\delta +\nu $

with $\Omega$ being a vector of covariates.

Now I rerun the model using only the observations where $D=1$:

if $D=1$ regress $y=\alpha + x\gamma + \Omega\delta +\xi $

To my great surprise, I find that $\gamma \ne \beta_2+\beta_3$.

In fact, $\gamma=-0.6$ and is highly significant, while $\beta_2+\beta_3=0.4$ and is insignificant.
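This kind of discrepancy can be reproduced with simulated data. The sketch below assumes, hypothetically, a single covariate `w` standing in for $\Omega$, whose slope differs between the groups (0 when $D=0$, 2 when $D=1$) and which is correlated with $x$ only when $D=1$. All values are made up for illustration and are not the asker's data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
D = (rng.random(n) < 0.5).astype(float)   # hypothetical group dummy
x = rng.normal(size=n)
# Hypothetical covariate w (one element of Omega): correlated with x only when D == 1
w = np.where(D == 1, x + rng.normal(size=n), rng.normal(size=n))
# True slopes: x -> 0.5 (D=0) / -0.25 (D=1); w -> 0 (D=0) / 2 (D=1)
y = (1.0 + 0.3 * D + np.where(D == 1, -0.25, 0.5) * x
     + np.where(D == 1, 2.0, 0.0) * w + rng.normal(scale=0.5, size=n))

def ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
# Pooled model: only x is interacted with D; w gets one common slope delta
coef = ols(np.column_stack([ones, D, x, x * D, w]), y)
b2, b3, delta = coef[2], coef[3], coef[4]

# Split sample (D == 1): the slope of w is free as well
m = D == 1
_, gamma, delta_split = ols(np.column_stack([ones[m], x[m], w[m]]), y[m])

print(f"beta2 + beta3 = {b2 + b3:.2f}, gamma = {gamma:.2f}")
print(f"pooled delta = {delta:.2f}, split-sample delta = {delta_split:.2f}")
```

In this setup the pooled $\delta$ lands between the two group-specific slopes of `w`, and the part of `w`'s effect that it misses in the $D=1$ group is absorbed by $\beta_2+\beta_3$, so the two estimates of the European slope of $x$ disagree.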

What is going on?

Best Answer

What you have to realize is that a split sample is different from an interaction effect.

An interaction with $x$ allows only the slope of that particular independent variable $x$ to change between the groups, leaving all other slopes the same for both groups.

Splitting the sample is equivalent to interacting the dummy with every independent variable (and with the intercept). In other words, you allow the slope of every independent variable to change.

So essentially,

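$y=\alpha + \alpha_D D + \beta_1 x_1 + \theta_1 D x_1 + \beta_2 x_2 + \theta_2 D x_2 + \dots + \beta_n x_n + \theta_n D x_n + \varepsilon$

which, for $D=1$, becomes

$y=(\alpha + \alpha_D) + (\beta_1+\theta_1)x_1 + (\beta_2+\theta_2)x_2 + \dots + (\beta_n+\theta_n)x_n + \varepsilon$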
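The equivalence between the fully interacted model and the split sample can be checked numerically. This is a minimal NumPy sketch on simulated data with two regressors; the variable names `aD`, `t1`, `t2` (for the intercept shift and the interaction coefficients) and all true coefficient values are hypothetical labels chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
D = (rng.random(n) < 0.5).astype(float)   # hypothetical group dummy
x1 = rng.normal(size=n)
x2 = rng.normal(size=n) + D * x1          # hypothetical second regressor
y = 1.0 + 0.3 * D + 0.5 * x1 - 0.75 * D * x1 + 2.0 * D * x2 + rng.normal(size=n)

def ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
# Fully interacted model: intercept, intercept shift, then (slope, interaction) per regressor
X_full = np.column_stack([ones, D, x1, D * x1, x2, D * x2])
a, aD, b1, t1, b2, t2 = ols(X_full, y)

# Split sample, D == 1 only
m = D == 1
split = ols(np.column_stack([ones[m], x1[m], x2[m]]), y[m])

full_implied = np.array([a + aD, b1 + t1, b2 + t2])
print(np.allclose(full_implied, split))  # True: the two fits coincide
```

Only the point estimates coincide; standard errors generally differ, because the pooled fit assumes a single error variance for both groups while the split regression estimates one per group.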

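If you interact the dummy with just one regressor, you impose a different assumption, namely that all other slopes are equal across groups, and that results in a different model. In your case $\Omega$ enters without an interaction, so $\delta$ is forced to be the same for European and non-European countries; if the effect of $\Omega$ actually differs between the groups and $\Omega$ is correlated with $x$, that restriction spills over into $\beta_2+\beta_3$, which is why it no longer matches $\gamma$.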