Regression – Interpretation of Insignificant Interaction Term

econometricsinteractioninterpretationregression

I know there are other posts very similar to this, but I think I am asking a slightly different question.

Say my model is:

$Income= \beta_0 + treatment\,\beta_1 + Z\,\beta_2 + Z*treatment\,\beta_3 + \epsilon$

Where I want to know the effect of treatment on income, and suspect it may vary with some other variable $Z$. Define the short regression as:

$Income =\alpha_0 + treatment\,\alpha_1 +\eta$

Now say in the short regression, $\hat{\alpha_1}= 10$. and in the interaction model, say $\beta_2$ is small and statistically insignificant, but $\hat{\beta_1} \approx \hat{\alpha_1}$. and both are significant.

Since the for the first regression, $\beta_1$ is:

$\frac{\partial E[income|treatment,z=0]}{\partial treatment}$, is this then saying that the effect of treatment only exists for subsets where z = 0? or are these just saying the effect of treatment does not vary with Z? i.e. can I say since the coefficients on treatment across models are about equal, this suggests z does not influence the effect of treatment on income?

does it make sense to argue from the first model alone, that we cant reject $\beta_1 = \beta_1 + Z\beta_3$ for any value of z, that this means treatment does not vary with z, vs interpreting this as treatment only applies when z = 0?

Best Answer

The "long" model is: $$E[y \vert t,z]=\beta_0 + \beta_1 \cdot t + \beta_2 \cdot z + \beta_3 \cdot z \times t$$

Here, the marginal effect of treatment is a function of $z$:

$$ME_t=\frac{\partial E[y \vert t,z]}{\partial t}=\beta_1 + \beta_3 \cdot z=f(z)$$

You can ask how does the $ME_t$ vary with $z$, which you can get by

$$\frac{\partial MT_t}{\partial z}=\frac{\partial E[y \vert t,z]}{\partial z \partial t}=\beta_3$$

The size, sign, and significance of $\beta_3$ tell you whether there is a substantive heterogeneous treatment effect that depends on $z$. The coefficient of $\beta_2$ is not the one you care about.

You are right that $\beta_1$ gives the expected effect for someone with a $z$ of zero. This is usually not a very relevant number unless $z$ has been rescaled or zero is a typical value. But this effect exists for everyone, not just for those with $z$ at zero. If $z \ne 0$, there is an additional effect on top of the direct one.

The "short" model is: $$E[y \vert t,z]=\tilde \beta_0 + \tilde\beta_1 \cdot t + 0 \cdot z + 0 \cdot z \times t = \alpha_0 + \alpha_1 \cdot t$$

There are two things different here. One is that you dropped $z$ from the model. That's probably inconsequential, assuming treatment does not depend on $z$. The second is that you've imposed that the interaction is zero. This has more bite.

The most comparable number to $\alpha_1$ is the average marginal effect from the "long" model:

$$AME_t = \frac{1}{n}\sum_i^n \left( \beta_1 + \beta_3 \cdot z_i \right)= \beta_1 + \beta_3 \cdot \frac{1}{n}\sum_i^n z_i=\beta_1 + \beta_3 \cdot \bar z$$

In the linear case, you can think of this as the effect of treatment for someone with the sample average $z$.

In the "short" model, the $ME_t=AME_t$, since the effect doesn't depend on $z$ and is the same for everyone.

I don't think you need the short model on top of the long one to decide if there is heterogeneity and comparing them is not all that useful. The long model already gives you everything you need.

I can answer your second question with a counterexample. Here I have simulated an effect that is positive for high values of $z$ and negative for low values, but the net effect is that these cancel and make the treatment look ineffective on average. Let's simulate the data and fit the "long" model first:

. clear

. set obs 1000
Number of observations (_N) was 0, now 1,000.

. gen t = mod(_n,2)

. gen z = mod(_n,3) - 1

. gen y = 3 + 0*t + 1*z + 5*t*z + rnormal(0,2)

. reg y i.t##c.z

      Source |       SS           df       MS      Number of obs   =     1,000
-------------+----------------------------------   F(3, 996)       =   1012.18
       Model |  12206.1146         3  4068.70487   Prob > F        =    0.0000
    Residual |  4003.68069       996  4.01975973   R-squared       =    0.7530
-------------+----------------------------------   Adj R-squared   =    0.7523
       Total |  16209.7953       999  16.2260213   Root MSE        =    2.0049

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         1.t |   .1225485   .1268035     0.97   0.334    -.1262842    .3713813
           z |   1.076669     .10987     9.80   0.000     .8610661    1.292273
             |
       t#c.z |
          1  |   4.880441   .1553797    31.41   0.000     4.575532     5.18535
             |
       _cons |   2.894459   .0896636    32.28   0.000     2.718508    3.070411
------------------------------------------------------------------------------

This is the average marginal effect:

. margins, dydx(t)

Average marginal effects                                 Number of obs = 1,000
Model VCE: OLS

Expression: Linear prediction, predict()
dy/dx wrt:  1.t

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         1.t |   .1225485   .1268035     0.97   0.334    -.1262842    .3713813
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

Here we cannot reject the null that the overall effect is nil.

Here $z$ has three possible values:

. tab z

          z |      Freq.     Percent        Cum.
------------+-----------------------------------
         -1 |        333       33.30       33.30
          0 |        334       33.40       66.70
          1 |        333       33.30      100.00
------------+-----------------------------------
      Total |      1,000      100.00

We can calculate the effect at each one of them:

. margins, dydx(t) at(z = (-1 0 1)) 

Conditional marginal effects                             Number of obs = 1,000
Model VCE: OLS

Expression: Linear prediction, predict()
dy/dx wrt:  1.t
1._at: z = -1
2._at: z =  0
3._at: z =  1

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
0.t          |  (base outcome)
-------------+----------------------------------------------------------------
1.t          |
         _at |
          1  |  -4.757893   .2005542   -23.72   0.000     -5.15145   -4.364335
          2  |   .1225485   .1268035     0.97   0.334    -.1262842    .3713813
          3  |    5.00299   .2005542    24.95   0.000     4.609433    5.396547
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

The effects are significant at the extremes but of opposite sign, so the treatment looks ineffective on average.

To sum up

  1. the interaction coefficient is the canary in the coalmine of effect heterogeneity.
  2. If you add interaction coefficients, don't just average them away. Plotting the effects can help
  3. The long model is all you need.

Stata Code

cls
clear
set obs 1000
gen t = mod(_n,2)
gen z = mod(_n,3) - 1
gen y = 3 + 0*t + 1*z + 5*t*z + rnormal(0,2)
reg y i.t##c.z
margins, dydx(t)
tab z
margins, dydx(t) at(z = (-1 0 1))
Related Question