I know there are other posts very similar to this, but I think I am asking a slightly different question.
Say my model is:
$Income= \beta_0 + treatment\,\beta_1 + Z\,\beta_2 + Z*treatment\,\beta_3 + \epsilon$
I want to know the effect of treatment on income, and I suspect it may vary with some other variable $Z$. Define the short regression as:
$Income =\alpha_0 + treatment\,\alpha_1 +\eta$
Now say that in the short regression $\hat{\alpha}_1 = 10$, and that in the interaction model $\beta_2$ is small and statistically insignificant, but $\hat{\beta}_1 \approx \hat{\alpha}_1$ and both are significant.
Since, in the first (interaction) regression, $\beta_1$ is:
$\frac{\partial E[income|treatment,z=0]}{\partial treatment}$, is this saying that the effect of treatment only exists for the subset where $z = 0$? Or does it just say that the effect of treatment does not vary with $Z$? That is, can I say that since the coefficients on treatment are about equal across models, $Z$ does not influence the effect of treatment on income?
Does it make sense to argue from the first model alone that, because we can't reject $\beta_1 = \beta_1 + Z\beta_3$ for any value of $z$, the effect of treatment does not vary with $z$, rather than interpreting this as treatment only applying when $z = 0$?
Best Answer
The "long" model is: $$E[y \vert t,z]=\beta_0 + \beta_1 \cdot t + \beta_2 \cdot z + \beta_3 \cdot z \times t$$
Here, the marginal effect of treatment is a function of $z$:
$$ME_t=\frac{\partial E[y \vert t,z]}{\partial t}=\beta_1 + \beta_3 \cdot z=f(z)$$
You can ask how $ME_t$ varies with $z$, which you get from the cross-partial derivative:
$$\frac{\partial ME_t}{\partial z}=\frac{\partial^2 E[y \vert t,z]}{\partial z \, \partial t}=\beta_3$$
The size, sign, and significance of $\beta_3$ tell you whether there is a substantive heterogeneous treatment effect that depends on $z$. The coefficient $\beta_2$ is not the one you care about.
You are right that $\beta_1$ gives the expected effect for someone with a $z$ of zero. This is usually not a very relevant number unless $z$ has been rescaled (for example, centered) or zero is a typical value. But this component of the effect applies to everyone, not just to those with $z$ at zero. If $z \ne 0$, there is an additional effect of $\beta_3 \cdot z$ on top of it.
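To make $f(z)=\beta_1 + \beta_3 \cdot z$ concrete, here is a minimal Python sketch (NumPy only) that fits the "long" model and evaluates the effect of treatment at a few values of $z$, with a standard error for each. The data-generating numbers (an effect of $10 + 3z$), the helper names `fit_ols` and `marginal_effect`, and the classical homoskedastic covariance are all my own assumptions for illustration, not anything from the question.

```python
import numpy as np

def fit_ols(X, y):
    """OLS coefficients and classical (homoskedastic) covariance matrix."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return beta, sigma2 * XtX_inv

def marginal_effect(beta, cov, z0):
    """Effect of t at z = z0, i.e. beta_1 + beta_3 * z0, and its standard error."""
    w = np.array([0.0, 1.0, 0.0, z0])  # weights on (b0, b1, b2, b3)
    return float(w @ beta), float(np.sqrt(w @ cov @ w))

# Made-up data (assumption): treatment effect is 10 + 3*z, z has no main effect.
rng = np.random.default_rng(0)
n = 2000
t = rng.integers(0, 2, size=n).astype(float)
z = rng.normal(2.0, 1.0, size=n)
y = 1.0 + 10.0 * t + 0.0 * z + 3.0 * z * t + rng.normal(0.0, 1.0, size=n)

# Columns: constant, t, z, z*t  ->  (b0, b1, b2, b3)
X = np.column_stack([np.ones(n), t, z, z * t])
beta, cov = fit_ols(X, y)

for z0 in (0.0, 1.0, 2.0):
    me, se = marginal_effect(beta, cov, z0)
    print(f"z = {z0:.0f}: effect = {me:.2f} (se {se:.2f})")
```

At $z_0 = 0$ the function returns exactly $\hat\beta_1$, and the effect at any other $z_0$ adds $\hat\beta_3 \cdot z_0$ to it. Note that the simulated $\beta_2$ is zero while the treatment effect still varies with $z$, which is why $\beta_2$ tells you nothing about heterogeneity.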
The "short" model is: $$E[y \vert t,z]=\tilde \beta_0 + \tilde\beta_1 \cdot t + 0 \cdot z + 0 \cdot z \times t = \alpha_0 + \alpha_1 \cdot t$$
There are two things different here. One is that you dropped $z$ from the model. That's probably inconsequential, assuming treatment does not depend on $z$. The second is that you've imposed that the interaction is zero. This has more bite.
The most comparable number to $\alpha_1$ is the average marginal effect from the "long" model:
$$AME_t = \frac{1}{n}\sum_{i=1}^n \left( \beta_1 + \beta_3 \cdot z_i \right)= \beta_1 + \beta_3 \cdot \frac{1}{n}\sum_{i=1}^n z_i=\beta_1 + \beta_3 \cdot \bar z$$
In the linear case, you can think of this as the effect of treatment for someone with the sample average $z$.
In the "short" model, $ME_t=AME_t=\alpha_1$, since the effect doesn't depend on $z$ and is the same for everyone.
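A quick way to check that $\alpha_1$ lines up with $AME_t$ rather than with $\beta_1$ is a small simulation. This is a rough Python sketch under assumptions of my own: treatment is randomized (so it is independent of $z$), $z$ has mean 2, and the true effect is $10 + 3z$. None of these numbers come from the question.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
t = rng.integers(0, 2, size=n).astype(float)  # randomized, so independent of z
z = rng.normal(2.0, 1.0, size=n)              # assumption: z is not centered
y = 1.0 + 10.0 * t + 0.5 * z + 3.0 * z * t + rng.normal(0.0, 1.0, size=n)

ones = np.ones(n)
X_long = np.column_stack([ones, t, z, z * t])  # (b0, b1, b2, b3)
X_short = np.column_stack([ones, t])           # (a0, a1)

b = np.linalg.lstsq(X_long, y, rcond=None)[0]
a = np.linalg.lstsq(X_short, y, rcond=None)[0]

# AME two ways: plug in the mean of z, or average the individual effects.
ame = b[1] + b[3] * z.mean()
ame_by_averaging = np.mean(b[1] + b[3] * z)

print(f"beta_1 (effect at z = 0): {b[1]:.2f}")
print(f"AME from the long model:  {ame:.2f}")
print(f"alpha_1 from short model: {a[1]:.2f}")
```

In this setup $\hat\alpha_1$ should land close to the AME and far from $\hat\beta_1$. The two coincide only when $\beta_3 \cdot \bar z$ is near zero, that is, when there is no interaction or when $z$ is centered, so $\hat\beta_1 \approx \hat\alpha_1$ on its own does not rule out heterogeneity.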
I don't think you need the short model on top of the long one to decide whether there is heterogeneity, and comparing them is not all that useful. The long model already gives you everything you need.
I can answer your second question with a counterexample. Here I have simulated an effect that is positive for high values of $z$ and negative for low values, but the net effect is that these cancel and make the treatment look ineffective on average. Let's simulate the data and fit the "long" model first:
This is the average marginal effect:
Here we cannot reject the null that the overall effect is nil.
Here $z$ has three possible values:
We can calculate the effect at each one of them:
The effects are significant at the extremes but of opposite sign, so the treatment looks ineffective on average.
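For readers without Stata, the same kind of counterexample can be sketched in Python with NumPy. This is not the code behind the results described above, just an independent illustration with numbers I made up: $z$ takes the values $-1$, $0$, and $1$ with equal probability, the true effect of treatment is $2z$, and standard errors are the classical homoskedastic ones. The helper names `fit_ols` and `effect` are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """OLS coefficients and classical (homoskedastic) covariance matrix."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return beta, sigma2 * XtX_inv

# Assumed design: effect of t is 2*z, so it is negative, zero, positive.
rng = np.random.default_rng(2)
n = 3000
t = rng.integers(0, 2, size=n).astype(float)
z = rng.choice([-1.0, 0.0, 1.0], size=n)
y = 1.0 + 0.0 * t + 0.5 * z + 2.0 * z * t + rng.normal(0.0, 1.0, size=n)

X = np.column_stack([np.ones(n), t, z, z * t])  # (b0, b1, b2, b3)
beta, cov = fit_ols(X, y)

def effect(z0):
    """Estimate, standard error, and t-statistic of beta_1 + beta_3 * z0."""
    w = np.array([0.0, 1.0, 0.0, z0])
    est = float(w @ beta)
    se = float(np.sqrt(w @ cov @ w))
    return est, se, est / se

ame = effect(z.mean())  # treats the sample mean of z as fixed
by_z = {z0: effect(z0) for z0 in (-1.0, 0.0, 1.0)}

print("AME: est = %.3f, se = %.3f, t = %.2f" % ame)
for z0, res in by_z.items():
    print("z = %+.0f: est = %.3f, se = %.3f, t = %.2f" % ((z0,) + res))
```

The average effect should sit near zero while the effects at $z=-1$ and $z=1$ are large and of opposite sign, which is the pattern the counterexample relies on.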
To sum up
Stata Code