Interpretation of Saturated Model vs. Model with Interaction and One Main Effect

interaction, least squares, regression

Say that I have two regressions:

1) $Y_i = \alpha_0 + \alpha_1 X_i + \alpha_2 X_i*Z_i + \epsilon_i$

2) $Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + \beta_3 X_i*Z_i + \epsilon_i$

$X_i$ and $Z_i$ are both binary variables. $Y_i$ is continuous.

How do the interpretations of the estimated coefficients differ between 1) and 2)? Specifically, how should I interpret $\alpha_2$ and $\beta_3$?

Best Answer

Throughout my answer, the usual conditional mean independence assumption $\mathbb{E}(\varepsilon_{i}\vert X_{i},Z_{i})=0$ is maintained.

It is instructive to consider a concrete example. Let $X_{i}$ be a dummy for college education, such that $X_{i}=1$ if worker $i$ is a college graduate, and $X_{i}=0$ otherwise; and let $Z_{i}$ be a dummy for gender, such that $Z_{i}=1$ if $i$ is male, and $Z_{i}=0$ if $i$ is female. Suppose $Y_{i}$ is the observed income. Hence $\mathbb{E}(Y_{i}\vert X_{i}=1,Z_{i}=1)$ is the expected income of a male college graduate, and $\mathbb{E}(Y_{i}\vert X_{i}=1,Z_{i}=0)$ is the expected income of a female college graduate. Other conditional expectations, such as $\mathbb{E}(Y_{i}\vert X_{i}=0,Z_{i}=0)$, have similar interpretations.

First, it is not hard to verify that the coefficient $\alpha_{2}$ equals $$ \alpha_{2}=\mathbb{E}(Y_{i}\vert X_{i}=1,Z_{i}=1)-\mathbb{E}(Y_{i}\vert X_{i}=1,Z_{i}=0). $$ This is the difference between the expected incomes of male and female college graduates. A statistically significant $\alpha_{2}$ may indicate gender discrimination among college graduates.
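The verification can be spelled out as follows, using only model (1) and the conditional mean assumption stated at the start of the answer. Evaluating the regression function at the two cells with $X_{i}=1$ gives $$ \mathbb{E}(Y_{i}\vert X_{i}=1,Z_{i}=1)=\alpha_{0}+\alpha_{1}+\alpha_{2},\qquad \mathbb{E}(Y_{i}\vert X_{i}=1,Z_{i}=0)=\alpha_{0}+\alpha_{1}, $$ and subtracting the second from the first leaves $\alpha_{2}$.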

Next, we have $$ \beta_{2}+\beta_{3}=\alpha_{2}=\mathbb{E}(Y_{i}\vert X_{i}=1,Z_{i}=1)-\mathbb{E}(Y_{i}\vert X_{i}=1,Z_{i}=0). $$ And $$ \beta_{0}+\beta_{2}=\mathbb{E}(Y_{i}\vert X_{i}=0,Z_{i}=1),\ \beta_{0}=\mathbb{E}(Y_{i}\vert X_{i}=0,Z_{i}=0). $$ So $$ \beta_{2}=\mathbb{E}(Y_{i}\vert X_{i}=0,Z_{i}=1)-\mathbb{E}(Y_{i}\vert X_{i}=0,Z_{i}=0), $$ which measures the gender discrimination among workers without college degrees. And $\beta_{3}=(\beta_{2}+\beta_{3})-\beta_{2}$, that is $$ \beta_{3}=\{\mathbb{E}(Y_{i}\vert X_{i}=1,Z_{i}=1)-\mathbb{E}(Y_{i}\vert X_{i}=1,Z_{i}=0)\}-\{\mathbb{E}(Y_{i}\vert X_{i}=0,Z_{i}=1)-\mathbb{E}(Y_{i}\vert X_{i}=0,Z_{i}=0)\}. $$ So $\beta_{3}$ can be understood as the difference in the magnitude of gender discrimination between two groups: workers with a college education and workers without a college degree. A positive $\beta_{3}$ indicates that the gender discrimination among more educated workers is greater than it is among less educated workers.
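A quick numerical check of these identities, as a minimal sketch: the simulated data, the coefficient values in the data-generating process, and all variable names below are illustrative assumptions, not part of the original question. Because model (2) is saturated, the OLS estimates reproduce the sample cell means exactly, so the estimated $\beta_{2}$ and $\beta_{3}$ match the corresponding differences of sample means up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.integers(0, 2, size=n)  # 1 = college graduate (simulated)
z = rng.integers(0, 2, size=n)  # 1 = male (simulated)
# Hypothetical data-generating process; the numbers are illustrative only.
y = 30 + 10 * x + 4 * z + 3 * x * z + rng.normal(0, 5, size=n)

# Design matrix for model (2): intercept, X, Z, X*Z
D = np.column_stack([np.ones(n), x, z, x * z])
beta, *_ = np.linalg.lstsq(D, y, rcond=None)

def cell_mean(xv, zv):
    """Sample mean of y in the cell X = xv, Z = zv."""
    return y[(x == xv) & (z == zv)].mean()

gap_nongrad = cell_mean(0, 1) - cell_mean(0, 0)  # male-female gap, no degree
gap_grad = cell_mean(1, 1) - cell_mean(1, 0)     # male-female gap, graduates

print("beta_2          :", beta[2], "| gap among non-graduates:", gap_nongrad)
print("beta_2 + beta_3 :", beta[2] + beta[3], "| gap among graduates:", gap_grad)
print("beta_3          :", beta[3], "| difference of the gaps:", gap_grad - gap_nongrad)
```

Each printed pair should agree, which is the sample analogue of the population identities above.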

Last but not least, one important assumption made implicitly by model (1) is the following: $$ \mathbb{E}(Y_{i}\vert X_{i}=0,Z_{i}=0)=\mathbb{E}(Y_{i}\vert X_{i}=0,Z_{i}=1)=\mathbb{E}(Y_{i}\vert X_{i}=0)=\alpha_{0}. $$ That is, by specifying model (1), one has assumed that there is no gender wage discrimination among those who have no college degree. Equivalently, model (1) is model (2) with the restriction $\beta_{2}=0$. The expectations $\mathbb{E}(Y_{i}\vert X_{i}=0,Z_{i}=0)$ and $\mathbb{E}(Y_{i}\vert X_{i}=0,Z_{i}=1)$ are the expected incomes of female and male workers without a college education, respectively. Such an assumption may or may not hold in general, depending on your empirical exercise.
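To see what happens when this restriction fails, here is a small simulation sketch. The data-generating process is a made-up assumption in which men without a degree earn more than women without a degree, so the restriction is violated. Fitting model (1) by OLS then pools the two $X_{i}=0$ cells into $\alpha_{0}$: the estimate of $\alpha_{2}$ is still the sample gap among graduates, but the estimate of $\alpha_{1}$ no longer compares female graduates with female non-graduates.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000
x = rng.integers(0, 2, size=n)  # 1 = college graduate (simulated)
z = rng.integers(0, 2, size=n)  # 1 = male (simulated)
# Hypothetical DGP with a nonzero gap (4) among non-graduates,
# which violates the restriction built into model (1).
y = 30 + 10 * x + 4 * z + 3 * x * z + rng.normal(0, 5, size=n)

# Design matrix for model (1): intercept, X, X*Z (no Z main effect)
D1 = np.column_stack([np.ones(n), x, x * z])
alpha, *_ = np.linalg.lstsq(D1, y, rcond=None)

mean_x0 = y[x == 0].mean()               # pooled over both genders
mean_00 = y[(x == 0) & (z == 0)].mean()  # female, no degree
mean_10 = y[(x == 1) & (z == 0)].mean()  # female, graduate
mean_11 = y[(x == 1) & (z == 1)].mean()  # male, graduate

print("alpha_0:", alpha[0], "| pooled mean for X=0:", mean_x0)
print("alpha_1:", alpha[1], "| female grad minus pooled X=0:", mean_10 - mean_x0)
print("alpha_1 vs. female grad minus female non-grad:", mean_10 - mean_00)
print("alpha_2:", alpha[2], "| gap among graduates:", mean_11 - mean_10)
```

In this sketch the first, second, and fourth lines show matching pairs, while the third line shows that $\alpha_{1}$ differs from the within-gender education premium whenever the gap among non-graduates is nonzero.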
