Solved – Understanding mean independence in the regression setting

consistency, econometrics, identifiability, linear model, regression

The notions of uncorrelatedness (equivalent to $\mathbb{E}[XY]=0$ when $X$ has mean zero) and mean independence ($\mathbb{E}[X|Y]=0$) come up in different statements of regression assumptions. We know that $\mathbb{E}[X|Y]=0$ implies $\mathbb{E}[XY]=0$ (but not the other way round). Here is a specific question about the relationship between these two notions in the regression setting.

We are looking at the effect of going to school on wages in a population. Let $D_i\in\{0,1\}$ be the random variable denoting whether individual $i$ went to school ($D_i=1$) or not ($D_i=0$), and let $Y_i$ be the wage of individual $i$. Note that if we could FORCE everyone in the population to go to school, we would observe a wage distribution denoted by $Y_{1i}$; similarly, if we FORCED everyone not to go to school, we would observe a wage distribution denoted by $Y_{0i}$.

So we have $Y_i = D_iY_{1i} + (1-D_i)Y_{0i}~~~~~~~~~~~~~~~(1)$.

Note that we can always write $Y_{1i} =\mu_1+\epsilon_{1i}$ and $Y_{0i} =\mu_0+\epsilon_{0i}$, i.e., a mean plus a noise term with mean 0. Substituting these two equations into equation (1), we have

$Y_i=\mu_0+(\mu_1-\mu_0)D_i+\epsilon_i~~~~~~~(2)$
where $\epsilon_i=\epsilon_{0i}+D_i(\epsilon_{1i}-\epsilon_{0i})$
Note that $\epsilon_{0i}$ and $\epsilon_{1i}$ each have mean 0 by construction; $\epsilon_i$ itself has mean 0 whenever $\mathbb{E}[D_i(\epsilon_{1i}-\epsilon_{0i})]=0$ (e.g., under random assignment of $D_i$).

So equation (2) describes the real-world relationship between wages and schooling without making any assumptions other than that the means of $Y_{1i}$ and $Y_{0i}$ are finite.
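As a sanity check, equation (2) can be verified numerically. A minimal sketch, with hypothetical values for $\mu_0$, $\mu_1$ and the noise distributions (none of which come from the post itself):

```python
import numpy as np

# Hypothetical simulation of the potential-outcomes model above:
# check that equation (2) reproduces equation (1) exactly.
rng = np.random.default_rng(0)
n = 100_000
mu0, mu1 = 10.0, 15.0                 # assumed population means
eps0 = rng.normal(0.0, 1.0, n)        # noise around mu0
eps1 = rng.normal(0.0, 2.0, n)        # noise around mu1
Y0 = mu0 + eps0                       # wage if forced NOT to go to school
Y1 = mu1 + eps1                       # wage if forced to go to school
D = rng.integers(0, 2, n)             # schooling indicator

Y_eq1 = D * Y1 + (1 - D) * Y0                 # equation (1)
eps = eps0 + D * (eps1 - eps0)
Y_eq2 = mu0 + (mu1 - mu0) * D + eps           # equation (2)

assert np.allclose(Y_eq1, Y_eq2)  # identical by construction
```

The identity holds observation by observation, not just on average, since equation (2) is a pure rewriting of equation (1).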

Note that $\epsilon_i$ will in general be dependent on $D_i$ (though they are not necessarily correlated). Now suppose $\epsilon_i$ and $D_i$ are uncorrelated (first, I do not know what this means in practice); then the OLS estimator is consistent (unbiasedness of OLS would require mean independence, i.e., $\mathbb{E}[\epsilon_i|D_i]=0$). So $\mu_0$ and $\mu_1$ are identifiable. In this case, since $\epsilon_i$ has mean 0, $\epsilon_i$ and $D_i$ being uncorrelated is equivalent to $\mathbb{E}[\epsilon_i D_i]=0$. I wonder if someone could explain the underlying meaning of this expression in this setting.

Note that a sufficient condition for $\mathbb{E}[\epsilon_i D_i]=0$ is $\mathbb{E}[\epsilon_i|D_i]=0$. I can understand this expression very well: knowing $D_i$ does not change the mean of the random variable $\epsilon_i$. Note that this is weaker than independence, since $\epsilon_i$ being independent of $D_i$ means that, given $D_i$, the entire distribution of $\epsilon_i$ remains the same, which is much stronger than only the first moment remaining the same (i.e., $\mathbb{E}[\epsilon_i|D_i]=0$).
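For completeness, the sufficiency claim follows in one line from the law of iterated expectations:

$\mathbb{E}[\epsilon_i D_i]=\mathbb{E}\big[\mathbb{E}[\epsilon_i D_i\mid D_i]\big]=\mathbb{E}\big[D_i\,\mathbb{E}[\epsilon_i\mid D_i]\big]=\mathbb{E}[D_i\cdot 0]=0.$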

The expression $\mathbb{E}[\epsilon_i|D_i]=0$ can be explained intuitively if we look at this identification problem from a different angle. We have:

$\mathbb{E}[Y_i|D_i=1]-\mathbb{E}[Y_i|D_i=0]=(\mu_1-\mu_0)+\mathbb{E}[\epsilon_i|D_i=1]-\mathbb{E}[\epsilon_i|D_i=0]=(\mu_1-\mu_0)+\mathbb{E}[\epsilon_{1i}|D_i=1]-\mathbb{E}[\epsilon_{0i}|D_i=0]$.

Note that we observe $\mathbb{E}[Y_i|D_i=1]$ and $\mathbb{E}[Y_i|D_i=0]$ and want to identify $\mu_1-\mu_0$, which requires $\mathbb{E}[\epsilon_{1i}|D_i=1]-\mathbb{E}[\epsilon_{0i}|D_i=0]=0$. Note that if we randomly assign schooling to people in the population, this guarantees $\mathbb{E}[\epsilon_{1i}|D_i=1]-\mathbb{E}[\epsilon_{0i}|D_i=0]=0$ (and even without randomized assignment, if we somehow know that $\mathbb{E}[\epsilon_i|D_i]=0$, we can still make this claim).

However, if we only have that $\epsilon_i$ and $D_i$ are uncorrelated, i.e., $\mathbb{E}[\epsilon_i D_i]=0$, this does not imply $\mathbb{E}[\epsilon_{1i}|D_i=1]-\mathbb{E}[\epsilon_{0i}|D_i=0]=0$. But then purely looking at the group means (i.e., $\mathbb{E}[Y_i|D_i=1]$ and $\mathbb{E}[Y_i|D_i=0]$) would not identify $\mu_1-\mu_0$, while running OLS would. Where is my logic going wrong?
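One standard fact worth recalling here: with a single binary regressor, the OLS slope is algebraically identical to the difference in group means, so OLS cannot extract anything the group means do not already contain. A minimal sketch on simulated data (the data-generating values are my own, purely illustrative):

```python
import numpy as np

# With a binary regressor, the OLS slope Cov(D, Y) / Var(D) equals
# the difference in sample group means exactly (an algebraic identity).
rng = np.random.default_rng(1)
n = 200_000
D = rng.integers(0, 2, n)
# Arbitrary wage model with group-dependent noise (values assumed).
Y = 10.0 + 5.0 * D + rng.normal(0.0, 1.0, n) + 0.5 * D * rng.normal(0.0, 1.0, n)

# OLS slope from the normal equations (ddof=0 for the population formula).
slope = np.cov(D, Y, ddof=0)[0, 1] / np.var(D)
group_diff = Y[D == 1].mean() - Y[D == 0].mean()

assert np.isclose(slope, group_diff)  # equal up to floating-point error
```

This equivalence is the first hint that "group means fail but OLS succeeds" cannot happen for a binary $D_i$, which is exactly what the answer below proves.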

Best Answer

The assumption here that $\epsilon_i$ and $D_i$ are uncorrelated without mean independence holding is impossible when $D_i$ takes only two values. Intuitively, correlation measures the linear relationship between the variables, so for mean independence to fail in the presence of zero correlation, the conditional mean $\mathbb{E}[\epsilon_i \mid D_i]$ would have to be a nonlinear function of $D_i$. But with only two possible values of $D_i$, there is no room for nonlinearity.

Proof

Let us assume $\mathbb{E}[\epsilon_i]=0,~\mathbb{E}[\epsilon_i\,D_i]=0$ and denote the two possible values of $D_i$ by $d_1$ and $d_2$. Using the two assumptions and decomposing over $D_i=d_1,D_i=d_2$, we get \begin{equation} \begin{cases} \mathbb{P}(D_i=d_1)\,\mathbb{E}(\epsilon_i \mid D_i = d_1) + \mathbb{P}(D_i=d_2)\,\mathbb{E}(\epsilon_i \mid D_i = d_2) = 0 \\ \mathbb{P}(D_i=d_1)\,\mathbb{E}(\epsilon_i \mid D_i = d_1)\,d_1 + \mathbb{P}(D_i=d_2)\,\mathbb{E}(\epsilon_i \mid D_i = d_2)\,d_2 = 0 \end{cases} \end{equation}

By solving this system of equations for $\mathbb{P}(D_i=d_1)\,\mathbb{E}(\epsilon_i \mid D_i = d_1)$ and $\mathbb{P}(D_i=d_2)\,\mathbb{E}(\epsilon_i \mid D_i = d_2)$, we see that either

  1. $d_1=d_2$ or
  2. $\mathbb{P}(D_i=d_1)\,\mathbb{E}(\epsilon_i \mid D_i = d_1) = \mathbb{P}(D_i=d_2)\,\mathbb{E}(\epsilon_i \mid D_i = d_2)=0$

The first case would mean $D_i$ has only one possible value (and mean independence would trivially hold). Assuming both probabilities $\mathbb{P}(D_i=d_k)>0$*, the second case implies $\mathbb{E}(\epsilon_i \mid D_i = d_{k} )=0$ for $k=1,2$, that is, mean independence. Thus, mean independence follows from the assumptions.
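The two-value restriction is essential. A hypothetical counterexample of my own (not from the proof above): once $D_i$ takes three values, zero mean and zero correlation no longer force mean independence, because the conditional mean can now be nonlinear in $D_i$:

```python
import numpy as np

# Let D be uniform on {-1, 0, 1} and set E[eps | D = d] = d^2 - 2/3,
# a nonlinear function of d.  Then eps has mean 0 and is uncorrelated
# with D, yet mean independence fails.
d = np.array([-1.0, 0.0, 1.0])
p = np.array([1 / 3, 1 / 3, 1 / 3])   # P(D = d)
cond_mean = d**2 - 2 / 3              # E[eps | D = d]

assert abs(np.dot(p, cond_mean)) < 1e-12      # E[eps] = 0
assert abs(np.dot(p, cond_mean * d)) < 1e-12  # E[eps * D] = 0 (uncorrelated)
assert np.max(np.abs(cond_mean)) > 0.1        # but E[eps | D] is not 0
```

With three support points there are three conditional means but only two moment restrictions, so the system in the proof no longer pins them all down to zero.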

*If one of the probabilities is $0$, the corresponding $\mathbb{E}(\epsilon_i \mid D_i = d_k)$ can technically take any value, but then the model would correspond to $D_i$ having only one possible value.