Difference-in-difference and omitted variable bias

Tags: bias, causality, difference-in-difference

I have a question concerning the difference-in-differences research design:

If I can find a variable that is correlated with both the difference-in-differences estimator and the dependent variable of the regression equation, is the difference-in-differences estimate then biased?

Best Answer

It is best to separate the estimation problem from identification of the parameter of interest. When we use diff-in-diff, we want to estimate an average effect. The first step is to show that this average effect is identified (that is, calculable from data that we observe). The second is to construct an estimator that estimates the average effect without bias. I present an identification argument that spells out the assumptions needed for diff-in-diff to be unbiased.

Take causal inference in one time period. The goal is to identify the average treatment effect (ATE) or the average treatment effect on the treated (ATT). In the potential outcomes notation, $T_i \in \{0, 1\}$ is unit $i$'s treatment assignment, $Y_i(1)$ is unit $i$'s outcome under treatment, $Y_i(0)$ is its outcome without treatment, and $Y_i(T_i)$ is its observed outcome. The treatment effect on unit $i$ is $Y_i(1) - Y_i(0) \equiv \delta_i$. The ATE is $\mathbb{E}(\delta_i)$ and the ATT is $\mathbb{E}(\delta_i \mid T_i = 1)$.

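In a single period, a comparison of treated and untreated mean outcomes identifies the ATE and the ATT only under independence assumptions such as $Y_i(0) \perp T_i$ and $\delta_i \perp T_i$. That is, unit $i$'s untreated outcome and its treatment effect need to be independent of its treatment assignment. Under these assumptions, the ATE is also equal to the ATT.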
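
To see what a one-period comparison of means picks up, it may help to write out the standard decomposition. It uses only the definitions above; the labels "ATT" and "selection bias" are added here for illustration:

\begin{align*}
\mathbb{E}(Y_i(T_i) \mid T_i = 1) - \mathbb{E}(Y_i(T_i) \mid T_i = 0)
&= \mathbb{E}(Y_i(1) \mid T_i = 1) - \mathbb{E}(Y_i(0) \mid T_i = 0) \\
&= \underbrace{\mathbb{E}(\delta_i \mid T_i = 1)}_{\text{ATT}} + \underbrace{\mathbb{E}(Y_i(0) \mid T_i = 1) - \mathbb{E}(Y_i(0) \mid T_i = 0)}_{\text{selection bias}}.
\end{align*}

The second line adds and subtracts $\mathbb{E}(Y_i(0) \mid T_i = 1)$. The comparison of means equals the ATT only when the selection-bias term is zero.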

Diff-in-diff relaxes this requirement by working in a two-period setting. With two periods, the potential outcomes are $Y_{it}(1)$ and $Y_{it}(0)$ for time periods $t \in \{0, 1\}$, and the observed outcomes are $Y_{it} \equiv Y_{it}(T_i)$. Let the change in the outcome from the first to the second period be $\theta_i(1) \equiv Y_{i1}(1) - Y_{i0}(1)$ if unit $i$ is assigned the treatment, and $\theta_i(0) \equiv Y_{i1}(0) - Y_{i0}(0)$ if it is not. The treatment effect on unit $i$ is $\delta_i \equiv \theta_i(1) - \theta_i(0)$. Diff-in-diff assumes that $\theta_i(0) \perp T_i$, so that the expected change in the absence of treatment is the same for the treated units as it is for the untreated. This is called the parallel trends assumption.

Parallel trends are sufficient to identify the ATT because we can calculate $\mathbb{E}(Y_{i1} - Y_{i0} \mid T_i = 1) - \mathbb{E}(Y_{i1} - Y_{i0} \mid T_i = 0)$ from the data and it is equal to

\begin{align*}
&\mathbb{E}(Y_{i1}(1) - Y_{i0}(1) \mid T_i = 1) - \mathbb{E}(Y_{i1}(0) - Y_{i0}(0) \mid T_i = 0) \\
&\quad = \mathbb{E}(\theta_i(1) \mid T_i = 1) - \underbrace{\mathbb{E}(\theta_i(0) \mid T_i = 0)}_{= \mathbb{E}(\theta_i(0) \mid T_i = 1) \;\text{by parallel trends}} \\
&\quad = \mathbb{E}(\theta_i(1) - \theta_i(0) \mid T_i = 1) \\
&\quad = \mathbb{E}(\delta_i \mid T_i = 1).
\end{align*}
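
Since the question refers to a regression equation, here is a minimal sketch of how the difference in mean changes relates to a regression. It assumes a balanced two-period panel and the common specification of $Y_{it}$ on a constant, $T_i$, $t$, and $T_i \cdot t$. That specification, the simulated data, and the function names `did_means` and `did_ols` are illustrative assumptions, not part of the answer above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000

# Hypothetical data-generating process, for illustration only.
T = rng.integers(0, 2, size=n)                    # treatment group indicator T_i
alpha = rng.normal(size=n) + 1.0 * T              # unit levels, allowed to differ by group
y0 = alpha + rng.normal(size=n)                   # observed outcome in period 0
y1 = alpha + 0.5 + 2.0 * T + rng.normal(size=n)   # common trend 0.5, treatment effect 2.0


def did_means(y0, y1, T):
    """Difference in mean outcome changes between treated and untreated units."""
    change = y1 - y0
    return change[T == 1].mean() - change[T == 0].mean()


def did_ols(y0, y1, T):
    """Interaction coefficient from OLS of Y_it on 1, T_i, t and T_i * t."""
    y = np.concatenate([y0, y1])
    group = np.concatenate([T, T]).astype(float)
    period = np.concatenate([np.zeros(len(T)), np.ones(len(T))])
    X = np.column_stack([np.ones_like(y), group, period, group * period])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[3]


est_means = did_means(y0, y1, T)
est_ols = did_ols(y0, y1, T)
print(f"difference in mean changes: {est_means:.4f}")
print(f"OLS interaction coefficient: {est_ols:.4f}")
```

Because the regression is saturated in group and period, the two numbers should agree up to floating-point error. The regression is then just another way to compute the quantity in the derivation above, and it inherits the same parallel trends assumption.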

Note that diff-in-diff does allow $\theta_i(1) \not\perp T_i$, so the change under treatment can be different for treated units than for the untreated. It also allows $Y_{i1}(1) - Y_{i1}(0) \not\perp T_i$, so the treatment effect in the second period can be different for treated units than for the untreated.

If we also assumed $\theta_i(1) \perp T_i$, then the ATE, $\mathbb{E}(\delta_i)$, would also be identified. However, in this case the ATE and the ATT would again be equal.
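
The distinction between the ATT and the ATE can be checked in a small simulation. The sketch below assumes a hypothetical data-generating process in which units with larger effects select into treatment, so $\theta_i(1) \not\perp T_i$, while $\theta_i(0)$ is drawn independently of $T_i$. It also assumes no anticipation, so $Y_{i0}(1) = Y_{i0}(0)$. All numerical values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical data-generating process, for illustration only.
delta = rng.normal(loc=1.0, scale=1.0, size=n)        # delta_i, unit-level effect
T = (delta + rng.normal(size=n) > 1.0).astype(int)    # larger effects select into treatment
level = rng.normal(size=n) + 2.0 * T                  # period-0 levels differ by group
theta0 = 0.5 + rng.normal(size=n)                     # theta_i(0), independent of T_i
theta1 = theta0 + delta                               # theta_i(1) = theta_i(0) + delta_i

# Observed outcomes. Assumption: no anticipation, so Y_i0(1) = Y_i0(0) = level.
y0 = level
y1 = level + np.where(T == 1, theta1, theta0)

change = y1 - y0
did = change[T == 1].mean() - change[T == 0].mean()
att = delta[T == 1].mean()   # known only because the data are simulated
ate = delta.mean()

print(f"diff-in-diff: {did:.3f}  ATT: {att:.3f}  ATE: {ate:.3f}")
```

Under these assumptions the diff-in-diff estimate should land close to the ATT and clearly above the ATE, since parallel trends hold but the treatment effect is not independent of treatment assignment.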

Finding an omitted variable that explains both the outcome and the treatment assignment is a sign that parallel trends might be violated. But this need not be the case. For example, consider $W_{it}$ as the omitted variable and suppose $W_{i0} \not\perp T_i$ and $W_{i0} \not\perp Y_{i0}(0)$. Then typically $W_{i0} \not\perp Y_{i1}(0) - Y_{i0}(0) = \theta_i(0)$. However, $W_{i0} \not\perp T_i$ and $W_{i0} \not\perp \theta_i(0)$ together do not imply that $T_i \not\perp \theta_i(0)$, so the ATT might still be identified and diff-in-diff might still be unbiased. This would be the case if the direction of causality ran from $Y_{it}(0)$ to $W_{i0}$ and from $T_i$ to $W_{i0}$, and not the other way round.
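
The two causal directions can be contrasted in a simulation. This is a rough sketch under assumed data-generating processes: the functional forms, the constant treatment effect of 1.0, and the names `simulate` and `w_is_cause` are hypothetical. As a simplification, $W$ depends on $\theta_i(0)$ directly rather than on the outcome levels.

```python
import numpy as np


def did(y0, y1, T):
    """Difference in mean outcome changes between treated and untreated units."""
    change = y1 - y0
    return change[T == 1].mean() - change[T == 0].mean()


def simulate(w_is_cause, n=200_000, effect=1.0, seed=0):
    """Simulate one two-period data set and return the estimate and correlations."""
    rng = np.random.default_rng(seed)
    if w_is_cause:
        # W influences both treatment assignment and the untreated trend.
        W = rng.normal(size=n)
        T = (W + rng.normal(size=n) > 0).astype(int)
        theta0 = 0.5 + W + rng.normal(size=n)
    else:
        # W is a consequence of treatment and of the untreated trend.
        T = rng.integers(0, 2, size=n)
        theta0 = 0.5 + rng.normal(size=n)
        W = T + theta0 + rng.normal(size=n)
    y0 = rng.normal(size=n)          # observed outcome in period 0
    y1 = y0 + theta0 + effect * T    # constant treatment effect, so ATT = effect
    return {
        "did": did(y0, y1, T),
        "corr_W_T": np.corrcoef(W, T)[0, 1],
        "corr_W_theta0": np.corrcoef(W, theta0)[0, 1],
    }


collider = simulate(w_is_cause=False)
confounded = simulate(w_is_cause=True)
for name, result in [("W is a consequence", collider), ("W is a cause", confounded)]:
    print(name, {key: round(float(value), 3) for key, value in result.items()})
```

In both scenarios $W$ is correlated with $T_i$ and with $\theta_i(0)$. Under these assumptions, the estimate should stay near the true effect of 1.0 only when $W$ is a consequence, and should be pushed away from it when $W$ drives both treatment assignment and the trend.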

In practice, we would think about what our theory tells us about the direction of causality. If the omitted variable were a consequence of the treatment, we would not be concerned. But we would be concerned if the omitted variable influenced treatment assignment instead, or if a third factor influenced both the omitted variable and treatment assignment.