Difference-in-Differences – Notation for Leads and Lags in Difference-in-Differences Analysis

difference-in-difference, econometrics, notation, regression, sum

I was hoping someone could help clarify a notational discrepancy.

For example, Pischke uses the same sigma notation in two different sets of lecture notes published on the web, yet describes the summation limits in different ways. The first (Version 1, see page 7) is reproduced below:

$$
y_{ist} = \gamma_s + \lambda_{t} + \sum_{j=-m}^{q}{\beta_{j}} D_{st}(t = k + j) + X_{ist} \delta + \epsilon_{ist}.
$$

The second (Version 2 see slide 9) is also reproduced below:

$$
y_{ist} = \gamma_s + \lambda_{t} + \sum_{j=-m}^{q}{\beta_{j}} D_{st+j} + \epsilon_{ist}.
$$

The structure is the same. The $k$ in the former equation is the time at which treatment is switched on in state $s$. This formulation generalizes to any number of leads or lags of the treatment variable. Referring to the former equation, Pischke indicates that $m$ indexes the leads and $q$ the lags, whereas for the latter specification his lecture notes report the opposite: $m$ indexes the lags and $q$ the leads. I believe the latter interpretation is correct. To illustrate, $D_{s,t-1}$ is the treatment variable lagged by one period. I am not sure if this is a mistake on his part.

The impetus for this question stems from another post (see below) where Andy recommended interacting pre-intervention time dummies with a treatment indicator. Negative subscripts could be used to indicate months leading up to a policy change, and do not necessarily indicate "lags" in the traditional sense.

For example, assuming we have an arbitrary number $k$ of periods before some policy change, then $\sum_{t=-k}^{0}{\beta_{t}}$ is a parsimonious way of indicating that we want to estimate a separate dummy for each time period approaching the baseline (i.e., $t = 0$).

Questions:

  1. Is the interpretive discrepancy between the two equations an oversight on Pischke's part?
  2. Is a negative in the lower limit of a summation always indicative of a lag, or could it be used to denote pre-period time dummies?

Difference in Difference method: how to test for assumption of common trend between treatment and control group?

Best Answer

$m$ is the lag (post-treatment effects) and $q$ is the lead (anticipatory effects) in Mostly Harmless Econometrics. I have always found this lead/lag terminology unnatural, but "lead" makes sense if you think of it as a kind of leading indicator, and "lag" makes sense if you think of the effect lagging the treatment.
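
As a rough sketch of how the two equations in the question index time, take $m = q = 1$ and assume (my reading, not something the notes state) that $D_{st}$ in the second equation equals 1 only in the period $k$ when treatment switches on in state $s$. The two sums then expand to:

$$
\begin{aligned}
\text{Version 1:}\quad & \beta_{-1}\,\mathbf{1}[t = k-1] + \beta_{0}\,\mathbf{1}[t = k] + \beta_{1}\,\mathbf{1}[t = k+1] \\
\text{Version 2:}\quad & \beta_{-1}\,D_{s,t-1} + \beta_{0}\,D_{s,t} + \beta_{1}\,D_{s,t+1}
\end{aligned}
$$

Under that assumption $D_{s,t+j} = \mathbf{1}[t + j = k] = \mathbf{1}[t = k - j]$, so in Version 2 the term $\beta_{-1} D_{s,t-1}$ is switched on at $t = k + 1$ (a post-treatment effect) and $\beta_{1} D_{s,t+1}$ at $t = k - 1$ (an anticipatory effect). In Version 1 the sign of $j$ runs the other way: $j = -1$ picks out the period before treatment. Read this way, a negative index marks a pre-period in the first form and a lag of the treatment variable in the second, which would explain why $m$ and $q$ get opposite labels.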

The time dummies are the $\lambda$s. You can think of $D$ as the set of "ever treated" × time-dummy interactions (though possibly not all of the possible interactions).
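
To make that last point concrete, here is a minimal sketch in plain Python of how the $D_{st}(t = k + j)$ columns could be built. The states, adoption periods, and the window $m = 2$, $q = 1$ are made up for illustration, and `event_time_dummies` is a hypothetical helper, not a function from any library.

```python
# Toy panel: hypothetical states and adoption periods (None = never treated).
adoption = {"A": 3, "B": 4, "C": None}
periods = range(1, 7)   # t = 1, ..., 6
m, q = 2, 1             # window j = -m, ..., q, using the Version 1 indexing


def event_time_dummies(t, k, m, q):
    """Return {j: ever_treated * 1[t == k + j]} for j = -m, ..., q."""
    ever_treated = k is not None
    # Short-circuit `and` keeps k + j from being evaluated for never-treated states.
    return {j: int(ever_treated and t == k + j) for j in range(-m, q + 1)}


# One entry per (state, period), holding the D_st(t = k + j) columns.
rows = {
    (s, t): event_time_dummies(t, k, m, q)
    for s, k in adoption.items()
    for t in periods
}

for (s, t), d in rows.items():
    print(s, t, [d[j] for j in range(-m, q + 1)])
```

Each column is the ever-treated indicator multiplied by a dummy for one period relative to adoption. Never-treated states get zeros throughout, and treated states get zeros in any period outside the window, which is the "not all of the possible interactions" case.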