Solved – Fuzzy regression discontinuity design and exclusion restriction

causality, econometrics, instrumental-variables, least squares, regression-discontinuity

In a fuzzy regression discontinuity design, what does the exclusion restriction look like in terms of a conditional expectation between the instrument in the first stage and the error term in the structural equation?

In IV, we normally say that the first stage must produce a non-zero estimate (preferably one that is statistically significant enough to avoid weak-instrument problems). In RD, we modify that to say that the discontinuity in the first stage must be non-zero; that is, we evaluate it in the limit. Does the same hold for the errors? That is, in the limit, must the difference between the conditional expectation approaching the discontinuity from the left and the one approaching from the right equal zero?

How about the exclusion restriction? Is that also a continuity condition, in that the difference is evaluated from the left and from the right in the limit, and the resulting difference should equal zero?

Best Answer

I am not a fan of the Angrist and Pischke book, but they do have a flair for phrasing, and as they say, fuzzy RD is IV (Sec. 6.2). This is obscured by the fact that the instrument is essentially a nonlinear transformation (a step function) of one of the included exogenous variables, which, by virtue of the conditional exogeneity assumption, is a valid instrument.


Assume that each subject is characterized by the tuple of random variables, $\{Y_{0i}, Y_{1i}, D_i, X_i\}$, where $Y_{0i}$ and $Y_{1i}$ are the potential outcomes under non-treatment and treatment respectively, $D_i$ is an indicator variable of whether treatment is administered (which governs which of the potential outcomes is observed for a subject), and $X_i$ is the so-called forcing variable which deterministically or stochastically determines treatment. Usually, the fuzzy RD [FRD] model is stated as the rather concise set of specifications $$ \begin{align} \lim_{x\downarrow x_0} \mathbb{E}(D_i\mid X_i = x) &\neq \lim_{x\uparrow x_0} \mathbb{E}(D_i\mid X_i = x)\\ \lim_{x\downarrow x_0} \mathbb{E}(Y_{0i}\mid X_i = x) &= \lim_{x\uparrow x_0} \mathbb{E}(Y_{0i}\mid X_i = x)\\ \end{align} $$ which are intuitively transparent, but are hard to work with.

Potential outcomes framework

We can use the familiar potential outcomes model to unpack these specifications, where, for simplicity of exposition, we exclude all exogenous variables other than the forcing variable, $X_i$, which deterministically (in the sharp RD case) or stochastically (in the FRD case) determines the treatment assignment ($D_i=1$). The conditional mean of the outcome in terms of the observable variables is given by

$$ \begin{align} \mathbb{E}(Y_i \mid X_i, D_i) &= \mathbb{E}(Y_{0i}\mid X_i, D_i) + D_i\left(\mathbb{E}(Y_{1i}\mid X_i, D_i)-\mathbb{E}(Y_{0i}\mid X_i, D_i)\right) \\ \end{align} $$ Here we make no parametric assumptions about the form of the conditional expectation functions. Note that all of these specifications are restricted to the locality of $x_0$, that is $X_i\in [x_0-\Delta_n, x_0+\Delta_n]$, where the indexing by the sample size is for pragmatic reasons (it becomes relevant when we define the estimator).

Recall that in the sharp RD case, we can write $D_i=\mathbf{1}_{[X_i\geq x_0]}$, where $x_0$ is the point of discontinuity. In the FRD case, this relationship is no longer deterministic. Instead, the conditional mean of $D_i$ is modelled in terms of the discontinuity:

$$ \begin{align} \mathbb{E}(D_i\mid X_i) &= \mathbb{P}\left[D_i=1\mid X_i\right]\\ &=(1-\mathbf{1}_{[X_i\geq x_0]})\mathbb{P}\left[D_i=1\mid X_i< x_0\right] + \mathbf{1}_{[X_i\geq x_0]}\mathbb{P}\left[D_i=1\mid X_i\geq x_0\right] \end{align} $$ Note that since $X_i$ is exogenous in the system, so is the random variable $\mathbf{1}_{[X_i\geq x_0]}$; it acts as the excluded exogenous variable in the specification of the conditional mean of the endogenous variable $D_i$.

Estimation

This is then a valid just-identified IV model, with one endogenous variable $D_i$, and one excluded exogenous variable $\mathbf{1}_{[X_i\geq x_0]}$. A direct and general estimator with no further parametric assumptions is the nonparametric Wald estimator.

$$ \dfrac{\widehat{\mathbb{E}}\left(Y_i \mid x_0 \leq X_i\leq x_0+ \Delta_n \right)-\widehat{\mathbb{E}}\left(Y_i \mid x_0- \Delta_n \leq X_i< x_0\right)}{\widehat{\mathbb{P}}\left[D_i=1\mid x_0 \leq X_i\leq x_0+ \Delta_n \right]-\widehat{\mathbb{P}}\left[D_i=1\mid x_0- \Delta_n \leq X_i< x_0\right]} $$
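As a rough illustration of this ratio, here is a minimal Python sketch. It uses plain window means in place of the estimated conditional expectations. The function names `wald_frd` and `iv_frd`, the simulated data, the true effect of 2.0, and the bandwidth are all hypothetical choices made for the example. The second function computes the just-identified IV ratio with the step indicator as the instrument on the same window, to show numerically that the two coincide.

```python
import numpy as np

def wald_frd(y, d, x, x0, bandwidth):
    """Windowed Wald ratio: jump in mean outcome over jump in treatment share."""
    y, d, x = (np.asarray(a, dtype=float) for a in (y, d, x))
    right = (x >= x0) & (x <= x0 + bandwidth)
    left = (x >= x0 - bandwidth) & (x < x0)
    jump_y = y[right].mean() - y[left].mean()
    jump_d = d[right].mean() - d[left].mean()
    return jump_y / jump_d

def iv_frd(y, d, x, x0, bandwidth):
    """IV ratio cov(Z, Y) / cov(Z, D) with Z = 1[X >= x0], on the same window."""
    y, d, x = (np.asarray(a, dtype=float) for a in (y, d, x))
    window = (x >= x0 - bandwidth) & (x <= x0 + bandwidth)
    z = (x[window] >= x0).astype(float)
    return np.cov(z, y[window])[0, 1] / np.cov(z, d[window])[0, 1]

# Hypothetical simulated data (an assumption for illustration only):
# constant treatment effect of 2.0, and take-up of treatment that is
# correlated with the outcome error e, so D is endogenous.
rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(-1.0, 1.0, n)
e = rng.normal(size=n)
latent = 1.5 * (x >= 0.0) - 0.75 + 0.5 * e + rng.normal(size=n)
d = (latent > 0.0).astype(float)
y = 1.0 + 0.5 * x + 2.0 * d + e

wald = wald_frd(y, d, x, x0=0.0, bandwidth=0.1)
iv = iv_frd(y, d, x, x0=0.0, bandwidth=0.1)
print(f"Wald: {wald:.3f}, IV: {iv:.3f}")
```

With a binary instrument the two ratios are algebraically identical in sample, which is the sense in which fuzzy RD is IV. In this simulation the estimate should land near the true effect, with some bias from the slope of the outcome in $X_i$ inside the window.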

Typically, local smoothers, such as the local linear smoother, are used to estimate the conditional mean functions.
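A minimal sketch of that idea, assuming a triangular kernel and a hand-picked bandwidth (both are illustrative choices, not a recommendation): each one-sided limit is estimated as the intercept of a kernel-weighted linear fit on its side of $x_0$, and the four intercepts are combined into the Wald ratio. The function names and the simulated data are hypothetical.

```python
import numpy as np

def boundary_fit(v, x, x0, h, side):
    """Local linear estimate of the one-sided limit of E[v | X = x] at x0."""
    v, x = np.asarray(v, dtype=float), np.asarray(x, dtype=float)
    if side == "right":
        mask = (x >= x0) & (x <= x0 + h)
    else:
        mask = (x < x0) & (x >= x0 - h)
    u = x[mask] - x0
    # Triangular kernel weights; weighted least squares via sqrt-weights.
    sqrt_w = np.sqrt(np.clip(1.0 - np.abs(u) / h, 0.0, None))
    design = np.column_stack([np.ones_like(u), u]) * sqrt_w[:, None]
    coef, *_ = np.linalg.lstsq(design, v[mask] * sqrt_w, rcond=None)
    return coef[0]  # the intercept is the fitted value at x0

def frd_local_linear(y, d, x, x0, h):
    """Wald ratio built from one-sided local linear fits of Y and D."""
    jump_y = boundary_fit(y, x, x0, h, "right") - boundary_fit(y, x, x0, h, "left")
    jump_d = boundary_fit(d, x, x0, h, "right") - boundary_fit(d, x, x0, h, "left")
    return jump_y / jump_d

# Hypothetical simulated data (assumption): constant effect of 2.0 and
# treatment take-up correlated with the outcome error e.
rng = np.random.default_rng(1)
n = 200_000
x = rng.uniform(-1.0, 1.0, n)
e = rng.normal(size=n)
latent = 1.5 * (x >= 0.0) - 0.75 + 0.5 * e + rng.normal(size=n)
d = (latent > 0.0).astype(float)
y = 1.0 + 0.5 * x + 2.0 * d + e

estimate = frd_local_linear(y, d, x, x0=0.0, h=0.25)
print(f"Local linear FRD estimate: {estimate:.3f}")
```

Fitting a line on each side removes the bias that plain window means pick up from a slope in the conditional mean, at the cost of a noisier estimate at the boundary. In practice the bandwidth would be chosen by a data-driven rule rather than fixed by hand.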

ATE interpretation

Note that in order to interpret the given estimator as the average treatment effect [ATE] in the locality of $x_0$, we have used the implausible but routine assumption that $D_i$ and $Y_{1i}-Y_{0i}$ are independent conditional on $X_i$. This allows us to remove the conditioning on $D_i$ in the conditional mean function of the outcome in a mathematically convenient way. For more details, see Hahn, Todd & van der Klaauw (2001), which is an excellent and readable reference for RD models. They also provide interpretations of the parameter being estimated under weaker assumptions.
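
The role of the conditional independence assumption can be made explicit with a short derivation. This is only a sketch: it additionally assumes that $\mathbb{E}(Y_{1i}-Y_{0i}\mid X_i=x)$ is continuous at $x_0$, which is not stated above. Since $Y_i = Y_{0i} + D_i(Y_{1i}-Y_{0i})$, conditional independence of $D_i$ and $Y_{1i}-Y_{0i}$ given $X_i$ yields

$$ \begin{align} \mathbb{E}(Y_i\mid X_i=x) &= \mathbb{E}(Y_{0i}\mid X_i=x) + \mathbb{E}(D_i\mid X_i=x)\,\mathbb{E}(Y_{1i}-Y_{0i}\mid X_i=x) \end{align} $$

Taking the difference between the right and left limits at $x_0$, the first term drops out by the continuity of $\mathbb{E}(Y_{0i}\mid X_i=x)$, leaving

$$ \begin{align} \lim_{x\downarrow x_0} \mathbb{E}(Y_i\mid X_i = x) - \lim_{x\uparrow x_0} \mathbb{E}(Y_i\mid X_i = x) &= \mathbb{E}(Y_{1i}-Y_{0i}\mid X_i=x_0)\left[\lim_{x\downarrow x_0} \mathbb{E}(D_i\mid X_i = x) - \lim_{x\uparrow x_0} \mathbb{E}(D_i\mid X_i = x)\right] \end{align} $$

Dividing by the jump in $\mathbb{E}(D_i\mid X_i=x)$, which is non-zero by the first FRD condition, gives $\mathbb{E}(Y_{1i}-Y_{0i}\mid X_i=x_0)$, the population counterpart of the Wald ratio. One way to read this is that the continuity of $\mathbb{E}(Y_{0i}\mid X_i=x)$ at $x_0$ plays the part that the exclusion restriction plays in a standard IV setup: in the limit, crossing the threshold shifts the outcome only through the change in the probability of treatment.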