Solved – Graphs in regression discontinuity design in “Stata” or “R”

data visualization, r, regression, regression-discontinuity, stata

Lee and Lemieux (2009, p. 31) suggest that researchers present graphs when doing a regression discontinuity design (RDD) analysis. They suggest the following procedure:

"…for some bandwidth $h$, and for some number of bins $K_0$ and
$K_1$ to the left and right of the cutoff value, respectively, the
idea is to construct bins $(b_k, b_{k+1}]$, for $k = 1, \ldots, K = K_0 + K_1$, where $b_k = c - (K_0 - k + 1) \cdot h$."

$c$ = cutoff point (threshold value) of the assignment variable
$h$ = bandwidth (window width)
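
As a worked illustration with made-up numbers (not taken from the paper), take $c = 0$, $h = 0.05$, and $K_0 = K_1 = 2$, so that $K = 4$. The formula then gives the bin edges

$$b_1 = 0 - (2 - 1 + 1)(0.05) = -0.10,\quad b_2 = -0.05,\quad b_3 = 0,\quad b_4 = 0.05,\quad b_5 = 0.10,$$

that is, the bins $(-0.10, -0.05]$, $(-0.05, 0]$, $(0, 0.05]$, and $(0.05, 0.10]$. The cutoff $c$ falls exactly on a bin edge, so no bin straddles it.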

"…then compare the mean outcomes just to the left and right of the cutoff point…"

"…in all cases, we also show the fitted values from a quartic regression model estimated separately on each side of the cutoff point…" (p. 34 of the same paper)

My question is how to program that procedure in Stata or R to plot the outcome variable against the assignment variable (with confidence intervals) for a sharp RDD. A sample example in Stata is mentioned here and here (replace rd with rd_obs), and a sample example in R is here. However, I think neither of these implements step 1. Note that both show the raw data along with the fitted lines in the plots.

Sample graph without confidence intervals (Lee and Lemieux, 2009): [image not shown]
Thank you in advance.

Best Answer

Is this much different from fitting two local polynomials of degree 2, one below the threshold and one above, each smoothed at $K_i$ points? Here's an example with Stata:

use votex // the election-spending data that comes with rd

* /// continues the twoway command across lines in a do-file
tw ///
(scatter lne d, mcolor(gs10) msize(tiny)) ///
(lpolyci lne d if d<0, bw(0.05) deg(2) n(100) fcolor(none)) ///
(lpolyci lne d if d>=0, bw(0.05) deg(2) n(100) fcolor(none)), xline(0) legend(off)

Alternatively, you can just save the lpoly smoothed values and standard errors as variables instead of using twoway. Below $x$ is the bin, $s$ is the smoothed mean, $se$ is the standard error, and $ul$ and $ll$ are the upper and lower limits of the 95% Confidence Interval for the smoothed outcome.

lpoly lne d if d<0, bw(0.05) deg(2) n(100) gen(x0 s0) ci se(se0)
lpoly lne d if d>=0, bw(0.05) deg(2) n(100) gen(x1 s1) ci se(se1)

/* Get the 95% CIs (1.96 is the normal critical value) */
forvalues v=0/1 {
    gen ul`v' = s`v' + 1.96*se`v'
    gen ll`v' = s`v' - 1.96*se`v'
}

tw ///
(line ul0 ll0 s0 x0, lcolor(blue blue blue) lpattern(dash dash solid)) ///
(line ul1 ll1 s1 x1, lcolor(red red red) lpattern(dash dash solid)), legend(off)

As you can see, the lines in the first plot are the same as in the second.
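
If you do want the binned means from the quoted procedure, something along these lines might work. This is only a rough sketch, not Lee and Lemieux's own code: it assumes the same votex data (outcome lne, assignment variable d, cutoff 0), the bin width h = 0.02 is an arbitrary choice for illustration, and the variable names bin, mid, ybar, ysd, nobs, cil, ciu, first, fit0, and fit1 are made up here. The confidence intervals use a plain normal approximation for each bin mean.

```stata
* Sketch only: run as a do-file so the locals and /// continuations work.
use votex, clear
local c = 0       // cutoff
local h = 0.02    // bin width (arbitrary, for illustration)

* Step 1: put each observation in a bin (b_k, b_{k+1}] of width h aligned
* on the cutoff, then compute the bin mean and a normal-approximation 95% CI.
gen bin = ceil((d - `c') / `h')
gen mid = `c' + (bin - 0.5) * `h'          // bin midpoint for plotting
egen ybar = mean(lne), by(bin)
egen ysd  = sd(lne), by(bin)
egen nobs = count(lne), by(bin)
gen cil = ybar - 1.96 * ysd / sqrt(nobs)
gen ciu = ybar + 1.96 * ysd / sqrt(nobs)
egen first = tag(bin)                      // one plotted point per bin

* Step 2: quartic regression fitted separately on each side of the cutoff.
regress lne c.d##c.d##c.d##c.d if d < `c'
predict fit0 if d < `c'
regress lne c.d##c.d##c.d##c.d if d >= `c'
predict fit1 if d >= `c'

* Step 3: bin means with CI spikes, plus the two quartic fits.
twoway (rcap cil ciu mid if first, lcolor(gs10)) ///
       (scatter ybar mid if first, mcolor(black) msize(small)) ///
       (line fit0 d if d < `c', sort lcolor(blue)) ///
       (line fit1 d if d >= `c', sort lcolor(red)), ///
       xline(`c') legend(off)
```

One caveat: because the bins are closed on the right, an observation with d exactly equal to the cutoff lands in the last left-hand bin, while the regressions above treat d >= 0 as the right-hand side. With a continuous assignment variable this rarely matters, but it is worth checking in your own data.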

Related Solutions

Solved – Fuzzy regression discontinuity design in Stata

This is a partial answer. I think you should probably use both the biprobit and the ivreg/ivreg2 commands to check how robust your effects are. I like the biprobit approach given your data, but it does make some strong assumptions (no heteroskedasticity, no heterogeneous effects, normality of errors).* However, there's also a dedicated RD command in Stata called rdrobust. It can handle the fuzzy design and may be installed with:

net install rdrobust, from(http://www-personal.umich.edu/~cattaneo/rdrobust) replace

You can find an intro to the command in Calonico, Cattaneo, and Titiunik's Stata Journal paper Robust Data-Driven Inference in the Regression-Discontinuity Design.
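
As a rough sketch of what a fuzzy-design call might look like, assuming hypothetical variable names y (outcome), x (running variable with cutoff 0), and t (treatment actually received), none of which come from your data:

```stata
* Hypothetical variables: y = outcome, x = running variable, t = treatment received.
* Fuzzy RD estimate: the jump in y at the cutoff scaled by the jump in t.
rdrobust y x, c(0) fuzzy(t)

* Binned-means plots (rdplot ships with the same package):
rdplot t x, c(0)    // first stage: jump in the probability of treatment
rdplot y x, c(0)    // reduced form: jump in the outcome
```

Looking at the first-stage plot before trusting the estimate is a reasonable habit, since a fuzzy design with only a small jump in treatment probability at the cutoff behaves like a weak instrument.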

*Austin Nichols' simulation results indicate that the marginal effects may be less sensitive than the latent index function parameters to violations of the biprobit assumptions. The LPM is also not always the model of steel that A&P make it out to be.

Solved – Regression discontinuity design versus panel cointegration

From your question it seems like you want to estimate the effect of a treatment variable on some outcome variable. If that is indeed the case, a cointegration analysis won't do you much good. Here's why: you say that your treatment variable is a binary variable, so I take it that it takes the value 1 if an individual was treated and 0 otherwise. You are correct in your hesitation regarding the unit root testing of that variable; it's not meaningful. Think especially in terms of cointegration and how it is defined.

We say that two processes, $X_t$ and $Y_t$, say, are cointegrated when both are integrated of some order larger than zero but there exists a linear combination of them that is integrated of a lower order. But if one of them is constant over time, cointegration is not possible, since a constant is trivially stationary. Each treatment dummy in your data (one for each cross-section dimension / observed individual) is indeed constant over time.
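
In symbols, and assuming the standard Engle-Granger setup in which both series are integrated of the same order $d$ (the notation $d$, $b$, $\beta$, and $D_i$ is introduced here only for illustration):

$$X_t \sim I(d),\quad Y_t \sim I(d),\quad d > 0, \qquad \text{and there exists } \beta \neq 0 \text{ such that } Y_t - \beta X_t \sim I(d - b),\quad b > 0.$$

A time-constant treatment dummy $D_i$ has $\Delta D_i = 0$ in every period, so it is already stationary and the requirement $d > 0$ fails before any linear combination is even considered.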

With that in mind, RDD seems like the better choice. However, RDD is not always implementable (you need some sort of discontinuity to exploit, for one), so in general you cannot have a rule saying "either I do cointegration testing or exploit an RDD design". It all depends on the data you have or can get hold of. This is also what I mean in my comment: if you want advice on which approach to use, you have to give some details about what data you have and which question you are trying to answer.

Edit:

In response to your comment: the party membership of the governor cannot be cointegrated with the level of environmental expenditure because it cannot be integrated of any order $p>0$. Even if the value changes over time and between individuals, the variable only takes two values and thus cannot include a stochastic trend. To be cointegrated with the dependent variable, the two would have to share the same stochastic trend, which they cannot possibly do.
