Solved – Graphs in regression discontinuity design in “Stata” or “R”

data visualizationrregressionregression-discontinuitystata

Lee and Lemieux (p. 31, 2009) suggest the researcher to present the graphs while doing Regression discontinuity design analysis (RDD). They suggest the following procedure:

"…for some bandwidth $h$, and for some number of bins $K_0$ and
$K_1$ to the left and right of the cutoff value, respectively, the
idea is to construct bins ($b_k$,$b_{k+1}$], for $k = 1, . . . ,K =
K_0$+$K_1$, where $b_k = c−(K_0−k+1) \cdot h.$"

c=cutoff point or threshold value of assignment variable
h=bandwidth or window width.

…then compare the mean outcomes just to the left and right of the cutoff point…"

..in all cases, we also show the fitted values from a quartic regression model estimated separately on each side of the cutoff point…(p. 34 of the same paper)

My question is how do we program that procedure in Stata or R for plotting the graphs of outcome variable against assignment variable (with confidence intervals) for the sharp RDD.. A sample example in Stata is mentioned here and here (replace rd with rd_obs) and a sample example in R is here. However, I think both of these didn't implement the step 1. Note, that both have the raw data along with the fitted lines in the plots.

Sample graph without confidence variable [Lee and Lemieux,2009] enter image description here
Thank you in advance.

Best Answer

Is this much different from doing two local polynomials of degree 2, one for below the threshold and one for above with smooth at $K_i$ points? Here's an example with Stata:

use votex // the election-spending data that comes with rd

tw 
(scatter lne d, mcolor(gs10) msize(tiny)) 
(lpolyci lne d if d<0, bw(0.05) deg(2) n(100) fcolor(none)) 
(lpolyci lne d if d>=0, bw(0.05) deg(2) n(100) fcolor(none)), xline(0)  legend(off)

Alternatively, you can just save the lpoly smoothed values and standard errors as variables instead of using twoway. Below $x$ is the bin, $s$ is the smoothed mean, $se$ is the standard error, and $ul$ and $ll$ are the upper and lower limits of the 95% Confidence Interval for the smoothed outcome.

lpoly lne d if d<0, bw(0.05) deg(2) n(100) gen(x0 s0) ci se(se0)
lpoly lne d if d>=0, bw(0.05) deg(2) n(100) gen(x1 s1) ci se(se1)

/* Get the 95% CIs */
forvalues v=0/1 {
    gen ul`v' = s`v' + 1.95*se`v' 
    gen ll`v' = s`v' - 1.95*se`v' 
};

tw 
(line ul0 ll0 s0 x0, lcolor(blue blue blue) lpattern(dash dash solid)) 
(line ul1 ll1 s1 x1, lcolor(red red red) lpattern(dash dash solid)), legend(off)  

As you can see, the lines in the first plot are the same as in the second.