Solved – Bins in Regression Discontinuity Designs

binning, multiple regression, regression-discontinuity

Lee and Lemieux (2009, p. 31) suggest that researchers also present graphs when conducting a regression discontinuity design analysis. They propose the following procedure:

"…for some bandwidth $h$, and for some number of bins $K_0$ and
$K_1$ to the left and right of the cutoff value, respectively, the
idea is to construct bins $(b_k, b_{k+1}]$, for $k = 1, \ldots, K = K_0 + K_1$,
where $b_k = c - (K_0 - k + 1) \cdot h$."

Here $c$ is the cutoff point (threshold value) of the assignment variable and $h$ is the bin width.

They then calculate mean values of the outcome within bins and compare the mean outcomes just to the left and right of the cutoff point.
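As a minimal sketch of the evenly spaced procedure quoted above, the following computes the edges $b_k$ and the within-bin means of the outcome. The function name `evenly_spaced_bin_means` and its arguments are hypothetical and chosen only for illustration; the code follows the quoted half-open convention $(b_k, b_{k+1}]$, so an observation exactly at the cutoff falls into the last bin on the left.

```python
import numpy as np

def evenly_spaced_bin_means(x, y, c, h, k0, k1):
    """Mean of y within bins (b_k, b_{k+1}] of width h around cutoff c.

    Hypothetical helper: k0 bins to the left of c, k1 bins to the right.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Edges b_k = c - (k0 - k + 1) * h for k = 1, ..., k0 + k1 + 1.
    # The edge with k = k0 + 1 equals c, so no bin straddles the cutoff.
    k = np.arange(1, k0 + k1 + 2)
    edges = c - (k0 - k + 1) * h
    # right=True gives index i when edges[i-1] < x <= edges[i];
    # points outside the outermost edges get index 0 or len(edges).
    idx = np.digitize(x, edges, right=True)
    means = np.full(k0 + k1, np.nan)  # NaN marks empty bins
    for i in range(1, k0 + k1 + 1):
        in_bin = idx == i
        if in_bin.any():
            means[i - 1] = y[in_bin].mean()
    mids = (edges[:-1] + edges[1:]) / 2  # bin midpoints for plotting
    return edges, mids, means
```

Plotting `means` against `mids` gives the binned scatter; the two entries of `means` adjacent to the cutoff (positions `k0 - 1` and `k0`) are the ones compared across the threshold.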

My question is whether we should always use a fixed bin width $h$. Put differently, would it be legitimate to bin the data so that the number of observations is constant across bins? I ask because some parts of my forcing variable are sparsely populated, resulting in a noisy graph.

Best Answer

You are making a very good point: in fact, there is a paper by Calonico, Cattaneo and Titiunik (2015a) that makes the same point and discusses binwidth selectors for quantile-spaced plots.

The paper is a little technical, so you might want to look instead at their R Journal paper, Calonico, Cattaneo and Titiunik (2015b).
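To make the equal-count idea from the question concrete, here is a rough sketch of quantile-spaced binning done separately on each side of the cutoff. It is not the data-driven selector from the papers above and not the API of any package: the function name `quantile_spaced_bin_means` is hypothetical, the numbers of bins `k0` and `k1` are simply supplied by the user, and observations exactly at the cutoff are assumed to belong to the left side.

```python
import numpy as np

def quantile_spaced_bin_means(x, y, c, k0, k1):
    """Bins with (nearly) equal counts on each side of cutoff c.

    Hypothetical helper. Returns one row per non-empty bin:
    (mean of x in bin, mean of y in bin, number of observations).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rows = []
    # Assumption: x == c is assigned to the left side of the cutoff.
    for mask, k in ((x <= c, k0), (x > c, k1)):
        xs, ys = x[mask], y[mask]
        order = np.argsort(xs, kind="stable")
        # array_split yields k chunks whose sizes differ by at most one;
        # tied x values may therefore end up in neighbouring bins.
        x_chunks = np.array_split(xs[order], k)
        y_chunks = np.array_split(ys[order], k)
        for xb, yb in zip(x_chunks, y_chunks):
            if xb.size:
                rows.append((xb.mean(), yb.mean(), xb.size))
    return np.array(rows)
```

Because every bin holds about the same number of observations, the bin means have roughly comparable sampling noise, at the cost of wide bins where the forcing variable is sparse. Plotting the second column against the first shows where those wide bins sit.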

You might also want to have a look at the interactive RDD plot online tool, http://shiny.qua.st/rddtools/, which should soon also allow changing the binwidth interactively, although it does not offer quantile-spaced plots for the moment.

Refs: