Regression discontinuity versus matching with spatial discontinuity

regression-discontinuity, spatial

I am interested in using a spatial research design. Imagine a line, like a time zone line. For example, the line that marks the boundary between Eastern Standard Time and Central Standard Time runs more or less north to south through the U.S. (and other places).

Suppose the United States implemented a policy under which, following county boundaries, all counties west of the 75th meridian receive a tax deduction and no counties east of it do. I want to evaluate the impact of this tax deduction on crime.

For the sake of argument, please assume that the discontinuity is valid. I am trying to understand this idea for spatial discontinuities in general.

I can imagine proceeding in one of two ways. One is a regression discontinuity that exploits the distance of, say, a county from this line. The other is to match counties that sit directly across the line from each other. Note that a county might appear more than once if it borders more than one county on the opposite side of the line.

The regression discontinuity uses a larger sample because it can include counties that are not directly on the border. The matching homes in on counties that touch each other, whereas the regression discontinuity, at best, can partition the time zone line into shorter segments, each of which likely includes multiple pairs of counties.

What are the trade-offs between the two research designs? Which is preferred? Does one offer something that the other doesn't?

Best Answer

I would say matching is more appropriate (and easier), but the logic of the two designs has some comparable aspects worth expanding on.

Regression discontinuity designs (RDD) are predicated on there being some observable relationship between a running variable, $X$, and the outcome, $Y$. In an RDD, some exogenous treatment then switches on at a threshold of $X$. Implicit in the design is that cases are comparable on each side of the threshold (that is, no other differences between the cases exist on each side of the threshold), so any discontinuity in the relationship between $X$ and $Y$ at that threshold can be attributed to the treatment.
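To make the RDD logic concrete, here is a minimal sketch in Python on simulated data. Everything in it is an assumption for illustration: the running variable is a hypothetical signed distance to the line (non-negative on the treated side), the outcome is linear in distance with a made-up jump of -2, and the helper names `fit_line` and `sharp_rdd_estimate` are mine. It fits a separate straight line on each side of the threshold within a bandwidth and takes the difference of the two intercepts at the line.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sharp_rdd_estimate(x, y, cutoff=0.0, bandwidth=1.0):
    """Estimate the jump in y at the cutoff from separate linear fits on each side.

    Assumption of this sketch: units with x >= cutoff are the treated ones.
    """
    # Centre the running variable so each intercept is the fitted value at the cutoff.
    left = [(xi - cutoff, yi) for xi, yi in zip(x, y) if -bandwidth <= xi - cutoff < 0]
    right = [(xi - cutoff, yi) for xi, yi in zip(x, y) if 0 <= xi - cutoff <= bandwidth]
    intercept_left, _ = fit_line(*zip(*left))
    intercept_right, _ = fit_line(*zip(*right))
    return intercept_right - intercept_left


# Simulated counties (all values hypothetical).
random.seed(0)
true_effect = -2.0
distance = [random.uniform(-100, 100) for _ in range(4000)]  # signed km to the line
crime = [10 + 0.03 * d + true_effect * (d >= 0) + random.gauss(0, 1) for d in distance]

estimate = sharp_rdd_estimate(distance, crime, cutoff=0.0, bandwidth=50.0)
print(f"estimated jump at the line: {estimate:.2f}")
```

The bandwidth is a judgment call: a narrower window leans less on the linearity assumption but leaves fewer counties to fit.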

One of the things that makes RDD difficult here is that it is unclear what $X$ is in your case (you could think of many candidates in addition to the distance you mentioned), and for the social science variables listed it is unlikely that $X$ has a clear or strong relationship to $Y$. I would also be skeptical that cases on either side of the threshold are entirely comparable, so one would want to include other socio-demographic indicators. This can be done, but it makes such a quasi-experiment markedly less appealing.

Thus I would suggest matching or estimating propensity score models. You can certainly find a history of examples of matching across a border (see for instance Card & Krueger, 1994). I have also seen matching of spatial units extended to propensity score models; for instance, Ridgeway (2006) uses a flexible set of generalized boosted models to estimate propensity scores for post-traffic-stop outcomes (e.g. searches, arrests). Such flexible models are attractive because spatial trends can be hard to characterize with social science data and may take many parameters to model effectively. Such models can also readily include the other socio-demographic covariates one would be expected to include in such research designs (at least for the outcomes you mention).
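For the matching route, the simplest version is a within-pair comparison across the line. The sketch below uses made-up county labels and crime rates and a hypothetical helper, `paired_difference_estimate`. It averages treated-minus-control differences over adjacent cross-border pairs and lets a county enter more than one pair, as described in the question. It does not adjust for covariates or estimate propensity scores.

```python
# Hypothetical toy data: county labels, crime rates and treatment status are made up.
crime_rate = {"A": 12.0, "B": 9.5, "C": 11.0, "D": 8.0, "E": 10.5}
is_treated = {"A": False, "B": True, "C": False, "D": True, "E": True}

# Adjacent cross-border pairs as (treated county, control county).
# Counties A, C and D each border two counties on the other side,
# so they appear in two pairs.
border_pairs = [("B", "A"), ("D", "A"), ("D", "C"), ("E", "C")]


def paired_difference_estimate(pairs, outcome, treated):
    """Average of (treated - control) outcome differences over matched pairs."""
    diffs = []
    for treated_county, control_county in pairs:
        # Guard against pairs that do not actually straddle the line.
        if not treated[treated_county] or treated[control_county]:
            raise ValueError(
                f"pair {(treated_county, control_county)} is not treated/control"
            )
        diffs.append(outcome[treated_county] - outcome[control_county])
    return sum(diffs) / len(diffs), diffs


effect, pair_diffs = paired_difference_estimate(border_pairs, crime_rate, is_treated)
print(f"mean within-pair difference: {effect:.2f}")
```

Because some counties enter more than one pair, the pair differences are not independent, so any standard errors would need to account for that (for example by clustering on county).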

