I am fairly new to econometrics and maybe this is a very basic question to some.
I am running a Fuzzy Regression Discontinuity (RD) design in Stata and I am having doubts about whether I am specifying my regressions correctly. My running variable is age and the cutoff point differs by gender.
Suppose my data look like this: let w be a dummy variable indicating treatment, age be my running (forcing) variable (the cutoff is 25 for women and 30 for men), z25 and z30 be dummies indicating whether a woman is over 25 or a man is over 30, and y be my outcome variable. Also, let X be a vector of covariates.
I am running separate regressions for both men and women and I am doing something like this:
For men:
ivregress 2sls y c.age##c.age##c.age X (w = z30) if male == 1, first vce(robust)
For women:
ivregress 2sls y c.age##c.age##c.age X (w = z25) if female == 1, first vce(robust)
Is this correct?
Also, I was wondering if it were possible to run the above regressions together. I am having trouble doing this since assignment to treatment differs for women and men and I do not know how to specify the first stage equation. I have found similar cases in the literature, but none in a Fuzzy RD context.
Any help or comments regarding any of the above questions would be greatly appreciated.
Best Answer
The Fuzzy RD design can be conceptualized as a local IV model (that is, an instrumental variables regression with weights that decline as observations move away from the cutoff). You need to instrument for the treated indicator with a dummy for being above the cutoff, while controlling for the running variable $Z$ (age, in your case) and the interaction of the above-the-cutoff dummy and $Z$. This can be found on page 958 of the 2nd edition of "Adult" Wooldridge. Your two models have no such weights and omit these interactions.
Here's a simulation in Stata that demonstrates this equivalence. We start by installing two RD commands and making some fake data:
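A minimal sketch of such a setup might look like this. It assumes the two commands are `rd` and `rdrobust` and that both can be installed from SSC; the data-generating process, the seed, and the variable names (`z`, `above`, `w`, `y`) are illustrative choices, not the ones behind the results discussed in this answer.

```stata
* Assumption: the two user-written commands are rd and rdrobust, installed from SSC
ssc install rd, replace
ssc install rdrobust, replace

* Illustrative fake data (hypothetical design, requires Stata 14+ for runiform(a,b))
clear
set seed 12345
set obs 5000
generate double z = runiform(-1, 1)             // running variable, cutoff at 0
generate byte above = z >= 0                    // above-the-cutoff dummy
* Fuzzy design: crossing the cutoff raises the treatment probability from 0.2 to 0.7
generate byte w = runiform() < 0.2 + 0.5*above
* Outcome with an illustrative treatment effect of 2 and a linear trend in z
generate double y = 1 + 2*w + 0.5*z + rnormal()
```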
Here's the IV estimate. Note how you can do the interaction on the fly by using the factor variable notation. I am not using powers of $Z$ in my model, just a simple linear term:
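As a sketch of what such a locally weighted IV regression could look like, assume a dataset with outcome `y`, treatment dummy `w`, running variable `z` with the cutoff at 0, and a dummy `above` equal to 1 when `z >= 0`. The bandwidth of 0.5 and the triangular kernel are illustrative assumptions, not recommendations.

```stata
* Assumed variables: y (outcome), w (treatment), z (running variable, cutoff at 0),
* above (1 if z >= 0). The bandwidth below is purely illustrative.
local h = 0.5

* Triangular kernel weights: 1 at the cutoff, falling linearly to 0 at distance h
generate double kw = max(0, 1 - abs(z)/`h')

* Instrument w with above; control for z and let its slope differ above the cutoff
ivregress 2sls y c.z 1.above#c.z (w = above) if abs(z) < `h' [pweight = kw], vce(robust)
```

The coefficient on `w` is the fuzzy RD estimate. The term `1.above#c.z` is the interaction built on the fly with factor-variable notation.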
There are two user-written commands that estimate fuzzy RD models:
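A sketch of the first, assuming it is the `rd` command from SSC and the same hypothetical variables as before (`y`, `w`, `z` with the cutoff at 0); the bandwidth is again an illustrative value:

```stata
* Assumed variables: y (outcome), w (treatment), z (running variable, cutoff at 0)
* For a fuzzy design, rd takes the outcome, the treatment, and the assignment variable
rd y w z, z0(0) bwidth(0.5)
```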
The first lwald coefficient is the FRD treatment effect. Here's another command that does FRD:
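A sketch of the second, assuming it is `rdrobust` and the same hypothetical variables (`y`, `w`, `z` with the cutoff at 0). The bandwidth, polynomial order, and kernel are fixed here only so that the specification lines up with a local linear, triangular-kernel IV regression:

```stata
* Assumed variables: y (outcome), w (treatment), z (running variable, cutoff at 0)
* fuzzy() names the treatment; p(1) is local linear; h() fixes the bandwidth
rdrobust y z, c(0) fuzzy(w) p(1) h(0.5) kernel(triangular)
```

Leaving out `h()` lets `rdrobust` choose the bandwidth from the data instead.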
The conventional coefficient above is the FRD treatment effect. Both FRD estimates and their standard errors match the locally weighted IV (LWIV) estimate.
Now for your second question. Here I may be on shakier ground since I am less familiar with the literature. I am assuming that you want to estimate a single model for men and women to get a single estimate of the effect. There are two options to accomplish this. One is to estimate the two models and re-weight the estimates. It seems prudent not to weight by the overall gender-specific sample size, nor by the treated sample size. Personally, I like to make the weights proportional to the number of units within some range of the discontinuity for each group, so that observations far from the cutoff don't matter in determining the weights. You can use the bandwidth for that. Because the estimates from the two individual discontinuities are independent, once you have a variance for each estimate it is easy to get a variance for the combined estimate, since the covariance is zero.
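To make the arithmetic concrete, here is a sketch in notation of my own choosing. Let $\hat\tau_m$ and $\hat\tau_f$ be the gender-specific FRD estimates, and let $n_m$ and $n_f$ be the numbers of men and women within the chosen bandwidth of their respective cutoffs. Treating the weights as fixed (an assumption that ignores sampling variation in the counts):

$$
\lambda_g = \frac{n_g}{n_m + n_f}, \quad g \in \{m, f\}, \qquad
\hat\tau = \lambda_m \hat\tau_m + \lambda_f \hat\tau_f,
$$

$$
\widehat{\operatorname{Var}}(\hat\tau) = \lambda_m^2 \, \widehat{\operatorname{Var}}(\hat\tau_m) + \lambda_f^2 \, \widehat{\operatorname{Var}}(\hat\tau_f),
$$

where the cross term drops out because the covariance between the two estimates is zero.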
The other option is to re-center all the observations by gender, pool them, and then apply an estimator for a single discontinuity with the running variable now a relative one rather than an absolute one. The resulting estimate implicitly weights the various discontinuity estimates by the number of observations at the discontinuity in each case.
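A minimal sketch of the re-centering approach, using the variable names from the question (`y`, `w`, `age`, `male`) and assuming `male` is coded 1 for men and 0 for women. The choice of `rdrobust` as the single-discontinuity estimator is my assumption; note that it treats observations at or above the cutoff as assigned, which may need adjusting if the rule is strictly "over" the threshold.

```stata
* Assumed variables from the question: y, w, age, male (1 = man, 0 = woman)
* Age relative to each person's own cutoff: 30 for men, 25 for women
generate double agec = age - cond(male == 1, 30, 25) if !missing(male)

* Pooled fuzzy RD on the relative running variable, with a single cutoff at 0
rdrobust y agec, c(0) fuzzy(w)
```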
I think I prefer the former approach, since it allows the bandwidth to vary by gender and for any heterogeneity in the treatment effect to emerge.