Solved – Fixed effects in regression discontinuity design

causality, econometrics, fixed-effects-model, regression-discontinuity

I want to do a nonparametric RDD-type analysis to estimate the impact of an intervention (a single dummy variable) on an outcome variable. I have several 'boundaries' (which are actually different geographical locations) around which I will be picking observations.

Suppose I have only two observations at each boundary: one with the intervention and one without it. I can assume that all other relevant variables are the same for the pair of observations at each boundary.

Can I then regress the outcome variable on the intervention dummy along with boundary fixed effects? My main concern is the use of fixed effects with just two observations at each boundary; I am not sure that is legitimate.

Best Answer

For reasons explained in my comment, you will get identical estimates for the treatment coefficient whether you difference the two observations within each pair or include pair fixed effects.

Here's a numerical example of "hard" RDD using Stata. We will use an experimental dataset of 12 cars. Each car was run once without a beneficial fuel additive (condition 1) and once with it (condition 2). The outcome is miles per gallon. This setup is similar to your cross-border pairs, where one member of each pair is treated but the two are otherwise similar.

. use http://www.stata-press.com/data/r14/fuel, clear

. gen diff = mpg2-mpg1

. list, clean noobs

    mpg1   mpg2   diff  
      20     24      4  
      23     25      2  
      21     21      0  
      25     22     -3  
      18     23      5  
      17     18      1  
      18     17     -1  
      24     28      4  
      20     24      4  
      24     27      3  
      23     21     -2  
      19     23      4  

. reg diff

      Source |       SS           df       MS      Number of obs   =        12
-------------+----------------------------------   F(0, 11)        =      0.00
       Model |           0         0           .   Prob > F        =         .
    Residual |       80.25        11  7.29545455   R-squared       =    0.0000
-------------+----------------------------------   Adj R-squared   =    0.0000
       Total |       80.25        11  7.29545455   Root MSE        =     2.701

------------------------------------------------------------------------------
        diff |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |       1.75   .7797144     2.24   0.046     .0338602     3.46614
------------------------------------------------------------------------------

. gen pair_id = _n

. reshape long mpg, i(pair_id) j(treat)
(note: j = 1 2)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                       12   ->      24
Number of variables                   4   ->       4
j variable (2 values)                     ->   treat
xij variables:
                              mpg1 mpg2   ->   mpg
-----------------------------------------------------------------------------

. reg mpg i.treat  i.pair_id 

      Source |       SS           df       MS      Number of obs   =        24
-------------+----------------------------------   F(12, 11)       =      4.03
       Model |       176.5        12  14.7083333   Prob > F        =    0.0139
    Residual |      40.125        11  3.64772727   R-squared       =    0.8148
-------------+----------------------------------   Adj R-squared   =    0.6127
       Total |     216.625        23  9.41847826   Root MSE        =    1.9099

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     2.treat |       1.75   .7797144     2.24   0.046     .0338602     3.46614
             |
     pair_id |
          2  |          2   1.909902     1.05   0.317    -2.203667    6.203667
          3  |         -1   1.909902    -0.52   0.611    -5.203667    3.203667
          4  |        1.5   1.909902     0.79   0.449    -2.703667    5.703667
          5  |       -1.5   1.909902    -0.79   0.449    -5.703667    2.703667
          6  |       -4.5   1.909902    -2.36   0.038    -8.703667   -.2963331
          7  |       -4.5   1.909902    -2.36   0.038    -8.703667   -.2963331
          8  |          4   1.909902     2.09   0.060    -.2036669    8.203667
          9  |   1.03e-15   1.909902     0.00   1.000    -4.203667    4.203667
         10  |        3.5   1.909902     1.83   0.094    -.7036669    7.703667
         11  |   1.15e-15   1.909902     0.00   1.000    -4.203667    4.203667
         12  |         -1   1.909902    -0.52   0.611    -5.203667    3.203667
             |
       _cons |     21.125    1.40565    15.03   0.000     18.03118    24.21882
------------------------------------------------------------------------------

. quietly xtset pair_id

. xtreg mpg i.treat, fe

    Fixed-effects (within) regression               Number of obs     =         24
    Group variable: pair_id                         Number of groups  =         12

    R-sq:                                           Obs per group:
         within  = 0.3141                                         min =          2
         between =      .                                         avg =        2.0
         overall = 0.0848                                         max =          2

                                                    F(1,11)           =       5.04
    corr(u_i, Xb)  = 0.0000                         Prob > F          =     0.0463

    ------------------------------------------------------------------------------
             mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         2.treat |       1.75   .7797144     2.24   0.046     .0338602     3.46614
           _cons |         21   .5513413    38.09   0.000     19.78651    22.21349
    -------------+----------------------------------------------------------------
         sigma_u |  2.6809513
         sigma_e |  1.9099024
             rho |  .66334557   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(11, 11) = 3.94                      Prob > F = 0.015

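All three methods (the regression of the within-pair difference on a constant, OLS with pair dummies, and the fixed-effects estimator) yield the same estimated improvement of 1.75 miles per gallon, with a standard error of 0.78.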
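
To see why the estimates coincide, here is a short sketch of the algebra. The notation is introduced here for illustration and is not part of the Stata output: $p$ indexes pairs, $\alpha_p$ is the pair effect, $\tau$ is the treatment effect, and $d_p$ is the within-pair difference. It assumes exactly one treated and one untreated observation per pair.

$$
\begin{aligned}
y_{pt} &= \alpha_p + \tau D_{pt} + \varepsilon_{pt}, \qquad t \in \{1,2\},\quad D_{p1}=0,\ D_{p2}=1 \\
d_p &\equiv y_{p2}-y_{p1} = \tau + (\varepsilon_{p2}-\varepsilon_{p1})
\quad\Longrightarrow\quad
\hat\tau_{\text{diff}} = \frac{1}{N}\sum_{p=1}^{N} d_p \\
\hat\tau_{\text{FE}} &= \frac{\sum_{p}\sum_{t}\left(D_{pt}-\tfrac12\right)\left(y_{pt}-\bar y_p\right)}{\sum_{p}\sum_{t}\left(D_{pt}-\tfrac12\right)^2}
= \frac{\sum_{p}\tfrac12\, d_p}{N/2}
= \frac{1}{N}\sum_{p=1}^{N} d_p
\end{aligned}
$$

In the fuel data the differences sum to 21 over $N = 12$ pairs, so both expressions give $21/12 = 1.75$. The regression with pair dummies gives the same coefficient because partialling out the dummies is the same as demeaning within pairs. The standard errors agree too: the within residuals are $\pm(d_p-\bar d)/2$, so the fixed-effects residual sum of squares is half that of the difference regression (40.125 versus 80.25), both with $N-1 = 11$ degrees of freedom, and $\frac{40.125/11}{N/2} = \frac{80.25/11}{N} \approx 0.608$, whose square root is 0.78.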

Related Solutions

Solved – How a dummy variable can be analyzed in panel data using fixed effects in Stata

Do your dummy variables vary over time, or are they constant within each panel? In a fixed-effects model only time-varying variables can be used; time-invariant variables are dropped. This should be explained in your favourite (econometrics) textbook.
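
A minimal sketch of why this happens, using notation that is assumed here for illustration: $x_{it}$ is a time-varying regressor, $z_i$ is a dummy that is constant within panel $i$, and bars denote panel means.

$$
\begin{aligned}
y_{it} &= \alpha_i + \beta x_{it} + \gamma z_i + \varepsilon_{it} \\
y_{it}-\bar y_i &= \beta\,(x_{it}-\bar x_i) + \gamma\,(z_i-\bar z_i) + (\varepsilon_{it}-\bar\varepsilon_i)
= \beta\,(x_{it}-\bar x_i) + (\varepsilon_{it}-\bar\varepsilon_i)
\end{aligned}
$$

Because $\bar z_i = z_i$, the within transformation wipes out $z_i$ together with $\alpha_i$, so $\gamma$ cannot be estimated from within-panel variation.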

If you are interested in these invariant variables, you have two possibilities. If you think that the assumptions for a random effects model hold, you can go that way. In Stata, this kind of model is estimated by xtreg with the re option:

xtreg y x1 x2 x3, re

If you are not happy with the assumptions of the random effects model, you can also have a look at the Hausman-Taylor estimator. In Stata, this estimator is implemented in the xthtaylor command.

Solved – Graphs in regression discontinuity design in “Stata” or “R”

Is this much different from fitting two local polynomials of degree 2, one below the threshold and one above, each smoothed at $K_i$ points? Here's an example with Stata:

use votex // the election-spending data that comes with rd

tw ///
    (scatter lne d, mcolor(gs10) msize(tiny)) ///
    (lpolyci lne d if d<0, bw(0.05) deg(2) n(100) fcolor(none)) ///
    (lpolyci lne d if d>=0, bw(0.05) deg(2) n(100) fcolor(none)), xline(0) legend(off)

Alternatively, you can save the lpoly smoothed values and standard errors as variables instead of using twoway lpolyci. Below, $x$ is the bin, $s$ is the smoothed mean, $se$ is the standard error, and $ul$ and $ll$ are the upper and lower limits of the 95% confidence interval for the smoothed outcome.

lpoly lne d if d<0, bw(0.05) deg(2) n(100) gen(x0 s0) ci se(se0)
lpoly lne d if d>=0, bw(0.05) deg(2) n(100) gen(x1 s1) ci se(se1)

/* Get the 95% CIs using the normal critical value 1.96 */
forvalues v=0/1 {
    gen ul`v' = s`v' + 1.96*se`v'
    gen ll`v' = s`v' - 1.96*se`v'
}

tw ///
    (line ul0 ll0 s0 x0, lcolor(blue blue blue) lpattern(dash dash solid)) ///
    (line ul1 ll1 s1 x1, lcolor(red red red) lpattern(dash dash solid)), legend(off)

As you can see, the lines in the first plot are the same as in the second.
