Random Effects Model in R with PLM – How to Replicate Results in Stata


I have been working on migrating a current project from Stata to R, where I have encountered difficulties with differing results of random effects regressions.

I have panel data from an experiment where the treatment dummy is perfectly correlated with the group indicator because it is time-invariant. This means that a fixed effects regression of the outcome variable on the treatment dummy is not possible – however, a random-effects regression should be, since it only partially time-demeans the data. I am willing to assume that the treatment dummy and other covariates are not correlated with the group-specific error.
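To make the identification argument concrete, here is a minimal base-R sketch on a made-up toy panel (not the data from this question). It shows that full time-demeaning wipes out a time-invariant dummy, while partial demeaning leaves variation behind. The value of `theta` is a hypothetical placeholder; in a real random-effects fit it is estimated from the variance components.

```r
# Toy balanced panel: 4 groups observed over 3 rounds (made-up values).
panel <- data.frame(
  GroupID   = rep(1:4, each = 3),
  Round     = rep(1:3, times = 4),
  Treatment = rep(c(0, 0, 1, 1), each = 3)  # constant within each group
)

# Group mean of the regressor, repeated on every row of its group.
group_mean <- ave(panel$Treatment, panel$GroupID)

# Within (fixed effects) transformation: subtract the full group mean.
within_x <- panel$Treatment - group_mean

# Random effects transformation: subtract only a fraction theta of the mean.
theta <- 0.6  # hypothetical value, normally estimated from the data
quasi_x <- panel$Treatment - theta * group_mean

print(var(within_x))  # no variation left, so the coefficient is not identified
print(var(quasi_x))   # variation remains, so the coefficient is identified
```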

In Stata, this worked without a problem. The random-effects regression of the continuous outcome variable on the treatment dummy gives a result that makes sense, and the fixed effects regression omits the treatment dummy, exactly as expected.

However, in R, using the plm package, it did not work. I have received the error message "empty model." Curiously, this is not the case if the model does not include the treatment-dummy but other variables as regressors that are not perfectly correlated with the group indicator. In this case, plm's default method "swar" gives the same results as Stata.

I have tried the other methods supplied by plm, and only the "walhus" method works. For a regression with the treatment dummy as a covariate, it gives the same coefficients as Stata. However, it gives different results for models without the treatment dummy. These differences are not huge, but they are noticeable.

So in conclusion, I am able to replicate Stata's results in R, but with different methods where Stata uses only one. I have not found an explanation for that behavior in the Stata Documentation or in the plm paper in the Journal of Statistical Software. The plm paper gives sources for the different methods for RE (that supposedly differ in their estimation of theta) but does not explain the differences itself. The original sources for "swar" and "walhus" are Econometrica papers from the late 60s / early 70s. Quite frankly, I was not able to find a solution in these either. I have also found this question on Stackexchange, but I believe that this is a different issue.

Any help or ideas would be much appreciated! This has already taken an immense amount of time and I find it really troubling.

P.S. I cannot share the original data, but I have created a dataset with similar properties with which these problems can be replicated. I have put it into a dropbox, as .Rdata and .dta.

The "original" Stata code:

xtset GroupID Round


xtreg outcome Treatment, re
------------------------------------------------------------------------------
     outcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   Treatment |   36.93656    5.97516     6.18   0.000     25.22546    48.64766
       _cons |   51.16955   4.225076    12.11   0.000     42.88855    59.45055
-------------+----------------------------------------------------------------


xtreg outcome X1, re
------------------------------------------------------------------------------
     outcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X1 |  -.0278302   .1193763    -0.23   0.816    -.2618033     .206143
       _cons |   70.84536   6.953707    10.19   0.000     57.21635    84.47438
-------------+----------------------------------------------------------------

The corresponding R-code:

library(plm)
testdata <- pdata.frame(testdata, index=c("GroupID","Round"))


Model1 <- plm(outcome ~ Treatment, data = testdata, model="random", random.method="swar") 
summary(Model1) # This doesn’t work
Error in plm.fit(data, model = models[1], effect = effect) : empty model


Model2 <- plm(outcome ~ Treatment, data = testdata, model="random", random.method="walhus") 
summary(Model2) # This gives the same results as Stata
            Estimate Std. Error z-value  Pr(>|z|)    
(Intercept)  51.1695     4.2251 12.1109 < 2.2e-16 ***
Treatment    36.9366     5.9752  6.1817 6.342e-10 ***


Model3 <- plm(outcome ~ X1, data = testdata, model="random", random.method="swar")
summary(Model3) # This gives the same results as Stata
            Estimate Std. Error z-value Pr(>|z|)    
(Intercept) 70.84536    6.95371 10.1881   <2e-16 ***
X1          -0.02783    0.11938 -0.2331   0.8157    


Model4 <- plm(outcome ~ X1, data = testdata, model="random", random.method="walhus")
summary(Model4) # This gives slightly different results than Stata
             Estimate Std. Error z-value Pr(>|z|)    
(Intercept) 70.682277   7.003460 10.0925   <2e-16 ***
X1          -0.024072   0.119074 -0.2022   0.8398

EDIT: I have tried something else and found that plm's default method "swar" does also work for a model that includes both the time-invariant treatment-dummy and a time-varying continuous covariate:

Model1.2 <- plm(outcome ~ Treatment + X1, data = testdata, model="random", random.method="swar")
summary(Model1.2) # This somehow works
             Estimate Std. Error z-value  Pr(>|z|)    
(Intercept) 14.906599  11.284649  1.3210    0.1865    
Treatment   36.835123   6.075290  6.0631 1.335e-09 ***
X1          -0.012018   0.108785 -0.1105    0.9120

This gives the same results on the coefficients (but not the intercept) as Stata:

xtreg outcome Treatment X1, re
------------------------------------------------------------------------------
     outcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   Treatment |   36.83512    6.07529     6.06   0.000     24.92777    48.74247
          X1 |   -.012018   .1087849    -0.11   0.912    -.2252326    .2011965
       _cons |   51.74172   6.697543     7.73   0.000     38.61478    64.86866
-------------+----------------------------------------------------------------

Best Answer

While this looks like a software question at first, there is some statistics behind it (and thus I deem it on-topic for Cross Validated):

The Swamy-Arora random effects estimator uses the variation of the associated within model and of the associated between model. For a plm-based exposition, see one of the package's vignettes, https://cran.rstudio.com/web/packages/plm/vignettes/B_plmFunction.html, section "Unbalanced panels" (the point is not specific to unbalanced panels). Any good textbook on panel models covers this, e.g., Wooldridge or Baltagi. Other random effects estimators, such as Wallace-Hussain, use slightly different "base models" (Amemiya's estimator, for instance, uses the within model twice); see Baltagi's textbook for an overview.
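For orientation, the textbook balanced-panel form of the random effects transformation can be written as follows. The notation is my own choice, not taken from the plm vignette: $N$ groups observed for $T$ periods each, $\sigma_u^2$ the variance of the group-specific effect, $\sigma_\varepsilon^2$ the variance of the idiosyncratic error, and $v_{it} = u_i + \varepsilon_{it}$ the composite error.

```latex
$$
y_{it} - \theta \bar{y}_i
  = (1 - \theta)\,\alpha + (x_{it} - \theta \bar{x}_i)'\beta + (v_{it} - \theta \bar{v}_i),
\qquad
\theta = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\,\sigma_u^2}}
$$
```

With $\theta = 0$ this is pooled OLS and with $\theta = 1$ it is the within estimator. The random effects methods differ only in how they estimate the two variance components that enter $\theta$.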

Now, looking at the software implementation for plm if model = "swar": The function estimates a within model first. This fails (correctly) for the specific example you have as there is no within variation of the only covariate (Treatment, as you observed correctly). The function then does not continue to estimate the between model. Stata does continue for these data (and also gretl) and gives an output. Thus, the model you want to estimate is equivalent to the between specification. The between model can be estimated by:

plm(outcome ~ Treatment, data = testdata, model = "between") 

# Coefficients:
#              Estimate Std. Error t-value  Pr(>|t|)    
# (Intercept)  51.1695     3.7313 13.7135 5.722e-11 ***
# Treatment    36.9366     5.2769  6.9997 1.555e-06 ***

You get the same coefficient estimates as Stata. (The difference in the standard errors is, I suppose, due to some adjustment specific either to RE models or to Stata. Also, z-values are usually reported for RE models because the finite-sample distribution is usually not known.)
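The reason the coefficients coincide can be checked without plm. In a balanced panel where every regressor is constant within groups, the quasi-demeaned regression reproduces the between coefficients for any theta below 1. The sketch below uses simulated data (not the question's data set) and an arbitrary, hypothetical `theta`; all variable names are made up for illustration.

```r
set.seed(1)
n_groups <- 20
n_rounds <- 5

# Simulated balanced panel with a time-invariant treatment dummy.
sim <- data.frame(
  GroupID   = rep(seq_len(n_groups), each = n_rounds),
  Treatment = rep(rep(c(0, 1), length.out = n_groups), each = n_rounds)
)
group_effect <- rep(rnorm(n_groups, sd = 5), each = n_rounds)
sim$outcome <- 50 + 35 * sim$Treatment + group_effect +
  rnorm(nrow(sim), sd = 10)

# Between estimator: OLS on one row of means per group.
means <- aggregate(cbind(outcome, Treatment) ~ GroupID, data = sim, FUN = mean)
between_fit <- lm(outcome ~ Treatment, data = means)

# Quasi-demeaned (random effects style) regression for an arbitrary theta.
theta <- 0.7  # hypothetical; any value below 1 gives the same coefficients
y_star     <- sim$outcome - theta * ave(sim$outcome, sim$GroupID)
const_star <- rep(1 - theta, nrow(sim))
d_star     <- (1 - theta) * sim$Treatment
re_fit <- lm(y_star ~ 0 + const_star + d_star)

print(unname(coef(between_fit)))
print(unname(coef(re_fit)))
```

The two coefficient vectors agree up to floating-point error. This only concerns the point estimates; the standard errors depend on the estimated variance components and on the degrees-of-freedom convention.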

Related Solutions

Solved – group fixed-effects, not individual-fixed effects using plm in R

I have worked on similar projects and am confronting one right now. The way that we handle this is to put in a fixed effect for each village and then to cluster the standard errors by village. This is not a perfect solution, but is fairly standard practice.

The plm package in R, the xtreg ..., fe command in Stata, and the traditional fixed effects (within) estimator are all designed to follow individuals. I believe one name for the method you want is a hierarchical linear model.

The simplest implementation in R would be something like

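# v.t*t expands to v.t + t + v.t:t; the v.t main effect is collinear with the village dummies, so lm() reports NA for it
myLM <- lm(y ~ x + v + v.t*t, data = df)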

where y is the outcome of interest, x is some set of controls, v is a factor variable for the villages, v.t is a binary (factor) variable indicating whether a village was treated, and t is an indicator for pre-post treatment.

For standard inference, it is typical and recommended to produce clustered standard errors, using either the multiwayvcov package or the clusterSEs package.
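To show what clustering does mechanically, here is a hand-rolled base-R sketch of a cluster-robust sandwich estimator on simulated data. It is not the implementation of either package, and the small-sample factor used here is one common convention that I am assuming; the packages may apply a different one. The model is deliberately simpler than the one above (no village dummies), and all names are made up for illustration.

```r
set.seed(42)
n_villages <- 30
n_per      <- 10

# Simulated data with a common shock inside each village.
df <- data.frame(v = rep(seq_len(n_villages), each = n_per))
df$x <- rnorm(nrow(df))
village_shock <- rep(rnorm(n_villages), each = n_per)
df$y <- 1 + 2 * df$x + village_shock + rnorm(nrow(df))

fit <- lm(y ~ x, data = df)
X <- model.matrix(fit)
u <- residuals(fit)

# Sum the score contributions X_i * u_i within each village.
cluster_scores <- rowsum(X * u, group = df$v)

bread <- solve(crossprod(X))
meat  <- crossprod(cluster_scores)

# Assumed small-sample correction: G/(G-1) * (N-1)/(N-K).
G <- n_villages
N <- nrow(X)
K <- ncol(X)
adj <- (G / (G - 1)) * ((N - 1) / (N - K))

vcov_cluster <- adj * bread %*% meat %*% bread
se_cluster   <- sqrt(diag(vcov_cluster))
se_classic   <- sqrt(diag(vcov(fit)))

print(cbind(classic = se_classic, clustered = se_cluster))
```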

Another method for inference, and the preferred method in Bertrand, Duflo & Mullainathan (2004), is to perform a placebo test: you assign "treatment" to each village in turn, form an empirical CDF of the resulting estimates, and see where the effect for the truly treated village sits in that distribution. Note that this is roughly the same method recommended for inference with the synthetic controls of Abadie, Diamond, and Hainmueller, and it has ties back to Fisher's 1935 text.
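A stripped-down sketch of the placebo idea in base R follows. It uses simulated data and a plain difference in means rather than a full pre/post specification. The village count, the index of the treated village, and the effect size are all hypothetical.

```r
set.seed(7)
n_villages <- 12
n_per      <- 25

# Simulated outcomes with a village-level shock.
obs <- data.frame(v = rep(seq_len(n_villages), each = n_per))
obs$y <- rep(rnorm(n_villages), each = n_per) + rnorm(nrow(obs))

treated_village <- 3  # hypothetical: the one truly treated village
is_treated <- obs$v == treated_village
obs$y[is_treated] <- obs$y[is_treated] + 2

# Effect estimate when village k is labelled as the treated one.
placebo_effect <- function(k) {
  mean(obs$y[obs$v == k]) - mean(obs$y[obs$v != k])
}

# One estimate per village: the placebo distribution.
effects <- vapply(seq_len(n_villages), placebo_effect, numeric(1))
actual  <- effects[treated_village]

# Share of villages with an effect at least as extreme as the actual one.
p_value <- mean(abs(effects) >= abs(actual))
print(p_value)
```

With only a handful of villages the smallest attainable p-value is one over the number of villages, which is worth keeping in mind when reading the result.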

Stata – Differences in Differences, Fixed Effects, and Standard Errors Explained

You need to compare apples to apples, so use clustering with OLS and clustering with xtreg, fe (or robust with xtreg, fe, which will default to clustering as Thomas pointed out). These coefficient equivalences are limited to two-period (one pre, and one post) datasets with treatment at the same time for all treated units.

Here's an example of 2x2 DID on a public dataset demonstrating this. Here NJ restaurants are treated (become subject to the minimum wage increase) and PA restaurants are not. February '92 (t=0) is pre and November '92 is post (t=1). The DID parameter is the interaction of t = 1 and NJ = 1. The outcome fte is full-time equivalent employees. Here I will balance the panel in order to get xtreg, fe and OLS to give the same coefficient estimates. If the panel is unbalanced (consists of repeated cross-sections), xtreg, fe will drop some observations that appear in only one year and the estimates will no longer match OLS or manual calculations. You may want to stick with clustered OLS if you have a repeated cross-section.

Here is the result. Note that you can use factor variable notation to create the interactions rather than hard coding them.

. use http://fmwww.bc.edu/repec/bocode/c/CardKrueger1994.dta, clear
(Dataset from Card&Krueger (1994))

. drop if id == 407 // duplicate restaurant
(4 observations deleted)

. drop if missing(fte, treated, t, id)
(19 observations deleted)

. bysort id: keep if _N==2 // balance the panel
(19 observations deleted)

. xtset id t
       panel variable:  id (strongly balanced)
        time variable:  t, 0 to 1
                delta:  1 unit

. 
. /* calculate DID by hand */
. table treated t, c(mean fte N fte) row col

----------------------------------------
New       |
Jersey =  |
1;        |
Pennsylva | Feb. 1992 = 0; Nov. 1992 = 1
nia = 0   |        0         1     Total
----------+-----------------------------
       PA | 20.17333     17.65  18.91167
          |       75        75       150
          | 
       NJ | 17.06927  17.51831  17.29379
          |      314       314       628
          | 
    Total | 17.66774   17.5437  17.60572
          |      389       389       778
----------------------------------------

. di %9.3f (17.51831 - 17.06927) - (17.65 - 20.17333)
    2.972

. 
. /* fit models */
. eststo ols_robust:   qui   reg fte i.treated##i.t, robust

. eststo xtreg_robust: qui xtreg fte i.treated##i.t, fe robust

. eststo xtreg_clust:  qui xtreg fte i.treated##i.t, fe cluster(id)

. eststo ols_clust:    qui   reg fte i.treated##i.t, cluster(id)

. 
. capture ssc install estout

. esttab *, se(%9.7f) noomitted drop(0.treated 0.t 0.treated#0.t) modelwidt(15) mtitles label varwidth(35)

---------------------------------------------------------------------------------------------------------------
                                                (1)                (2)                (3)                (4)   
                                         ols_robust       xtreg_robust        xtreg_clust          ols_clust   
---------------------------------------------------------------------------------------------------------------
NJ                                           -3.104*                                                  -3.104*  
                                        (1.4475664)                                              (1.4484988)   

Feb. 1992 = 0; Nov. 1992 = 1=1               -2.523             -2.523*            -2.523*            -2.523*  
                                        (1.6371048)        (1.2498119)        (1.2498119)        (1.2506190)   

NJ # Feb. 1992 = 0; Nov. 1992 = 1=1           2.972              2.972*             2.972*             2.972*  
                                        (1.7822146)        (1.3337493)        (1.3337493)        (1.3346107)   

Constant                                      20.17***           17.67***           17.67***           20.17***
                                        (1.3591695)        (0.2232501)        (0.2232501)        (1.3600450)   
---------------------------------------------------------------------------------------------------------------
Observations                                    778                778                778                778   
---------------------------------------------------------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001

Clustering in DID settings is a good idea for reasons outlined in Bertrand, Duflo, and Mullainathan's 2004 QJE paper. Clustering at the level of treatment is also a good idea, but here that is not feasible since there are not enough clusters (since treatment is a state law and we have data from two states only) for that to work well. Generally your SEs will go up when you cluster in DID, but if the errors are negatively correlated within cluster, they might shrink. See this post for the reasons why.

Code:

estimates clear
cls
use http://fmwww.bc.edu/repec/bocode/c/CardKrueger1994.dta, clear
drop if id == 407 // duplicate restaurant
drop if missing(fte, treated, t, id)
bysort id: keep if _N==2 // balance the panel
xtset id t

/* calculate DID by hand */
table treated t, c(mean fte N fte) row col
di %9.3f (17.51831 - 17.06927) - (17.65 - 20.17333)

/* fit models */
eststo ols_robust:   qui   reg fte i.treated##i.t, robust
eststo xtreg_robust: qui xtreg fte i.treated##i.t, fe robust
eststo xtreg_clust:  qui xtreg fte i.treated##i.t, fe cluster(id)
eststo ols_clust:    qui   reg fte i.treated##i.t, cluster(id)

capture ssc install estout
esttab *, se(%9.7f) noomitted drop(0.treated 0.t 0.treated#0.t) modelwidt(15) mtitles label varwidth(35)
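
The same 2x2 equivalence can be checked in R. The sketch below uses simulated data rather than the Card and Krueger file, and the variable names merely mirror the Stata example. It compares the DID computed from the four cell means, the OLS interaction coefficient, and a first-difference regression, which with two periods and a balanced panel reproduces the within (fixed effects) estimate.

```r
set.seed(123)
n_units <- 100

# Balanced two-period panel; expand.grid varies id fastest,
# so rows are ordered by id within each period.
panel <- expand.grid(id = seq_len(n_units), t = 0:1)
panel$treated <- as.integer(panel$id <= 60)
panel$fte <- 20 - 3 * panel$treated - 2.5 * panel$t +
  3 * panel$treated * panel$t + rnorm(nrow(panel), sd = 4)

# DID by hand from the four cell means.
cell_means <- tapply(panel$fte, list(panel$treated, panel$t), mean)
did_by_hand <- (cell_means["1", "1"] - cell_means["1", "0"]) -
               (cell_means["0", "1"] - cell_means["0", "0"])

# OLS with an interaction term.
ols <- lm(fte ~ treated * t, data = panel)
did_ols <- coef(ols)[["treated:t"]]

# First differences per unit, regressed on the treatment indicator.
d_fte     <- panel$fte[panel$t == 1] - panel$fte[panel$t == 0]
treated_i <- panel$treated[panel$t == 1]
fd <- lm(d_fte ~ treated_i)
did_fd <- coef(fd)[["treated_i"]]

print(c(by_hand = did_by_hand, ols = did_ols, first_diff = did_fd))
```

All three numbers agree up to floating-point error; only the standard errors differ across the approaches, which is why the choice of clustering matters.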