Solved – Estimating robust standard errors in panel data regressions

panel data, plm, r, robust-standard-error, standard error

I am trying to estimate robust standard errors in a panel data regression. I understand panel data regressions conceptually, but R offers a lot of options I am not sure about.
My data is of the following format:

  id time name           y          x1          x2
   1   10    A  1.28233854 -0.42411039  1.89640596
   1   11    A -0.59541995 -0.43214374  0.07386285
   1   12    A  0.88951720 -1.55417836  0.28276157
   2   10    B  1.11211744 -0.89200195  0.88989664
   2   11    B -0.37737953  0.09055494  1.20764357
   3   10    C  0.03258314 -0.13834344 -0.97812765
   3   11    C -0.97645525 -0.14313482 -1.03528695
   3   12    C -0.02031554  0.02061293 -0.71353867

Here is the R code to create the data:

x <- data.frame(id   = rep(c(1, 2, 3), c(3, 2, 3)),
                time = c(10, 11, 12, 10, 11, 10, 11, 12),
                name = rep(c("A", "B", "C"), c(3, 2, 3)),
                y    = rnorm(8),
                x1   = rnorm(8),
                x2   = rnorm(8))

To fit the regressions and compute the robust standard errors, I use:

library(plm)
library(sandwich)
library(lmtest)

# Pooling (data is passed explicitly, so attach()/detach() are not needed):
r1 <- plm(y ~ x1 + x2, data = x, model = "pooling", index = c("id", "time"))
r1
coeftest(r1, vcov = vcovHC(r1, type = "HC0", cluster = "group"))

# Fixed effects:
r2 <- plm(y ~ x1 + x2, data = x, model = "within", index = c("id", "time"))
r2
coeftest(r2, vcov = vcovHC(r2, type = "HC0", cluster = "group"))

My questions are the following:

1) Is it correct to cluster by group in the pooling model and in the fixed effects model? I could also cluster by time. My issue is that in the fixed effects model we only account for the within-variation over time, so as I understand, it wouldn't make any sense to cluster the standard errors by group under this approach.

2) There are 3 options to choose an effect, "individual", "time" or "twoways". But I could not find any good explanation which effect to use under which model. Maybe someone could tell me which effect to use in the above simple model, in either the within- or the pooling model.

Best Answer

1) Given that you have specified "id" as the first element of index (I guess individuals or some other unit you follow over time), the cluster="group" standard errors are clustered at the individual level. This makes sense in both the pooling and the fixed effects model: a person's error today may be correlated with her error of yesterday, and the fixed effects remove only the time-constant part of that correlation. For more information see page 14 of these notes.

2) The default is to have individual effects in the model, which is equivalent to including a dummy for $N-1$ individuals. If you specify effect="twoways", the model also includes $T-1$ time dummies, so that both individual and time fixed effects are estimated (see p. 12 of Croissant and Millo (2008), "Panel Data Econometrics in R: The plm Package").
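
Both points can be checked numerically. The following is a minimal sketch and not part of the original answer: it assumes the Cigar data set that ships with plm (also used further down this page) instead of the simulated data from the question, and all object names are illustrative. It computes the within-model standard errors under both clustering choices, then compares the plm slopes with the equivalent dummy-variable regressions fitted by lm.

```r
library(plm)

# Cigar ships with plm: 46 states observed over 30 years (balanced panel).
data("Cigar", package = "plm")
idx <- c("state", "year")

# Point 1: one within model, clustered by unit ("group") or by period ("time").
fe_ind <- plm(sales ~ price, data = Cigar, index = idx,
              model = "within", effect = "individual")
se_by_group <- sqrt(diag(vcovHC(fe_ind, type = "HC0", cluster = "group")))
se_by_time  <- sqrt(diag(vcovHC(fe_ind, type = "HC0", cluster = "time")))
print(c(group = unname(se_by_group), time = unname(se_by_time)))

# Point 2: effect = "individual" vs. effect = "twoways",
# each compared with an explicit dummy-variable (LSDV) regression.
fe_two <- plm(sales ~ price, data = Cigar, index = idx,
              model = "within", effect = "twoways")
lsdv_ind <- lm(sales ~ price + factor(state), data = Cigar)
lsdv_two <- lm(sales ~ price + factor(state) + factor(year), data = Cigar)

slopes <- c(plm_individual  = unname(coef(fe_ind)["price"]),
            lsdv_individual = unname(coef(lsdv_ind)["price"]),
            plm_twoways     = unname(coef(fe_two)["price"]),
            lsdv_twoways    = unname(coef(lsdv_two)["price"]))
print(slopes)
```

The two "individual" slopes should agree with each other, as should the two "twoways" slopes. The choice of effect changes the point estimate, whereas the choice of cluster changes only the standard error.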

Related Solutions

Solved – How to estimate a fixed effects regression WITH robust standard errors AND instrument variables

EDIT: The methods are also in CRAN versions of plm >= 1.6-4.

The appropriate methods for robust vcovs were not implemented in plm. They are now in the development version >= 1.6-1 (see http://r-forge.r-project.org/R/?group_id=406).

library(plm)
library(lmtest)
data(Cigar)
fit <- plm(price ~ sales + pop, data=Cigar, index=c("state","year"), model="within")
fit2 <- plm(price ~ sales + pop | cpi + pop, data=Cigar, index=c("state","year"), model="within")

coeftest(fit, vcov.=vcovHC(fit))

# t test of coefficients:
#
#         Estimate Std. Error t value  Pr(>|t|)    
# sales -1.0391141  0.1671141 -6.2180 6.726e-10 ***
# pop    0.0190151  0.0064447  2.9505  0.003228 ** 
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

coeftest(fit2, vcov.=vcovHC(fit2))

# t test of coefficients:
#
#        Estimate Std. Error t value Pr(>|t|)   
# sales -6.2479556  1.9032780 -3.2827 0.001055 **
# pop   -0.0021752  0.0121773 -0.1786 0.858260   
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Solved – Clustered standard errors are completely different in R than in STATA

First, Stata uses a finite sample correction that R does not use when clustering. Second, areg is designed for datasets with many groups, but where the number of groups does not grow with the sample size. One example is states in the US. Your plm is much more like xtreg, fe. Although the point estimates produced by areg and xtreg, fe are the same, the estimated VCEs differ with clustering because the commands make different assumptions about whether the number of groups/sensors increases with the sample size. When you cluster with xtreg, fe, the asymptotics rely on the number of groups going to infinity. So this is not an apples-to-apples comparison.

In your setting, xtreg, fe seems more suitable since many sensors could be added. If you have to replicate areg's output, you can use felm.

Here is an econometrically stupid example demonstrating these claims. Here I am using Roger Newson's rsource to run R from within Stata, but it is not strictly necessary:

. rsource, terminator(END_OF_R)
Assumed R program path: "/usr/local/bin/R"

Beginning of R output
> suppressPackageStartupMessages({
+         require(plm)
+         require(lmtest)
+         library(foreign)
+         library(lfe)
+         data(Cigar)
+ })
> xtregfe <- plm(sales ~ price, model = 'within', data = Cigar)
> G <- length(unique(Cigar$state))
> c <- G/(G - 1)
> coeftest(xtregfe,c * vcovHC(xtregfe, type = "HC1", cluster = "group"))

t test of coefficients:

      Estimate Std. Error t value  Pr(>|t|)    
price -0.20984    0.03575 -5.8697 5.503e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> areg<-felm(sales ~ price | state | 0 | state, Cigar)
> coeftest(areg)

t test of coefficients:

       Estimate Std. Error t value  Pr(>|t|)    
price -0.209840   0.036348 -5.7731 9.672e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> write.dta(Cigar,"~/Desktop/Cigar.dta")
> 
End of R output

. 
. use "~/Desktop/Cigar.dta", clear
(Written by R.              )

. xtset state year
       panel variable:  state (strongly balanced)
        time variable:  year, 63 to 92
                delta:  1 unit

. xtreg sales price, fe cluster(state)

Fixed-effects (within) regression               Number of obs     =      1,380
Group variable: state                           Number of groups  =         46

R-sq:                                           Obs per group:
     within  = 0.2559                                         min =         30
     between = 0.1007                                         avg =       30.0
     overall = 0.0969                                         max =         30

                                                F(1,45)           =      34.45
corr(u_i, Xb)  = 0.0329                         Prob > F          =     0.0000

                                 (Std. Err. adjusted for 46 clusters in state)
------------------------------------------------------------------------------
             |               Robust
       sales |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       price |  -.2098402   .0357498    -5.87   0.000     -.281844   -.1378364
       _cons |   138.3669   2.456008    56.34   0.000     133.4202    143.3135
-------------+----------------------------------------------------------------
     sigma_u |  25.678164
     sigma_e |  15.174773
         rho |  .74116131   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. areg sales price, absorb(state) cluster(state)

Linear regression, absorbing indicators         Number of obs     =      1,380
Absorbed variable: state                        No. of categories =         46
                                                F(   1,     45)   =      33.33
                                                Prob > F          =     0.0000
                                                R-squared         =     0.7682
                                                Adj R-squared     =     0.7602
                                                Root MSE          =    15.1748

                                 (Std. Err. adjusted for 46 clusters in state)
------------------------------------------------------------------------------
             |               Robust
       sales |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       price |  -.2098402   .0363482    -5.77   0.000    -.2830493   -.1366312
       _cons |   138.3669   2.497119    55.41   0.000     133.3374    143.3963
------------------------------------------------------------------------------

As you can see, areg/felm give you a price coefficient of -0.20984 with a clustered standard error (CSE) of 0.03635. The two panel fixed effects approaches (plm and xtreg, fe) also give you -0.20984, but with a smaller CSE of 0.03575.
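
The size of the gap can be reproduced by hand. The sketch below is an addition, not part of the original answer. It assumes the usual Stata cluster correction G/(G-1) * (N-1)/(N-K), with areg counting the absorbed state dummies in K and xtreg, fe not counting them. Under that assumption the two standard errors differ only by a degrees-of-freedom ratio, and the numbers in the output above are consistent with it. The variable names are illustrative.

```r
# Figures taken from the Stata output above.
N <- 1380          # observations
G <- 46            # clusters (states)
k <- 1             # slope coefficients (price)
se_xtreg <- 0.0357498
se_areg  <- 0.0363482

# Assumed parameter counts: xtreg, fe counts price + constant;
# areg additionally counts the G - 1 absorbed state dummies.
K_xtreg <- k + 1
K_areg  <- k + G

# The G/(G-1) and (N-1) terms cancel, leaving only the ratio of (N - K) terms.
se_areg_implied <- se_xtreg * sqrt((N - K_xtreg) / (N - K_areg))
print(c(implied = se_areg_implied, reported = se_areg))
```

With 46 states and 30 years each, the ratio is sqrt(1378/1333), which is about a 1.7% increase in the standard error.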
