Solved – Estimating robust standard errors in panel data regressions

Tags: panel data, plm, r, robust-standard-error, standard error

I am trying to estimate robust standard errors in a panel data regression. I understand panel data regressions conceptually, but R offers a lot of options I am not sure about.
My data is of the following format:

  id time name           y          x1          x2
   1   10    A  1.28233854 -0.42411039  1.89640596
   1   11    A -0.59541995 -0.43214374  0.07386285
   1   12    A  0.88951720 -1.55417836  0.28276157
   2   10    B  1.11211744 -0.89200195  0.88989664
   2   11    B -0.37737953  0.09055494  1.20764357
   3   10    C  0.03258314 -0.13834344 -0.97812765
   3   11    C -0.97645525 -0.14313482 -1.03528695
   3   12    C -0.02031554  0.02061293 -0.71353867    

Here is the R code to create the data:

x <- data.frame(id = rep(c(1, 2, 3), c(3, 2, 3)),
                time = c(10, 11, 12, 10, 11, 10, 11, 12),
                name = rep(c("A", "B", "C"), c(3, 2, 3)),
                y = rnorm(8), x1 = rnorm(8), x2 = rnorm(8))

In order to perform the regression and the robust standard errors, I use:

library(plm)
library(sandwich)
library(lmtest)

# Pooling ("id" is the group index, "time" the time index):
r1 <- plm(y ~ x1 + x2, data = x, index = c("id", "time"), model = "pooling")
r1
# Robust standard errors clustered by group (id)
coeftest(r1, vcov = vcovHC(r1, type = "HC0", cluster = "group"))

# Fixed effects (within estimator):
r2 <- plm(y ~ x1 + x2, data = x, index = c("id", "time"), model = "within")
r2
coeftest(r2, vcov = vcovHC(r2, type = "HC0", cluster = "group"))

My questions are the following:

1) Is it correct to cluster by group in both the pooling model and the fixed effects model? I could also cluster by time. My concern is that the fixed effects model uses only the within-group variation over time, so, as I understand it, clustering the standard errors by group would not make sense under this approach.

2) There are three options for the effect: "individual", "time" or "twoways". I could not find a good explanation of which effect to use with which model. Could someone tell me which effect to use in the simple example above, in either the within or the pooling model?

Best Answer

1) Given that you have specified "id" as the first index in the regression (I guess individuals or some other unit you follow over time), the cluster="group" standard errors are clustered at the individual level. This makes sense in both models, because a person's error today may be correlated with her error yesterday. The within transformation removes only the time-invariant individual effect; it does not remove serial correlation in the remaining errors. For more information see page 14 of these notes.

2) The default is to have individual effects in the model, which is equivalent to including a dummy for $N-1$ individuals. If you specify the "twoways" option, the model also includes $T-1$ time dummies, so that both individual and time fixed effects are estimated (see p. 12 of Croissant and Millo (2008), "Panel Data Econometrics in R: The plm Package").
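Both points can be checked on a small simulated panel. The sketch below is only an illustration: the data frame `d`, its dimensions (10 individuals, 5 periods) and the coefficients used to generate `y` are made up, not taken from the question. It compares the within estimator with explicit dummy-variable regressions using `lm`, and then computes robust standard errors clustered by group and by time for the same fixed effects model.

```r
library(plm)
library(sandwich)
library(lmtest)

set.seed(1)
# Hypothetical balanced panel (assumption: 10 individuals, 5 periods)
d <- data.frame(id = rep(1:10, each = 5), time = rep(1:5, times = 10),
                x1 = rnorm(50), x2 = rnorm(50))
d$y <- 1 + 0.5 * d$x1 - 0.3 * d$x2 + rnorm(50)

# Point 2: within estimator versus regressions with explicit dummies
fe_ind <- plm(y ~ x1 + x2, data = d, index = c("id", "time"),
              model = "within", effect = "individual")
fe_two <- plm(y ~ x1 + x2, data = d, index = c("id", "time"),
              model = "within", effect = "twoways")
lsdv_ind <- lm(y ~ x1 + x2 + factor(id), data = d)
lsdv_two <- lm(y ~ x1 + x2 + factor(id) + factor(time), data = d)

# The slope estimates should agree up to numerical precision
slopes <- c("x1", "x2")
all.equal(coef(fe_ind)[slopes], coef(lsdv_ind)[slopes])
all.equal(coef(fe_two)[slopes], coef(lsdv_two)[slopes])

# Point 1: same fixed effects model, two clustering choices
se_group <- coeftest(fe_ind, vcov = vcovHC(fe_ind, type = "HC0", cluster = "group"))
se_time  <- coeftest(fe_ind, vcov = vcovHC(fe_ind, type = "HC0", cluster = "time"))
se_group
se_time
```

The choice of cluster changes only the standard errors, never the coefficient estimates, so the "Estimate" column is identical in `se_group` and `se_time`.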