Serial correlation: estimation vs robust SE

Tags: econometrics, hausman, panel-data, r

We have route-level data (which I cannot share) on monthly bus ridership in New York City, giving a panel with $N = 185$ routes and $T = 36$ months. We estimate fixed effects and random effects models with R's plm package. (For this MWE, I use the Grunfeld investment data, which illustrates the problems I am seeing fairly well.)

library(plm)
data("Grunfeld")
model_1A <- lm(inv ~ value + capital, data= Grunfeld)
fe.plm <- plm(formula(model_1A), model="within", index=c("firm", "year"),
              data=Grunfeld)
re.plm <- plm(formula(model_1A), model="random", index=c("firm", "year"),
              data=Grunfeld)

In our data, a Hausman test rejects the null, indicating that the FE model is preferred because the FE and RE estimates differ systematically (in the MWE, by contrast, we fail to reject).

phtest(fe.plm, re.plm)

##  Hausman Test

##data:  formula(model_1A)
##chisq = 2.3304, df = 2, p-value = 0.3119
##alternative hypothesis: one model is inconsistent

pbgtest(fe.plm)

##  Breusch-Godfrey/Wooldridge test for serial correlation in panel models

##data:  formula(model_1A)
##chisq = 65.0632, df = 20, p-value = 1.14e-06
##alternative hypothesis: serial correlation in idiosyncratic errors

pbgtest(re.plm)

##  Breusch-Godfrey/Wooldridge test for serial correlation in panel models

##data:  formula(model_1A)
##chisq = 69.9495, df = 20, p-value = 1.856e-07
##alternative hypothesis: serial correlation in idiosyncratic errors
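A caveat when checking fixed-effects residuals for serial correlation: even if the idiosyncratic errors are serially uncorrelated, within-demeaned residuals have a lag-1 correlation of $-1/(T-1)$, not 0. This is the idea behind Wooldridge's test for FE models (plm's `pwartest()`). The sketch below uses simulated iid errors, not the Grunfeld or NYC data; `within_lag1_cor()` is a hypothetical helper, and $N = 500$, $T = 10$ are arbitrary choices.

```r
# Pooled lag-1 autocorrelation of residuals within each group
# (hypothetical helper; resid, group and time are plain vectors).
within_lag1_cor <- function(resid, group, time) {
  o <- order(group, time)
  r <- resid[o]
  g <- group[o]
  n <- length(r)
  same_group <- g[-1] == g[-n]   # consecutive rows that belong to one group
  cor(r[-n][same_group], r[-1][same_group])
}

set.seed(2)
n_firm <- 500; n_time <- 10
firm <- rep(seq_len(n_firm), each = n_time)
year <- rep(seq_len(n_time), times = n_firm)
e <- rnorm(n_firm * n_time)      # iid errors: NO serial correlation
e_within <- e - ave(e, firm)     # demeaning, as the within estimator does
rho_hat <- within_lag1_cor(e_within, firm, year)
c(rho_hat = rho_hat, null_value = -1 / (n_time - 1))
```

With $T = 10$ the estimate should land near $-1/9 \approx -0.11$ rather than 0, so a naive "is the residual autocorrelation zero?" check on FE residuals is biased toward finding negative correlation.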

This is where what I think should happen diverges from what many people do. My understanding is that serial correlation affects the standard errors but not the coefficient estimates, which suggests simply using serial-correlation-robust standard errors. For instance (as in this answer),

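# clustered-by-firm (Arellano) covariance: robust to heteroskedasticity
# and to serial correlation within firms
fe.rse <- sqrt(diag(vcovHC(fe.plm, type="HC1", cluster="group")))
re.rse <- sqrt(diag(vcovHC(re.plm, type="HC1", cluster="group")))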
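To check the intuition that serial correlation hurts the standard errors rather than the estimates, here is a small Monte Carlo sketch. The data-generating process is hypothetical (AR(1) errors and an AR(1) regressor with $\rho = 0.8$, $N = 100$ firms, $T = 20$ years), and `ar1_panel()`, `cluster_vcov()` and `one_rep()` are illustrative helpers, not plm functions. `cluster_vcov()` is the plain (HC0-type) clustered sandwich with no small-sample factor, so it will not match `vcovHC(..., type = "HC1")` digit for digit.

```r
# Monte Carlo sketch with a HYPOTHETICAL data-generating process
# (not the NYC or Grunfeld data).
set.seed(1)
n_firm <- 100; n_time <- 20; beta <- 1; rho <- 0.8; n_rep <- 200

# Stationary AR(1) draws for every firm, stacked firm by firm
ar1_panel <- function(n_firm, n_time, rho) {
  m <- matrix(0, n_firm, n_time)
  m[, 1] <- rnorm(n_firm, sd = 1 / sqrt(1 - rho^2))  # stationary start
  for (s in 2:n_time) m[, s] <- rho * m[, s - 1] + rnorm(n_firm)
  as.vector(t(m))                                     # firm-major order
}

# Clustered sandwich: (X'X)^-1 [sum_g X_g' u_g u_g' X_g] (X'X)^-1
cluster_vcov <- function(X, u, cluster) {
  bread <- solve(crossprod(X))
  scores <- rowsum(X * u, cluster)   # row g holds X_g' u_g
  bread %*% crossprod(scores) %*% bread
}

one_rep <- function() {
  firm <- rep(seq_len(n_firm), each = n_time)
  alpha <- rep(rnorm(n_firm), each = n_time)       # firm effects
  x <- ar1_panel(n_firm, n_time, rho) + alpha      # persistent regressor
  y <- alpha + beta * x + ar1_panel(n_firm, n_time, rho)
  # within transformation: subtract firm means
  xd <- matrix(x - ave(x, firm))
  yd <- y - ave(y, firm)
  b <- solve(crossprod(xd), crossprod(xd, yd))
  u <- as.vector(yd - xd %*% b)
  s2 <- sum(u^2) / (n_firm * n_time - n_firm - 1)  # FE degrees of freedom
  c(b = b[1, 1],
    se_naive = sqrt(s2 / sum(xd^2)),
    se_cluster = sqrt(cluster_vcov(xd, u, firm)[1, 1]))
}

res <- replicate(n_rep, one_rep())
summary_tab <- c(mean_b = mean(res["b", ]),
                 sd_b = sd(res["b", ]),            # actual sampling spread
                 mean_se_naive = mean(res["se_naive", ]),
                 mean_se_cluster = mean(res["se_cluster", ]))
print(round(summary_tab, 4))
```

If the sketch behaves as expected, `mean_b` sits close to the true value of 1, `mean_se_naive` is noticeably smaller than `sd_b`, and `mean_se_cluster` is close to `sd_b`: the point estimate is fine, the conventional standard error is not.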

**Why not just use serial-correlation-robust standard errors?**

But what many authors do instead is model the disturbances explicitly as AR(1) or ARMA processes, because Stata makes this easy. For the FE models, we can use gls from the nlme package on demeaned data (note: fe.plm and fe.gls give virtually identical estimates):

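library(nlme)

# within transformation: subtract each firm's mean from the model variables,
# keeping the firm and year identifiers intact for the AR/ARMA structures
Grunfeld.dm <- Grunfeld
for (v in c("inv", "value", "capital")) {
  Grunfeld.dm[[v]] <- Grunfeld[[v]] - ave(Grunfeld[[v]], Grunfeld$firm)
}

# no intercept: demeaning has already removed it
fe.gls <- gls(update(formula(model_1A), . ~ . - 1), method = "ML", data = Grunfeld.dm)
fear.gls <- update(fe.gls, correlation = corAR1(form = ~ year | firm))
fearma.gls <- update(fe.gls, correlation = corARMA(form = ~ year | firm,
                                                   p = 1, q = 1))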
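Why demeaning and then dropping the intercept reproduces FE: by the Frisch-Waugh-Lovell theorem, the slope from OLS on firm-demeaned data equals the slope from a regression with one dummy per firm (LSDV). The standard errors differ only by a degrees-of-freedom factor, because the demeaned regression does not know that $N$ firm means were estimated; that is one reason the results are "virtually identical" rather than identical. The sketch uses a small simulated panel (hypothetical, not Grunfeld) and plain `lm()`; note that `gls(..., method = "ML")` uses yet another variance divisor, so its factor differs.

```r
# Hypothetical simulated panel (not Grunfeld): 10 firms, 20 years.
set.seed(42)
n_firm <- 10; n_time <- 20; n <- n_firm * n_time
firm <- factor(rep(seq_len(n_firm), each = n_time))
alpha <- rep(rnorm(n_firm, sd = 3), each = n_time)  # firm effects
x <- rnorm(n) + alpha / 3                            # x correlated with alpha
y <- alpha + 2 * x + rnorm(n)

# LSDV: a dummy for every firm
fit_lsdv <- lm(y ~ x + firm)
# within: subtract firm means and drop the intercept
xd <- x - ave(x, firm)
yd <- y - ave(y, firm)
fit_within <- lm(yd ~ xd - 1)

b_lsdv <- coef(fit_lsdv)[["x"]]
b_within <- coef(fit_within)[["xd"]]
se_ratio <- summary(fit_within)$coefficients["xd", "Std. Error"] /
  summary(fit_lsdv)$coefficients["x", "Std. Error"]
c(b_lsdv = b_lsdv, b_within = b_within, se_ratio = se_ratio,
  df_factor = sqrt((n - n_firm - 1) / (n - 1)))
```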

The RE models can be estimated with lme from the nlme package (again, re.plm and re.lme give nearly identical estimates):

re.lme <- lme(fixed = formula(model_1A), random = ~ 1|firm, data = Grunfeld)
rear.lme <- update(re.lme,  correlation = corAR1(form = ~ year | firm))
rearma.lme <- update(re.lme,  correlation = corARMA(form = ~ year | firm, 
                                                    p=1,q=1))
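It may help to write out what rear.lme assumes for one firm. With a random intercept $u_i$ (variance $\sigma_u^2$) and AR(1) idiosyncratic errors (variance $\sigma_e^2$, correlation $\rho^{|t-s|}$), the composite error has $\text{Cov}(u_i + \varepsilon_{it}, u_i + \varepsilon_{is}) = \sigma_u^2 + \sigma_e^2 \rho^{|t-s|}$. So plain RE already implies a constant within-firm correlation, and the AR(1) term adds a decaying component on top. The sketch below builds this matrix; `re_ar1_cov()` and the parameter values are illustrative, not estimates.

```r
# Implied within-firm covariance of u_i + e_it under random intercept + AR(1)
# (hypothetical helper; sigma_u, sigma_e, rho are illustrative values).
re_ar1_cov <- function(n_time, sigma_u, sigma_e, rho) {
  lags <- abs(outer(seq_len(n_time), seq_len(n_time), "-"))
  sigma_u^2 * matrix(1, n_time, n_time) + sigma_e^2 * rho^lags
}

V_ar1 <- re_ar1_cov(5, sigma_u = 1, sigma_e = 2, rho = 0.5)
V_iid <- re_ar1_cov(5, sigma_u = 1, sigma_e = 2, rho = 0)
round(cov2cor(V_ar1), 3)  # decays with the lag toward sigma_u^2 / total = 0.2
round(cov2cor(V_iid), 3)  # plain RE: every off-diagonal correlation is 0.2
```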

There are a few things I don't understand about this:

  • Why do the coefficient estimates change once AR(1) or ARMA errors are added, when serial correlation doesn't (shouldn't?) affect the estimates?

  • Can we still use the Hausman test to select between FE and RE models with autoregressive errors?

  • How can we test for residual autocorrelation that remains after fitting these models? And if it remains, wouldn't we still need robust standard errors?
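On the first question: adding `corAR1` turns the estimator into (feasible) GLS, which is OLS on quasi-differenced data. Because GLS weights the observations differently from OLS or the within estimator, the point estimates differ in any finite sample even when, under strict exogeneity, both are consistent. The sketch uses a single simulated series (hypothetical, not Grunfeld) and treats $\rho$ as known, whereas `gls()` estimates it; `pw_transform()` is an illustrative helper implementing the Prais-Winsten transformation, and the result is cross-checked against the GLS formula.

```r
set.seed(3)
n <- 200; rho <- 0.7
x <- as.numeric(arima.sim(list(ar = 0.5), n = n))  # persistent regressor
e <- as.numeric(arima.sim(list(ar = rho), n = n))  # AR(1) disturbance
y <- 1 + 2 * x + e

# Prais-Winsten transform for a known rho (hypothetical helper)
pw_transform <- function(z, rho) {
  c(sqrt(1 - rho^2) * z[1], z[-1] - rho * z[-length(z)])
}

X <- cbind(const = 1, x = x)
b_ols <- coef(lm(y ~ x))
Xs <- apply(X, 2, pw_transform, rho = rho)   # transform the constant too
b_gls <- coef(lm(pw_transform(y, rho) ~ Xs - 1))

# Same answer from the GLS formula with Corr(e_t, e_s) = rho^|t - s|
Omega_inv <- solve(rho^abs(outer(seq_len(n), seq_len(n), "-")))
b_gls_direct <- solve(t(X) %*% Omega_inv %*% X, t(X) %*% Omega_inv %*% y)

rbind(ols = unname(b_ols), gls = unname(b_gls))
```

Both rows should be near the true values (1, 2) but not equal to each other, which is the pattern the question describes.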

Best Answer

I think I can help with some of your questions.

1) Why not just use serial-correlation-robust standard errors? Clustered standard errors are more robust: if you have both serial correlation and heteroskedasticity, clustered standard errors remain valid, while standard errors that are robust only to serial correlation would not be.

2) Why do the coefficients change (I think you mean between FE and FE with an explicit AR process)? I think what is happening is that two procedures that are both consistent can give different numerical results in finite samples (FE vs. RE when the RE assumptions hold is a good example). Modelling AR(1) errors turns the estimator into GLS on quasi-differenced data, which weights the observations differently from the within estimator. Also, misspecifying the serial correlation could be a problem for consistency (not 100% sure here).

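3) Can we use the Hausman test with serial correlation? Not off the shelf. The test statistic depends on the variance matrices of $\hat\beta_{RE}$ and $\hat\beta_{FE}$, and its standard form assumes that RE is efficient under the null, which no longer holds when the errors are serially correlated. You would need to adjust these variances (using e.g. clustered standard errors) to use the Hausman test.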
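For reference, the classic statistic is $H = (\hat\beta_{FE} - \hat\beta_{RE})'[\hat V_{FE} - \hat V_{RE}]^{-1}(\hat\beta_{FE} - \hat\beta_{RE})$, compared with a $\chi^2_k$ distribution. Treating $\hat V_{FE} - \hat V_{RE}$ as the variance of the difference is exactly the step that relies on RE being efficient. The sketch below computes it from coefficient vectors and covariance matrices; `hausman_stat()` is a hypothetical helper and the inputs are toy numbers. To my knowledge, plm's `phtest()` also documents a regression-based variant (`method = "aux"`) that accepts a robust `vcov`; check the package documentation before relying on it.

```r
# Classic Hausman statistic (hypothetical helper). Valid only when RE is
# efficient under H0, which serially correlated errors rule out.
hausman_stat <- function(b_fe, b_re, V_fe, V_re) {
  d <- b_fe - b_re
  H <- as.numeric(t(d) %*% solve(V_fe - V_re) %*% d)
  k <- length(d)
  c(chisq = H, df = k, p_value = pchisq(H, df = k, lower.tail = FALSE))
}

# Toy inputs, chosen only to make the arithmetic easy to follow
out <- hausman_stat(b_fe = c(1, 2), b_re = c(0.9, 2.1),
                    V_fe = diag(c(0.02, 0.03)), V_re = diag(c(0.01, 0.01)))
out  # chisq = 1.5, df = 2, p_value = exp(-0.75), about 0.472
```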

In my opinion, you should do fixed effects estimation and adjust the standard errors as you mention in your post.
