Solved – Serial correlation: estimation vs robust SE

econometrics, hausman, panel data, r

We have route-level data (which I cannot share) on monthly bus ridership in New York City, giving a panel with $N = 185$ and $T = 36$. We estimate a fixed effects model and a random effects model with R's plm package. (For this MWE, I use the Grunfeld investment data, which illustrates the problems I am seeing fairly well.)

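library(plm)   # panel estimators and tests
library(nlme)  # gls() and lme(), used further below
library(plyr)  # ddply() and numcolwise(), used for demeaning further below
data("Grunfeld", package = "plm")
model_1A <- lm(inv ~ value + capital, data = Grunfeld)
fe.plm <- plm(formula(model_1A), model = "within", index = c("firm", "year"),
              data = Grunfeld)
re.plm <- plm(formula(model_1A), model = "random", index = c("firm", "year"),
              data = Grunfeld)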

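In our data, a Hausman test indicates that the FE model is preferred, because the FE and RE estimates differ significantly (note that in the MWE, we fail to reject). The Breusch-Godfrey/Wooldridge tests that follow reject the null of no serial correlation for both models.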

phtest(fe.plm, re.plm)

##  Hausman Test

##data:  formula(model_1A)
##chisq = 2.3304, df = 2, p-value = 0.3119
##alternative hypothesis: one model is inconsistent

pbgtest(fe.plm)

##  Breusch-Godfrey/Wooldridge test for serial correlation in panel models

##data:  formula(model_1A)
##chisq = 65.0632, df = 20, p-value = 1.14e-06
##alternative hypothesis: serial correlation in idiosyncratic errors

pbgtest(re.plm)

##  Breusch-Godfrey/Wooldridge test for serial correlation in panel models

##data:  formula(model_1A)
##chisq = 69.9495, df = 20, p-value = 1.856e-07
##alternative hypothesis: serial correlation in idiosyncratic errors

This is where what I think should happen diverges from what many people try to do. My understanding of serial correlation is that it affects the standard errors but not the coefficients. This would suggest to me a serial correlation-robust standard error. For instance (as in this answer),

fe.rse <- sqrt(diag(vcovHC(fe.plm, type="HC1", cluster="group")))
re.rse <- sqrt(diag(vcovHC(re.plm, type="HC1", cluster="group")))

**Why not just use serial-correlation-robust standard errors?**

But what many authors do instead is include specific AR(1) or ARMA disturbances because Stata makes this easy. For the FE models, we can use gls from the nlme package on demeaned data (note: fe.plm and fe.gls are virtually identical),

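# The within estimator is OLS on data demeaned by firm
demean <- numcolwise(function(x) x - mean(x))
Grunfeld.dm <- ddply(Grunfeld, .(firm), demean)
# Restore the identifiers, since numcolwise() also demeans numeric id columns
# (this relies on Grunfeld already being sorted by firm, then year)
Grunfeld.dm$firm <- Grunfeld$firm
Grunfeld.dm$year <- Grunfeld$year

fe.gls <- gls(update(formula(model_1A), . ~ . - 1), method = "ML", data = Grunfeld.dm)
fear.gls <- update(fe.gls, correlation = corAR1(form = ~ year | firm))
fearma.gls <- update(fe.gls, correlation = corARMA(form = ~ year | firm,
                                                   p = 1, q = 1))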

The RE models can be estimated with lme from the nlme package (again, re.plm and re.lme are virtually identical).

# Use the MWE data (Grunfeld) rather than the private ridership data
re.lme <- lme(fixed = formula(model_1A), random = ~ 1 | firm, data = Grunfeld)
rear.lme <- update(re.lme, correlation = corAR1(form = ~ year | firm))
rearma.lme <- update(re.lme, correlation = corARMA(form = ~ year | firm,
                                                   p = 1, q = 1))

There are a few things I don't understand about this:

Why do the coefficients change when serial correlation doesn't (shouldn't?) affect estimates?
Can we still use the Hausman test to select between FE and RE models with autoregressive errors?
How can we test for residual autocorrelation? And if it exists, wouldn't we still need a robust standard error?

Best Answer

I think I can help with some of your questions.

1) Why not just use serial correlation robust standard errors? Clustered standard errors will be more robust. For example, if you have serial correlation and heteroskedasticity, clustered standard errors would be valid here, while serial correlation robust standard errors would not be.

2) Why do the coefficients change (I think you mean between FE and adding some explicit AR process)? I think what is happening is that two procedures that are both consistent may lead to different numerical results in finite samples (fixed effects vs. random effects when the random effects assumptions are valid is a good example of this). Also, misspecification of the serial correlation could be a problem for consistency (not 100% sure here).

3) Can we use the Hausman test with serial correlation? Not off-the-shelf. The test statistic depends on the variance matrices of $\hat\beta_{RE}$ and $\hat\beta_{FE}$, and the conventional estimates of these are not valid under serial correlation. You would need to adjust them (using e.g. clustered standard errors) to use the Hausman test.
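One possible way to carry out the adjustment in point 3 with plm is the regression-based (auxiliary regression) form of the Hausman test, which accepts a robust covariance estimator through its vcov argument. The sketch below uses the Grunfeld data from the question. The object names ht_classic and ht_robust are only illustrative, and with just 10 firms the cluster-robust version should be read with caution.

```r
library(plm)

data("Grunfeld", package = "plm")

# Classical Hausman test: relies on the conventional (non-robust)
# covariance matrices of the FE and RE estimators
fe <- plm(inv ~ value + capital, data = Grunfeld, model = "within")
re <- plm(inv ~ value + capital, data = Grunfeld, model = "random")
ht_classic <- phtest(fe, re)

# Regression-based Hausman test; vcovHC for plm objects defaults to an
# Arellano-type estimator clustered by group (here: firm), which allows
# heteroskedasticity and serial correlation within firms
ht_robust <- phtest(inv ~ value + capital, data = Grunfeld,
                    method = "aux", vcov = vcovHC)

print(ht_classic)
print(ht_robust)
```

The regression-based form is used here because the classical statistic is built from a difference of two covariance matrices, which need not be positive definite once robust estimates are plugged in.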

In my opinion, you should do fixed effects estimation and adjust the standard errors as you mention in your post.
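As a minimal sketch of that recommendation, assuming the Grunfeld MWE from the question and the lmtest package for the coefficient table, the fixed effects fit can be reported with firm-clustered standard errors. The names ct_default, vc_cluster and ct_cluster are illustrative only.

```r
library(plm)
library(lmtest)

data("Grunfeld", package = "plm")

fe <- plm(inv ~ value + capital, data = Grunfeld,
          index = c("firm", "year"), model = "within")

# Conventional standard errors (these assume i.i.d. idiosyncratic errors)
ct_default <- coeftest(fe)

# Arellano-type covariance clustered by firm: allows arbitrary serial
# correlation and heteroskedasticity within each firm
vc_cluster <- vcovHC(fe, method = "arellano", type = "HC1", cluster = "group")
ct_cluster <- coeftest(fe, vcov. = vc_cluster)

print(ct_default)
print(ct_cluster)
```

The point estimates are the same in both tables; only the standard errors, test statistics and p-values change. Cluster-robust inference relies on a reasonably large number of clusters, so the 10 firms in Grunfeld are on the small side, whereas a panel with $N = 185$ is a more comfortable setting.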

Related Solutions

Solved – Unbalanced Panel: pooled OLS vs FE vs RE – which method yields unbiased and robust estimators

"Clustering by firms" doesn't exclude OLS as a possibility. One could simply adjust for a dummy variable indicating the firm and objectively call that a "cluster". More commonly, "clustering by firm" means adding a random intercept term for firms. This is the preferred approach when the number of firms is large relative to the sample size. Adding a random intercept makes this type of model a mixed effects model. Pooled OLS will estimate a random intercept and a random slope, and thus is a more general model. However, the estimates can be very unstable when the number of observations per firm is small.
Time can be handled using fixed effects as a dummy variable. It's better as a continuous variable. Splines interpolate dummy variables without requiring that all (or even more than 1) firm measure outcomes at exactly the same time. This can save you from binning or matching times and improves analysis significantly. You can still add a dummy variable for season if there are cyclic effects relating to time-of-year.
Without a prespecified hypothesis about the impact of omitted variables, variance structures, or other things, the Hausman and Breusch-Pagan tests make no sense in isolation. Diagnostic tests are prone to reject too often because they are simply overpowered in moderate-to-large samples. It is better to use diagnostic plots like a variogram.
One way to check pooled OLS vs fixed effects is to do a likelihood ratio test. They are both fully ML procedures. The numerator degrees of freedom for the pooled OLS would be $n_c * 2 + p$ where $p$ is the number of endogenous parameters (like firm type, season) and $n_c$ number of firms, 2 is the slope and intercept terms within each subOLS though they may be different.

Solved – Breusch-Godfrey autocorrelation test: bgtest for panel data yields different results than pbgtest

The argument order has different default values when you use pbgtest() compared to bgtest().

For pbgtest() order = NULL, for bgtest() order = 1. See also the section Note in the documentation (?pbgtest).

What you want for a panel model is the result from pbgtest() (no matter what you want order to be).

However, you will not get the same numbers from both functions if you apply them to a panel model (fixed or random effects). This has a rather technical reason: lmtest::bgtest takes the data from model_object$model, which are the original (untransformed) data in any case. For panel models, the test needs to be run on the (quasi-)demeaned data, and pbgtest(), being a wrapper around lmtest::bgtest(), does exactly that: it extracts the (quasi-)demeaned data and passes them on to lmtest::bgtest(). For a pooling model, you will get the same numbers, as the data are not transformed.

Please also note that the data in models estimated by plm might be in a different order, as plm re-orders the data to be a stacked time series. To check that, one can compare the data and its order used in estimation by looking at plm_object$model and lm_object$model. A different order of observations will lead to different results from pbgtest and bgtest even for pooling models.

Here is a code example of how pbgtest() works in principle:

library(plm)
data("Grunfeld", package = "plm")
g_re <- plm(inv ~ value + capital, data = Grunfeld, model = "random")

# extract (quasi-)demeaned data:
X <- model.matrix(g_re)
y <- pmodel.response(g_re)
# make a lm model object to be passed on to lmtest::bgtest()
lm.mod <- lm(y ~ X - 1)

# same coefficients:
all.equal(lm.mod$coefficients, g_re$coefficients, check.attributes = FALSE)
lmtest::bgtest(lm.mod, order = 1)
plm::pbgtest(g_re, order = 1) # identical results
