Solved – Breusch-Godfrey autocorrelation test: bgtest for panel data yields different results than pbgtest

autocorrelation, panel data, plm

I have panel data covering 12 years, with many IDs per year (some IDs are missing in some years, so the panel is unbalanced) and many variables. I'd like to test for autocorrelation, and I have run some OLS regression models that include a "year" variable to control for time. I checked for autocorrelation using lmtest::bgtest, and it suggests no autocorrelation (my data are in yearly order: 2002, 2002, 2002, 2003, 2003, …). I learned that the plm package has a function pbgtest, which should be the same as bgtest, but when I run the exact same OLS model in plm and test for autocorrelation, the test suggests autocorrelation. I'm using model = "pooling" in my plm call, so it should be exactly the same as my OLS model created by lm. Which of the two tests should I use/trust, and why do the results differ?

Example of my model:

a <- plm(y~x,model="pooling")
b <- lm(y~x)
plm::pbgtest(a)
lmtest::bgtest(b)

The test results are not the same, although the data are the same and the summaries for a and b are the same. The difference is not due to a different order argument or to type = "F" versus "Chisq".

Best Answer

The argument order has different default values when you use pbgtest() compared to bgtest().

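For pbgtest() the default is order = NULL (which makes it use the minimum number of time periods observed per individual); for bgtest() the default is order = 1. See also the section Note in the documentation (?pbgtest).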

What you want for a panel model is the answer from pbgtest() (no matter what you want order to be).
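To compare the two tests on an equal footing, the order can be set explicitly in both calls. The following is a minimal sketch, assuming the Grunfeld data shipped with plm (which is already sorted by firm and year) instead of the asker's data; the object names are illustrative.

```r
library(plm)
library(lmtest)
data("Grunfeld", package = "plm")

# Same pooled regression, estimated once with plm and once with lm
pool <- plm(inv ~ value + capital, data = Grunfeld, model = "pooling")
ols  <- lm(inv ~ value + capital, data = Grunfeld)

# With the defaults, the two tests use different lag orders
bg_default  <- lmtest::bgtest(ols)   # order = 1
pbg_default <- plm::pbgtest(pool)    # order = NULL, chosen from the panel's time dimension
bg_default$parameter                 # degrees of freedom = number of lags tested
pbg_default$parameter

# With the same explicit order, the statistics can be compared directly
bg_1  <- lmtest::bgtest(ols, order = 1)
pbg_1 <- plm::pbgtest(pool, order = 1)
all.equal(unname(bg_1$statistic), unname(pbg_1$statistic))
```

If the degrees of freedom printed for the two default calls differ, the tests were not checking the same lag order, and their results should not be expected to agree.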

However, you will not get the same numbers from both functions if you apply them to a panel model (fixed or random effects). This has a rather technical reason: lmtest::bgtest takes the data from model_object$model, which are the original (untransformed) data in any case. For panel models, the test needs to be run on the (quasi-)demeaned data, and pbgtest(), being a wrapper around lmtest::bgtest(), does exactly that: it extracts the (quasi-)demeaned data and passes them on to lmtest::bgtest(). For a pooling model, you will get the same numbers, as the data are not transformed.

Please also note that the data in models estimated by plm might be in a different order, as plm re-orders the data to be a stacked time series. To check that, one can compare the data and their order as used in estimation by looking at plm_object$model and lm_object$model. A different order of observations will lead to different results from pbgtest and bgtest, even for pooling models.
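The ordering effect can be reproduced with a small sketch. It assumes the Grunfeld data from plm and sorts it by year first, to mimic the year-ordered data described in the question; the object names are illustrative and not part of either package.

```r
library(plm)
library(lmtest)
data("Grunfeld", package = "plm")

# Year-ordered data, as in the question: 1935, 1935, ..., 1936, 1936, ...
by_year <- Grunfeld[order(Grunfeld$year, Grunfeld$firm), ]

ols_year <- lm(inv ~ value + capital, data = by_year)
pool     <- plm(inv ~ value + capital, data = by_year,
                index = c("firm", "year"), model = "pooling")

# plm has re-sorted the rows by firm, then year; lm kept the year ordering
head(index(pool))
head(by_year[, c("firm", "year")])

# Same coefficients, but "lag 1" refers to different neighbouring rows in each fit
all.equal(coef(ols_year), coef(pool), check.attributes = FALSE)
bg_year <- lmtest::bgtest(ols_year, order = 1)
pbg     <- plm::pbgtest(pool, order = 1)
bg_year
pbg

# Re-fitting lm on data sorted like plm's should reproduce pbgtest's statistic
by_firm  <- by_year[order(by_year$firm, by_year$year), ]
ols_firm <- lm(inv ~ value + capital, data = by_firm)
bg_firm  <- lmtest::bgtest(ols_firm, order = 1)
all.equal(unname(bg_firm$statistic), unname(pbg$statistic))
```

In year-ordered data, the "previous" residual belongs to a different individual in the same year, so bgtest() on such an lm fit is not testing within-individual serial correlation.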

Here is a code example of how pbgtest() works in principle:

library(plm)
data("Grunfeld", package = "plm")
g_re <- plm(inv ~ value + capital, data = Grunfeld, model = "random")

# extract (quasi-)demeaned data:
X <- model.matrix(g_re)
y <- pmodel.response(g_re)
# make a lm model object to be passed on to lmtest::bgtest()
lm.mod <- lm(y ~ X - 1)

# same coefficients:
all.equal(lm.mod$coefficients, g_re$coefficients, check.attributes = FALSE)
lmtest::bgtest(lm.mod, order = 1)
plm::pbgtest(g_re, order = 1) # identical results