Solved – Breusch-Godfrey autocorrelation test: bgtest for panel data yields different results than pbgtest

autocorrelation, panel data, plm

I have a panel dataset covering 12 years, with many IDs per year (some IDs are missing in some years, so the panel is unbalanced) and many variables. I'd like to test for autocorrelation. I have run some OLS regression models that include a "year" variable to control for time. I checked for autocorrelation with lmtest::bgtest, and it suggests no autocorrelation (my data is sorted by year: 2002, 2002, 2002, 2003, 2003, ...). I learned that the plm package has a function pbgtest which should be equivalent to bgtest, but when I run the exact same OLS model in plm and test for autocorrelation, the test suggests autocorrelation. I'm using model = "pooling" in my plm call, so it should be exactly the same as my OLS model created by lm(). Which of the two tests should I use/trust, and why do the results differ?

Example of my model:

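# pdat: placeholder name for the asker's data frame with columns id, year, y, x
a <- plm(y ~ x, data = pdat, index = c("id", "year"), model = "pooling")
b <- lm(y ~ x, data = pdat)
plm::pbgtest(a)
lmtest::bgtest(b)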

The test results differ even though the data are the same and the summaries of a and b are identical. The difference is not due to a different order or to type = "F" vs. "Chisq".

Best Answer

The argument order has a different default value in pbgtest() than in bgtest().

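For pbgtest(), order = NULL, which uses the minimum number of observations over the time dimension (the length of the shortest individual time series); for bgtest(), order = 1. See also the Note section of the documentation (?pbgtest).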
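To see the default-order difference in isolation, here is a minimal sketch that uses the Grunfeld data shipped with plm instead of the asker's data (the data set and model are my choice for illustration). Grunfeld is a balanced panel of 10 firms observed over 20 years, already sorted by firm and year. Assuming the default behaviour described in ?pbgtest, pbgtest() should test up to order 20 here while bgtest() tests order 1; with the same explicit order, the pooled results should coincide.

```r
library(plm)
library(lmtest)
data("Grunfeld", package = "plm")  # balanced panel: 10 firms x 20 years, sorted by firm, year

# The same pooled OLS model, estimated with plm and with lm
pool_plm <- plm(inv ~ value + capital, data = Grunfeld, model = "pooling")
pool_lm  <- lm(inv ~ value + capital, data = Grunfeld)

# Defaults: pbgtest() derives the order from the panel, bgtest() uses order = 1
default_plm <- pbgtest(pool_plm)
default_lm  <- bgtest(pool_lm)
default_plm$parameter  # lag order (df) used by pbgtest()
default_lm$parameter   # lag order (df) used by bgtest()

# With the same explicit order, the two statistics agree for a pooling model
same_plm <- pbgtest(pool_plm, order = 1)
same_lm  <- bgtest(pool_lm, order = 1)
all.equal(unname(same_plm$statistic), unname(same_lm$statistic))
```

The unname() calls are needed only because the two functions label the statistic differently.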

For a panel model, the answer you want is the one from pbgtest() (whatever you want order to be).

However, you will not get the same numbers from the two functions if you apply them to a panel model with fixed or random effects. The reason is rather technical: lmtest::bgtest takes the data from model_object$model, which always holds the original (untransformed) data. For panel models, the test needs to be run on the (quasi-)demeaned data, and pbgtest(), being a wrapper around lmtest::bgtest(), does exactly that: it extracts the (quasi-)demeaned data and passes them on to lmtest::bgtest(). For a pooling model, you will get the same numbers because the data are not transformed.

Please also note that models estimated by plm may hold the data in a different order, because plm re-orders the data into a stacked time series (sorted by individual, then time). To check this, compare the data and their order used in estimation by looking at plm_object$model and lm_object$model. A different order of observations will lead to different results from pbgtest and bgtest, even for pooling models.
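A sketch of the row-order effect, again with Grunfeld as a stand-in for the asker's data: the rows are first put in year order (all firms for 1935, then all firms for 1936, ...), mimicking the question. The object names (by_year, by_firm, etc.) are only illustrative. Assuming plm re-sorts the data by firm and then year as described above, sorting the data the same way before calling lm() should make the two tests agree again.

```r
library(plm)
library(lmtest)
data("Grunfeld", package = "plm")

# Year order, as in the question: all firms for one year, then the next year
by_year <- Grunfeld[order(Grunfeld$year, Grunfeld$firm), ]

pool_plm <- plm(inv ~ value + capital, data = by_year,
                index = c("firm", "year"), model = "pooling")
pool_lm  <- lm(inv ~ value + capital, data = by_year)

# plm stores its data re-sorted by firm, then year; lm keeps the given order
head(pool_plm$model$inv)
head(pool_lm$model$inv)

# Sorting the data the way plm does makes the two tests agree again
by_firm <- by_year[order(by_year$firm, by_year$year), ]
pool_lm_sorted <- lm(inv ~ value + capital, data = by_firm)
bg_plm    <- pbgtest(pool_plm, order = 1)
bg_sorted <- bgtest(pool_lm_sorted, order = 1)
all.equal(unname(bg_plm$statistic), unname(bg_sorted$statistic))
```

In the year-ordered lm fit, the lagged residual of one firm is the residual of a different firm in the same year, which is why bgtest() on that fit tests a different lag structure.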

Here is a code example of how pbgtest() works in principle:

library(plm)
data("Grunfeld", package = "plm")
g_re <- plm(inv ~ value + capital, data = Grunfeld, model = "random")

# extract (quasi-)demeaned data:
X <- model.matrix(g_re)
y <- pmodel.response(g_re)
# make a lm model object to be passed on to lmtest::bgtest()
lm.mod <- lm(y ~ X - 1)

# same coefficients:
all.equal(lm.mod$coefficients, g_re$coefficients, check.attributes = FALSE)
lmtest::bgtest(lm.mod, order = 1)
plm::pbgtest(g_re, order = 1) # identical results
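The same principle for a fixed-effects (within) model, with the demeaning done by hand rather than via model.matrix(), might look like the sketch below. It again uses Grunfeld, and the helper demean() is a hypothetical name introduced here. The within transformation is taken to be subtraction of each firm's time mean, which is what plm does for model = "within" with the default individual effect.

```r
library(plm)
library(lmtest)
data("Grunfeld", package = "plm")

g_fe <- plm(inv ~ value + capital, data = Grunfeld, model = "within")

# Within transformation by hand: subtract each firm's mean over time
demean <- function(v) v - ave(v, Grunfeld$firm)
dm <- data.frame(inv     = demean(Grunfeld$inv),
                 value   = demean(Grunfeld$value),
                 capital = demean(Grunfeld$capital))

# plm's transformed response equals the hand-demeaned one
all.equal(as.numeric(pmodel.response(g_fe)), dm$inv)

# OLS without intercept on the demeaned data reproduces the FE slopes
fe_by_hand <- lm(inv ~ value + capital - 1, data = dm)
all.equal(coef(fe_by_hand), coef(g_fe))

# bgtest() on the demeaned data matches pbgtest() on the FE model
bg_dm <- bgtest(fe_by_hand, order = 1)
bg_fe <- pbgtest(g_fe, order = 1)
all.equal(unname(bg_dm$statistic), unname(bg_fe$statistic))
```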

Related Solutions

Solved – Test Excluded Instrument in R PLM package

If you only care about correctly estimating the standard errors and have no other particular reason to use the plm package, you can use the lfe package. You can still use the data created with the plm package:

library(lfe)
form_iv <- formula(y~z | id + year | x ~ inst)
my_iv_reg <- felm(form_iv, data = p.lmdb)
wald_iv <- waldtest(my_iv_reg$stage1, ~inst, lhs = my_iv_reg$stage1$lhs)

The F statistic is the fifth element of the vector returned by waldtest(), i.e.

wald_iv[5]

The p-value of the F-test is the fourth element, i.e.,

wald_iv[4]

Note that if you have more than one instrument, e.g. inst_1 and inst_2, and need to test their joint significance, you can run:

form_iv <- formula(y~z | id + year | x ~ inst_1 + inst_2)
my_iv_reg <- felm(form_iv, data = p.lmdb)
wald_iv <- t(sapply(my_iv_reg$stage1$lhs, function(lh){ 
                    waldtest(my_iv_reg$stage1, ~inst_1|inst_2, lhs=lh)} ))

Solved – Using PLM in R for Panel Data

I would suggest using a logit panel model instead, since that would constrain your outcome variable to fall between 0 and 1. For this, you need the function pglm(), which requires installing the pglm package first. Off the top of my head, I think you need to specify the option family = "binomial" to get a logit model, but double-check this in the documentation. Let me know if you have any more questions.
