I have panel data covering 12 years, with many IDs per year (some IDs are missing in some years, so the panel is unbalanced) and many variables. I'd like to test for autocorrelation. I have run some OLS regression models that include a "year" variable to control for time, and I checked for autocorrelation using lmtest::bgtest. It suggests no autocorrelation (my data is sorted by year: 2002, 2002, 2002, 2003, 2003, ...). I learned that the plm package has a function pbgtest, which should be the same as bgtest, but when I run the exact same OLS model in plm and test for autocorrelation, the test suggests autocorrelation. I'm using model = "pooling" in my plm call, so it should be exactly the same as my OLS model created by lm. Which of the two tests should I use/trust, and why do the results differ?
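Example of my model:
# mydata, "id" and "year" are placeholders for the actual data set and its index columns
a <- plm::plm(y ~ x, data = mydata, index = c("id", "year"), model = "pooling")
b <- lm(y ~ x, data = mydata)
plm::pbgtest(a)
lmtest::bgtest(b)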
The test results differ, although the data are the same and the summaries of a and b are identical. The difference is not due to different order or type ("F" or "Chisq") settings.
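A minimal reproducible sketch of the comparison, using the Grunfeld data that ships with plm as a stand-in for the asker's data (an assumption; the original data are not shown). It sorts the rows by year, as in the question, fits the same pooled model with plm() and lm(), and compares the two tests under default and matched settings. If I read the plm documentation correctly, pbgtest() with order = NULL falls back to the minimum number of time periods per individual, which is 20 for this balanced panel.

```r
# Sketch only: Grunfeld (10 firms x 20 years) stands in for the asker's panel
library(plm)     # plm(), pbgtest()
library(lmtest)  # bgtest()

data("Grunfeld", package = "plm")  # shipped sorted by firm, then year

# Mimic the asker's layout: rows sorted by year, then firm
by_year <- Grunfeld[order(Grunfeld$year, Grunfeld$firm), ]

# Same pooled OLS model twice; plm() re-sorts by firm/year internally, lm() keeps row order
a <- plm(inv ~ value + capital, data = by_year,
         index = c("firm", "year"), model = "pooling")
b <- lm(inv ~ value + capital, data = by_year)

# Identical coefficients, since OLS does not depend on row order
all.equal(unname(coef(a)), unname(coef(b)))

# Default lag orders differ (expected: df = 20 for pbgtest, df = 1 for bgtest)
pbg_default <- pbgtest(a)
bg_default  <- bgtest(b)
pbg_default$parameter
bg_default$parameter

# Matching the order alone: lm's residuals are still lagged in year order
pbg1     <- pbgtest(a, order = 1)
bg1_year <- bgtest(b, order = 1)

# Re-fit lm on rows sorted like plm's internal data (firm, then year)
b_sorted   <- lm(inv ~ value + capital,
                 data = Grunfeld[order(Grunfeld$firm, Grunfeld$year), ])
bg1_sorted <- bgtest(b_sorted, order = 1)

# Compare the three order-1 statistics
c(pbgtest           = unname(pbg1$statistic),
  bgtest_year_order = unname(bg1_year$statistic),
  bgtest_firm_order = unname(bg1_sorted$statistic))
```

If the first and third statistics agree while the second differs, then the remaining gap after fixing order comes from the row order of the residuals that bgtest lags.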
Best Answer
The argument order has different default values in pbgtest() and bgtest(): for pbgtest(), order = NULL; for bgtest(), order = 1. See also the Note section in the documentation (?pbgtest).

For a panel model, the answer you want is the one from pbgtest() (whatever you choose for order).

However, you will not get the same numbers from both functions if you apply them to a panel model (fixed or random effects). The reason is rather technical:
lmtest::bgtest takes the data from model_object$model, which holds the original (untransformed) data in every case. For panel models, the test needs to be run on the (quasi-)demeaned data, and pbgtest(), being a wrapper around lmtest::bgtest(), does exactly that: it extracts the (quasi-)demeaned data and passes them on to lmtest::bgtest(). For a pooling model, you will get the same numbers, as the data are not transformed.

Please also note that the data in models estimated by plm might be in a different order, as plm re-orders the data into a stacked time series (sorted by individual, then by time). Data sorted by year, as in the question, therefore end up in a different order inside the plm object than in the lm object. To check this, compare the data and their order used in estimation by looking at plm_object$model and lm_object$model. A different order of observations will lead to different results from pbgtest and bgtest, even for pooling models.

Here is a code example of how pbgtest() works in principle: