How to cope with serial correlation and time effects in a panel data model in R

Tags: econometrics, panel-data, plm, r

I am building a panel-data model for macroeconomic analysis and I am currently stuck on how to deal with some problems. Diagnostic tests indicate cross-sectional dependence (which is to be expected), serial correlation, and that I should use time fixed effects (and there are probably more issues).

Here is a link to my data: Panel Data

Here is the code (abbreviated for readability):

#--------------load libraries
options(java.parameters = "-Xmx1024m")
library(XLConnect)   
library(plm)

#--------------read data
wb <- loadWorkbook("Panel.xlsx") 
df <- readWorksheet(wb, sheet="Sheet1")

#--------------Fixed effects paneldata model
fixed <- plm(Price ~ Income + Housing_units + Population_age + Population_density + Unemployment + Real_mortgage_rate + Expected_GDP_growth,
             data=df, drop.unused.levels = TRUE, index=c("Id", "Year"), model="within")
summary(fixed)

#--------------Random effects paneldata model
random <- plm(Price ~ Income + Housing_units + Population_age + Population_density + Unemployment + Real_mortgage_rate + Expected_GDP_growth,
              data=df, drop.unused.levels = TRUE, index=c("Id", "Year"), model="random")
summary(random)

#--------------Diagnostics
phtest(fixed, random) # Hausman test: fixed-effects model preferred over random-effects model
plmtest(fixed, effect = "time", type = "bp") # Breusch-Pagan LM test: time effects needed
pcdtest(fixed, test = "lm") # Test indicates cross-sectional dependence
pcdtest(fixed, test = "cd") # Test indicates cross-sectional dependence
pbgtest(fixed) # Test indicates serial correlation
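Because the link to the original data no longer works, here is a hedged, reproducible sketch of the same diagnostic workflow on the `Grunfeld` data set that ships with plm. This is a substitute dataset chosen for illustration, not the author's housing panel, and the formula `inv ~ value + capital` is likewise only a stand-in. Each test returns an `htest` object, so the sketch collects just the p-values. plm documents `plmtest()` in terms of the pooled OLS model, so a pooling fit is passed to it explicitly.

```r
library(plm)

# Stand-in data: plm's built-in Grunfeld panel (10 firms x 20 years)
data("Grunfeld", package = "plm")
fm <- inv ~ value + capital

fixed  <- plm(fm, data = Grunfeld, index = c("firm", "year"), model = "within")
random <- plm(fm, data = Grunfeld, index = c("firm", "year"), model = "random")
pooled <- plm(fm, data = Grunfeld, index = c("firm", "year"), model = "pooling")

# Run the same diagnostics as in the question
tests <- list(
  hausman    = phtest(fixed, random),                       # FE vs. RE
  time_bp    = plmtest(pooled, effect = "time", type = "bp"), # time effects
  cd_lm      = pcdtest(fixed, test = "lm"),                  # cross-sectional dependence
  cd_pesaran = pcdtest(fixed, test = "cd"),                  # cross-sectional dependence
  serial_bg  = pbgtest(fixed)                                # serial correlation
)

# Keep only the p-values, named after the list entries
pvals <- vapply(tests, function(t) unname(t$p.value), numeric(1))
round(pvals, 4)
```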

Do you have any ideas for improvement or next steps? More precisely: how can I implement time fixed effects, and how can I deal with serial correlation?

Edit:

Here is a sample of the data:

                                      Id Year     Price   Income Housing_units Population_age Population_density Unemployment Real_mortgage_rate Expected_GDP_growth
1                Arrondissement d’Anvers 2001 106447.82 18826.26     0.2844412      0.3608039       0.0010737967   0.05793009         0.04277500               0.021
2                Arrondissement d’Anvers 2002 118716.84 18679.44     0.2838416      0.3593445       0.0010694183   0.07099468         0.03880833               0.021
3                Arrondissement d’Anvers 2003 123591.79 18503.93     0.2829195      0.3583308       0.0010633930   0.07683871         0.03296667               0.015
4                Arrondissement d’Anvers 2004 127741.49 18489.21     0.2820850      0.3568954       0.0010586214   0.07371481         0.03050833               0.017
5                Arrondissement d’Anvers 2005 175851.72 18446.58     0.2814552      0.3548318       0.0010542290   0.07328001         0.01900000               0.014
6                Arrondissement d’Anvers 2006 194872.80 18863.60     0.2803233      0.3524605       0.0010477999   0.06778650         0.01948333               0.018
7                Arrondissement d’Anvers 2007 208739.77 19226.49     0.2786186      0.3504475       0.0010407672   0.06076701         0.02731667               0.018
8                Arrondissement d’Anvers 2008 210862.56 19225.50     0.2766267      0.3484786       0.0010317159   0.05498332         0.01218333               0.010
9                Arrondissement d’Anvers 2009 212451.01 19482.80     0.2748636      0.3466580       0.0010232215   0.07004942         0.03553333               0.013
10               Arrondissement d’Anvers 2010 221431.63 18989.30     0.2727264      0.3451426       0.0010152046   0.07529093         0.02287500               0.018
11               Arrondissement d’Anvers 2011 219939.30 18277.79     0.2693328      0.3440205       0.0010021064   0.06460002         0.01052500               0.015
12               Arrondissement d’Anvers 2012 218487.50 17916.28     0.2664027      0.3424343       0.0009904085   0.06914538         0.01032500               0.010
13             Arrondissement de Malines 2001 100110.58 19524.44     0.3509642      0.3715508       0.0016651232   0.03690743         0.04277500               0.021
14             Arrondissement de Malines 2002 108968.27 19334.37     0.3514912      0.3711313       0.0016595987   0.04664770         0.03880833               0.021
15             Arrondissement de Malines 2003 115356.87 19336.42     0.3516660      0.3701327       0.0016529746   0.05534163         0.03296667               0.015
16             Arrondissement de Malines 2004 119105.32 19200.12     0.3515692      0.3689736       0.0016465732   0.05250104         0.03050833               0.017
17             Arrondissement de Malines 2005 154714.71 19360.47     0.3510717      0.3672112       0.0016380779   0.05320718         0.01900000               0.014
18             Arrondissement de Malines 2006 172113.51 20007.19     0.3496116      0.3647687       0.0016236333   0.04852456         0.01948333               0.018
19             Arrondissement de Malines 2007 186916.68 20447.05     0.3496034      0.3628630       0.0016134620   0.04139262         0.02731667               0.018
20             Arrondissement de Malines 2008 190700.73 20521.17     0.3483314      0.3602553       0.0015988850   0.03687283         0.01218333               0.010

Here are some results of the diagnostics:

    Lagrange Multiplier Test - time effects (Breusch-Pagan)

data:  Price ~ Income + Housing_units + Population_age + Population_density +  ...
chisq = 1522.558, df = 1, p-value < 2.2e-16
alternative hypothesis: significant effects

    Breusch-Godfrey/Wooldridge test for serial correlation in panel models

data:  Price ~ Income + Housing_units + Population_age + Population_density +     Unemployment + Real_mortgage_rate + Expected_GDP_growth
chisq = 233.2558, df = 12, p-value < 2.2e-16
alternative hypothesis: serial correlation in idiosyncratic errors

Best Answer

Your question is not very clear, and the link to the data is no longer working...

For the time fixed effects, your call should look like this:

fixed <- plm(Price ~ Income + Housing_units + Population_age + 
   Population_density + Unemployment + Real_mortgage_rate + Expected_GDP_growth,
   data=df, index=c("Id", "Year"), model="within", effect="time")
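To see what `effect = "time"` does, here is a hedged sketch on plm's bundled `Grunfeld` data (a stand-in for the unavailable housing panel). Demeaning every variable within each year should give the same slope coefficients as ordinary least squares with one dummy per year (the least-squares dummy variable form), by the Frisch-Waugh-Lovell theorem.

```r
library(plm)

# Stand-in data: plm's built-in Grunfeld panel (firm x year)
data("Grunfeld", package = "plm")

# One-way time fixed effects: variables demeaned within each year
fe_time <- plm(inv ~ value + capital, data = Grunfeld,
               index = c("firm", "year"), model = "within", effect = "time")

# Same slopes via least-squares dummy variables: one dummy per year
lsdv_time <- lm(inv ~ value + capital + factor(year), data = Grunfeld)

# Side-by-side comparison of the slope coefficients
cbind(plm = coef(fe_time), lsdv = coef(lsdv_time)[c("value", "capital")])
```

The within form is usually preferable in practice because it avoids estimating one coefficient per year explicitly.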

If you want both individual and time fixed effects, use effect="twoways" instead.
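A hedged sketch of the two-way version, again on the `Grunfeld` stand-in data. For a balanced panel like this one, the two-way within estimator should reproduce the slopes of an OLS fit with both firm and year dummies. The joint F test of the year dummies uses the generic `anova()` comparison of nested `lm` fits; this is a swapped-in, general-purpose check, not a plm function.

```r
library(plm)

# Stand-in data: plm's built-in Grunfeld panel (balanced: 10 firms x 20 years)
data("Grunfeld", package = "plm")

# Two-way within estimator: firm and year effects swept out
fe_two <- plm(inv ~ value + capital, data = Grunfeld,
              index = c("firm", "year"), model = "within", effect = "twoways")

# LSDV equivalents: firm dummies only, and firm + year dummies
lsdv_ind <- lm(inv ~ value + capital + factor(firm), data = Grunfeld)
lsdv_two <- lm(inv ~ value + capital + factor(firm) + factor(year), data = Grunfeld)

# Slopes agree for this balanced panel
cbind(plm = coef(fe_two), lsdv = coef(lsdv_two)[c("value", "capital")])

# Joint F test that all year dummies are zero (nested-model comparison)
year_test <- anova(lsdv_ind, lsdv_two)
year_test
```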

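To deal with serial correlation, you can use vcovHC.plm() (dispatched through the generic vcovHC()), which by default computes standard errors clustered by group, i.e. robust with respect to heteroscedasticity and arbitrary correlation within each cluster (here, within each Id over time). The coefficient estimates stay the same; only the standard errors change. See Chapter 14.4 of Using R for Introductory Econometrics (Heiss, 2016), which you can also read online. Obtaining robust SEs is easy: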

library(lmtest)  # provides coeftest()
coeftest(fixed, vcov. = vcovHC)  # standard errors clustered by group
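A hedged sketch of what those lines do, on plm's bundled `Grunfeld` data as a stand-in for the unavailable housing panel. It assumes the plm signature `vcovHC(x, method, type, cluster)` with defaults `method = "arellano"`, `type = "HC0"` and `cluster = "group"`, checks that spelling those defaults out reproduces the default call, and puts conventional and cluster-robust standard errors side by side.

```r
library(plm)
library(lmtest)  # coeftest()

# Stand-in data: plm's built-in Grunfeld panel (firm x year)
data("Grunfeld", package = "plm")
fixed <- plm(inv ~ value + capital, data = Grunfeld,
             index = c("firm", "year"), model = "within")

# Default call vs. explicitly spelled-out defaults (assumed: arellano / HC0 / group)
v_default  <- vcovHC(fixed)
v_explicit <- vcovHC(fixed, method = "arellano", type = "HC0", cluster = "group")
all.equal(unname(diag(v_default)), unname(diag(v_explicit)))

# Conventional SEs (i.i.d. errors) vs. SEs clustered by firm
ct_classic <- coeftest(fixed)
ct_robust  <- coeftest(fixed, vcov. = vcovHC)
se <- cbind(classic = ct_classic[, "Std. Error"],
            robust  = ct_robust[, "Std. Error"])
round(se, 4)

# Extra arguments go through a wrapper function, e.g. the HC1 small-sample scaling
coeftest(fixed, vcov. = function(x) vcovHC(x, type = "HC1"))
```

Passing a function rather than a precomputed matrix lets coeftest() compute the covariance from the same fitted model it reports.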

All of this is discussed in the plm vignette:

http://cran.at.r-project.org/web/packages/plm/vignettes/plm.pdf
