Solved – Panel data: difference between time effects and cross-sectional dependence

fixed-effects-model, panel data

I am currently learning (by myself) about the analysis of panel data.

What I have seen so far is that a fixed effects model allows us to control for idiosyncratic differences between entities. Moreover, dependencies that universally affect all entities can be controlled by adding time effects.

Here is my question: Are cross-sectional dependencies completely controlled for when I add time effects to the model? Or do I need to account for cross-sectional dependencies even after I have included time fixed effects?

(Background to the question: One of my study guides explains that time effects are those phenomena that affect all entities over time, such as federal laws in a study of local state policies. But another study guide gives this as an example of a cross-sectional dependence that has to be checked even if time effects have been added to the model! That leaves me wondering about the difference between cross-sectional dependencies and time effects.)

Many thanks for your help!

Best Answer

I'm just thinking out loud here,

Suppose you have industry-county-year level data, your outcome is $Y_{ict}$, and you are interested in the effect of some variable $X_{ict}$.

In your strategy you would correctly think you can use:

(1) industry-county (panel) fixed effects to control for time invariant confounding factors across these panels as well as the average difference in time varying covariates across industry-county pairs

(2) year fixed effects to control for shocks that are common to all industries and counties in a given year

However, what if there are shocks that are common across some counties in regions indexed by $r$, yet are both time varying and different across regions?

That is, perhaps the true data generating process is

$Y_{ict}=\underbrace{\theta_{ic}}_\text{panel fixed effect}+\underbrace{\theta_t}_\text{year fixed effect}+\underbrace{\theta_{rt}}_\text{regional shocks}+\underbrace{\beta}_\text{parameter of interest} X_{ict}+\underbrace{\epsilon_{ict}}_\text{idiosyncratic shock}$

But you estimate a model

$Y_{ict}=\theta_{ic}+\theta_t+\beta X_{ict}+\epsilon_{ict}$

which does not attempt to proxy for this regional shock, then, to the degree that $Cov(\theta_{rt},X_{ict})\neq 0$, I believe your estimate $\hat{\beta}$ would in part reflect the variation in $\theta_{rt}$ that covaries with $X_{ict}$.

That is,

$plim \; \hat{\beta} =\underbrace{ \beta}_\text{true parameter} + \underbrace{\frac{Cov(X_{ict},\theta_{rt})}{Var(X_{ict})}}_\text{bias}$
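The bias can be checked with a small simulation. The following is only a sketch under assumptions that are not part of the answer above: a balanced panel, 10 hypothetical regions with 20 units each over 20 years, standard normal shocks, $\beta = 2$, and $X_{ict}$ loading directly on $\theta_{rt}$. It compares the unit-plus-year fixed effects estimate with one that uses unit and region-by-year fixed effects, which is one way (chosen here for illustration) to absorb $\theta_{rt}$. Both estimates use the within transformation, so the covariance and variance in the bias term are those of the variables after the fixed effects have been partialled out.

```python
import random
from collections import defaultdict


def simulate(n_regions=10, units_per_region=20, n_years=20, beta=2.0, seed=0):
    """Return rows (unit, region, year, x, y) from a DGP with regional shocks."""
    rng = random.Random(seed)
    year_fe = [rng.gauss(0, 1) for _ in range(n_years)]        # theta_t
    rows, unit = [], 0
    for region in range(n_regions):
        shock = [rng.gauss(0, 1) for _ in range(n_years)]      # theta_rt
        for _ in range(units_per_region):
            alpha = rng.gauss(0, 1)                            # theta_ic
            for year in range(n_years):
                # X loads on the regional shock, so Cov(theta_rt, X) != 0.
                x = alpha + shock[year] + rng.gauss(0, 1)
                y = (alpha + year_fe[year] + shock[year]
                     + beta * x + rng.gauss(0, 1))
                rows.append((unit, region, year, x, y))
            unit += 1
    return rows


def group_means(rows, col, key):
    """Mean of column `col` within each group defined by `key`."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        sums[key(row)] += row[col]
        counts[key(row)] += 1
    return {k: sums[k] / counts[k] for k in sums}


def within(rows, col, key_a, key_b, key_ab):
    """Balanced-panel within transform: z - mean_a - mean_b + enclosing mean."""
    m_a, m_b, m_ab = (group_means(rows, col, k) for k in (key_a, key_b, key_ab))
    return [row[col] - m_a[key_a(row)] - m_b[key_b(row)] + m_ab[key_ab(row)]
            for row in rows]


def fe_slope(rows, key_a, key_b, key_ab):
    """OLS slope of demeaned y on demeaned x (Frisch-Waugh-Lovell)."""
    x = within(rows, 3, key_a, key_b, key_ab)
    y = within(rows, 4, key_a, key_b, key_ab)
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def estimate(rows):
    unit = lambda r: r[0]
    # (a) unit + year fixed effects: theta_rt is left in the error term.
    twfe = fe_slope(rows, unit, lambda r: r[2], lambda r: 0)
    # (b) unit + region-by-year fixed effects: theta_rt is absorbed.
    region_year = fe_slope(rows, unit, lambda r: (r[1], r[2]), lambda r: r[1])
    return twfe, region_year


if __name__ == "__main__":
    b_twfe, b_region_year = estimate(simulate())
    print("unit + year FE:          ", round(b_twfe, 3))
    print("unit + region-by-year FE:", round(b_region_year, 3))
```

In this setup the first estimate should sit well above the true value of 2, while the second should be close to it. The closed-form demeaning used here is valid only because the simulated panel is balanced and units are nested in regions.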

To address this, I believe you could

(1) Cluster your standard errors at the geographical level where you think there may be correlated disturbances. Note that this corrects inference only; it does not remove the bias in $\hat{\beta}$.

and

(2) Find an instrument $Z_{ict}$ for $X_{ict}$ that is strongly correlated with $X_{ict}$ (relevance) and that affects the outcome only through its effect on $X_{ict}$, and not through $\theta_{rt}$ influencing $Z_{ict}$ or through $Z_{ict}$ influencing $Y_{ict}$ directly (excludability).
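To see why both conditions matter, here is a sketch of the probability limit of the IV estimator. It treats all variables as already net of the panel and year fixed effects and assumes $Cov(Z_{ict},\epsilon_{ict})=0$; the symbol $\hat{\beta}_{IV}$ is introduced here for illustration.

$plim \; \hat{\beta}_{IV} = \frac{Cov(Z_{ict},Y_{ict})}{Cov(Z_{ict},X_{ict})} = \beta + \frac{Cov(Z_{ict},\theta_{rt})}{Cov(Z_{ict},X_{ict})}$

The second term is zero only if the instrument is uncorrelated with the regional shock, and relevance is what keeps the denominator away from zero.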

Related Solutions

Panel Data – Validity of Pseudo-Panel Data from Repeated Cross-Sectional Data

I do not know whether there are established methods to compare panel data to repeated cross-sectional data. But I want to add that true panel data is not always superior to repeated cross-sectional data in general. Attrition or learning effects for example may be a problem in panel data but not in repeated cross-sectional data although I do not know whether these problems are present in your case. But if this is the case, the second and third years (and so on) of your panel data may be problematic compared to repeated cross-sectional data in some sense. You should keep this in mind.

In general I think what you want to do sounds doable and it could reveal new information in comparison with the analysis of cross-sectional data only (although I do not know your research question).

If the estimates differ between the two analyses, I would look for possible reasons by considering the advantages and disadvantages of both types of datasets. There are several papers on this topic that might help you, such as

Deaton (1985)

Verbeek & Nijman (1992)

Frees (2004)

Lee & Niemeier (1996)

Hsiao (2007)

Solved – How to cope with serial correlation and time effects in a panel data model in R

Your question is not very clear, and the link to the data is no longer working...

For the time fixed effects, your call should look like this:

library(plm)
# Within estimator with time fixed effects only; df is your panel data frame
fixed <- plm(Price ~ Income + Housing_units + Population_age + 
   Population_density + Unemployment + Real_mortgage_rate + Expected_GDP_growth,
   data=df, index=c("Id", "Year"), model="within", effect="time")

If you want both individual and time FEs you can also use effect="twoways".

To deal with serial correlation you can use vcovHC.plm(), which by default computes SEs clustered by group, i.e. robust with respect to heteroscedasticity and arbitrary correlation within clusters. See Chapter 14.4 of Using R for Introductory Econometrics (Heiss, 2016). (You can also read it online.) Obtaining robust SEs is easy:

library(lmtest)
# Coefficient table using the cluster-robust covariance from plm's vcovHC method
coeftest(fixed, vcov. = vcovHC)

All of this is discussed in the plm vignette:

http://cran.at.r-project.org/web/packages/plm/vignettes/plm.pdf
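To make concrete what "clustered by group" means, here is a minimal Python sketch of the cluster-robust standard error for a single slope. It is not the plm implementation: it assumes the regressor and residuals have already been within-transformed, applies no small-sample correction, and the function name `cluster_robust_se` is hypothetical.

```python
from collections import defaultdict
from math import sqrt


def cluster_robust_se(x_tilde, resid, cluster_ids):
    """Cluster-robust SE of a single slope (sandwich form, no correction).

    x_tilde     -- demeaned regressor values
    resid       -- regression residuals
    cluster_ids -- cluster label for each observation (e.g. the panel Id)
    """
    sxx = sum(x * x for x in x_tilde)
    # Sum the scores x * e within each cluster, so that arbitrary
    # correlation between observations of the same cluster is allowed.
    score = defaultdict(float)
    for x, e, g in zip(x_tilde, resid, cluster_ids):
        score[g] += x * e
    meat = sum(s * s for s in score.values())
    return sqrt(meat) / sxx


if __name__ == "__main__":
    x = [1.0, -1.0, 2.0, -2.0]
    e = [1.0, 1.0, -1.0, 1.0]
    print(cluster_robust_se(x, e, [0, 0, 1, 1]))   # clustered by group
    print(cluster_robust_se(x, e, [0, 1, 2, 3]))   # one observation per cluster
```

When every observation is its own cluster, the formula reduces to the usual heteroscedasticity-robust (White) standard error.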
