Solved – Clustering errors in Panel Data at the ID level and testing its necessity

Tags: clustered-standard-errors, fixed-effects-model, stata

What problems can arise in estimating the standard errors when you cluster them at the ID level? And how does one test whether clustered errors are necessary?

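When you have panel data, with an ID for each unit repeated over time, and you run a pooled OLS regression in Stata (here with a dummy for each ID via i.id, which makes it a least-squares dummy-variable fixed-effects regression), such as: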

reg y x1 x2 z1 z2 i.id, cluster(id)

Or a fixed-effects model:

* declare id as the panel identifier (xtreg needs the data to be xtset)
xtset id
xtreg y x1 x2 z1 z2, fe cluster(id)

How does one test whether clustering the errors is actually necessary?
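The two Stata commands in the question produce the same slope estimates: regressing on x plus a dummy for every ID (LSDV) gives the same coefficient on x as the within (demeaning) estimator used by xtreg, fe. A minimal NumPy sketch on simulated data, assuming a balanced panel and a single regressor; the data-generating process and the helper name demean are hypothetical. The reported standard errors need not match exactly, since the two commands may apply different small-sample corrections.

```python
import numpy as np

rng = np.random.default_rng(1)
n_ids, T = 50, 6
ids = np.repeat(np.arange(n_ids), T)
alpha = rng.normal(size=n_ids)                      # unit effects
x = alpha[ids] + rng.normal(size=n_ids * T)         # regressor correlated with the unit effect
y = 2.0 * x + alpha[ids] + rng.normal(size=n_ids * T)

# LSDV (like "reg y x i.id"): x plus one dummy per id. Using all dummies and no
# constant spans the same columns as a constant plus i.id with a base level omitted.
dummies = (ids[:, None] == np.arange(n_ids)[None, :]).astype(float)
b_lsdv = np.linalg.lstsq(np.column_stack([x, dummies]), y, rcond=None)[0][0]

# Within estimator (like "xtreg y x, fe"): subtract each id's mean, regress without a constant
def demean(v):
    return v - (np.bincount(ids, weights=v) / T)[ids]

x_w, y_w = demean(x), demean(y)
b_within = (x_w @ y_w) / (x_w @ x_w)
print(b_lsdv, b_within)
```

The equality is an instance of the Frisch-Waugh-Lovell theorem: partialling out the ID dummies is exactly the same as demeaning within each ID.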

Best Answer

Stata reports an estimate of rho in the xtreg output. Rho is the intraclass correlation coefficient: the fraction of the error variance that is due to the individual-level component $u_i$, i.e. the share of variance at the higher level of the data hierarchy (here the individual), computed as $\sigma_u^2/(\sigma_u^2+\sigma_e^2)$. If that value is anywhere north of .01, that's a good indication that you should be concerned about clustering. As for problems, I don't know that there are any.
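To make the quantity concrete, here is a rough sketch that estimates rho from simulated panel residuals with the one-way ANOVA (method-of-moments) estimator and compares it with the .01 rule of thumb. It assumes a balanced panel and a hypothetical function name intraclass_rho; Stata computes $\sigma_u$ and $\sigma_e$ from its own fitted model, so the numbers will not match xtreg output exactly.

```python
import numpy as np

def intraclass_rho(e, groups):
    """One-way ANOVA estimate of rho = sigma_u^2 / (sigma_u^2 + sigma_e^2).

    Assumes a balanced panel: every id has the same number of observations T.
    """
    _, inv, counts = np.unique(groups, return_inverse=True, return_counts=True)
    inv = inv.reshape(-1)
    T = counts[0]
    if not np.all(counts == T):
        raise ValueError("balanced panel assumed")
    n_ids = len(counts)
    means = np.bincount(inv, weights=e) / T                         # id-level means
    sigma_e2 = np.sum((e - means[inv]) ** 2) / (n_ids * (T - 1))    # within-id variance
    # Var(id mean) = sigma_u^2 + sigma_e^2 / T, so remove the noise part
    sigma_u2 = max(np.var(means, ddof=1) - sigma_e2 / T, 0.0)
    return sigma_u2 / (sigma_u2 + sigma_e2)

rng = np.random.default_rng(0)
n_ids, T = 500, 10
groups = np.repeat(np.arange(n_ids), T)
alpha = rng.normal(0.0, 1.0, n_ids)        # sigma_u = 1
eps = rng.normal(0.0, 2.0, n_ids * T)      # sigma_e = 2, so the true rho is 1 / (1 + 4) = 0.2
rho = intraclass_rho(alpha[groups] + eps, groups)
rho_null = intraclass_rho(eps, groups)     # no individual-level component at all
print(f"rho = {rho:.3f}, rho without id effect = {rho_null:.3f}")
print("above the .01 rule of thumb:", rho > 0.01)
```

Even without any individual-level component, the estimate is noisy and can occasionally land just above .01, so the threshold is a heuristic rather than a formal test.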

Related Solutions

Solved – Heteroskedasticity removed through fixed effect estimation

My question is, though, whether there is any reason to assume that the FE model's error term might not be heteroskedastic.

My interpretation is that what you really want to know is whether heteroscedasticity in the pooled OLS regression implies heteroscedasticity in the FE regression. The answer to that is no. In other words, you cannot run the test on the pooled OLS regression and conclude that the result also holds for the FE regression.

The model underlying the FE estimator can, in its simplest form, be written as $$y_{i,t}=x_{i,t}\beta +\alpha_i + u_{i,t},$$ where for simplicity we assume $u_{i,t}$ is iid. If the data are generated by this fixed-effects model but you fit $$y_{i,t}=x_{i,t}\beta +e_{i,t}$$ by pooled OLS, you have in effect set $e_{i,t} = \alpha_i + u_{i,t}$. Decomposing the variance of the pooled OLS error term gives $$\operatorname{Var}(e_{i,t})=\operatorname{Cov}(\alpha_i + u_{i,t},\alpha_i + u_{i,t})=\operatorname{Var}(\alpha_i)+\operatorname{Var}(u_{i,t})+2\operatorname{Cov}(\alpha_i,u_{i,t}).$$ While $u_{i,t}$ has constant variance (it is even iid), $e_{i,t}$ can very well have non-constant variance: if, for example, the dispersion of $\alpha_i$ differs across units in a way that is related to $x_{i,t}$, the conditional variance of $e_{i,t}$ changes with $x_{i,t}$, whereas the FE estimator removes $\alpha_i$ and leaves only $u_{i,t}$. Therefore, evidence of a heteroscedastic error term in the pooled OLS regression is, in general, not evidence of a heteroscedastic error term in the fixed-effects regression.
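A small simulation can illustrate the argument. The sketch below assumes a hypothetical data-generating process in which $u_{i,t}$ is iid but the spread of $\alpha_i$ grows with a unit-level variable that also drives $x_{i,t}$. It fits pooled OLS and the within (FE) estimator and uses the correlation between squared residuals and $x^2$ as a crude heteroscedasticity indicator (not a formal test such as Breusch-Pagan).

```python
import numpy as np

rng = np.random.default_rng(42)
n_ids, T, beta = 1000, 5, 1.0
ids = np.repeat(np.arange(n_ids), T)

# Unit-level driver z_i; alpha_i has mean zero given z_i but a spread
# proportional to |z_i| (random sign s_i), so Var(alpha_i | z_i) = 4 z_i^2.
z = rng.normal(size=n_ids)
alpha = 2.0 * z * rng.choice([-1.0, 1.0], size=n_ids)
x = z[ids] + 0.5 * rng.normal(size=n_ids * T)
u = rng.normal(size=n_ids * T)              # iid, homoscedastic idiosyncratic error
y = beta * x + alpha[ids] + u

# Pooled OLS of y on a constant and x; residuals estimate e = alpha_i + u_it
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
e_pooled = y - X @ coef

# Within (FE) estimator: demean y and x by id, then regress without a constant
def demean(v):
    return v - (np.bincount(ids, weights=v) / T)[ids]

y_w, x_w = demean(y), demean(x)
b_fe = (x_w @ y_w) / (x_w @ x_w)
e_fe = y_w - b_fe * x_w

# Crude check: do the squared residuals move with x^2?
corr_pooled = np.corrcoef(e_pooled ** 2, x ** 2)[0, 1]
corr_fe = np.corrcoef(e_fe ** 2, x ** 2)[0, 1]
print(f"pooled OLS: {corr_pooled:.2f}, FE: {corr_fe:.2f}")
```

In this setup the pooled residuals look clearly heteroscedastic, while the FE residuals, which contain only the demeaned $u_{i,t}$, do not.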

Solved – Double-clustered standard errors and large panel

Which library are you using?

With lfe (https://cran.r-project.org/web/packages/lfe/index.html), I am able to fit a model with 2,000 IDs and 5,000 observations per ID fairly easily on my laptop with 4 GB of memory.

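library(lfe)

# 2,000 ids with 5,000 observations each (10 million rows in total)
id <- rep(1:2000, each = 5000)
x <- runif(length(id))
y <- rnorm(length(id))

df <- data.frame(y, id, x)

# felm formula parts: outcome ~ regressors | fixed effects | IV | cluster variables;
# listing two variables in the last part (e.g. id + time) gives two-way clustering
model <- felm(y ~ x | 0 | 0 | id, data = df)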

EDIT: The estimatr package is another fairly efficient option: http://estimatr.declaredesign.org
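For readers who want to see what double clustering computes, here is a minimal NumPy sketch of the one-way cluster-robust (CR0) sandwich and the two-way combination V(id) + V(time) - V(id x time) in the style of Cameron, Gelbach and Miller. The function names are hypothetical, no small-sample correction is applied (lfe, estimatr and Stata each apply their own), and the simulated panel is illustrative only.

```python
import numpy as np

def cluster_vcov(X, resid, groups):
    """One-way cluster-robust (CR0) covariance: (X'X)^-1 (sum_g s_g s_g') (X'X)^-1."""
    _, inv = np.unique(groups, return_inverse=True)
    inv = inv.reshape(-1)
    # s_g = X_g' e_g: per-cluster sums of the scores x_it * e_it, one row per cluster
    scores = np.column_stack([np.bincount(inv, weights=X[:, j] * resid)
                              for j in range(X.shape[1])])
    bread = np.linalg.inv(X.T @ X)
    return bread @ (scores.T @ scores) @ bread

def twoway_cluster_vcov(X, resid, g1, g2):
    """Two-way clustering as V(g1) + V(g2) - V(g1 x g2).

    Assumes g1 and g2 are non-negative integer codes. The result is not
    guaranteed to be positive semi-definite in finite samples.
    """
    intersection = g1 * (g2.max() + 1) + g2   # one code per (g1, g2) pair
    return (cluster_vcov(X, resid, g1) + cluster_vcov(X, resid, g2)
            - cluster_vcov(X, resid, intersection))

# Hypothetical panel with shocks shared within id and within time period
rng = np.random.default_rng(7)
n_ids, T = 200, 10
ids = np.repeat(np.arange(n_ids), T)
times = np.tile(np.arange(T), n_ids)
x = rng.normal(size=n_ids)[ids] + rng.normal(size=T)[times] + rng.normal(size=n_ids * T)
y = (0.5 * x + rng.normal(size=n_ids)[ids] + rng.normal(size=T)[times]
     + rng.normal(size=n_ids * T))
X = np.column_stack([np.ones_like(x), x])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ coef

se_id = np.sqrt(np.diag(cluster_vcov(X, resid, ids)))
se_2way = np.sqrt(np.diag(twoway_cluster_vcov(X, resid, ids, times)))
print("clustered by id:   ", se_id)
print("two-way (id, time):", se_2way)
```

Two sanity checks follow from the formulas: with every observation in its own cluster, CR0 reduces to the White (HC0) estimator, and two-way clustering on (id, singleton) collapses back to one-way clustering by id.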
