Solved – Double-clustered standard errors and large panel

clustered-standard-errors, large data, panel data, robust-standard-error

I have a large panel data set featuring the purchases of 5000+ individuals over 2000+ time periods (days). I am looking to estimate pooled OLS regressions featuring double-clustered standard errors (where standard errors are clustered by both individual and time) but the dimensions of this problem are causing issues.

If I cluster standard errors by one dimension only (either individuals or time), I can quickly obtain parameter estimates (I am using both Stata and R). However, if I try to double-cluster my standard errors along both dimensions, the code runs for hours and does not produce output.

Are there any ways to estimate these regressions given the large number of clusters? One ad hoc idea I have had is to split the panel into weekly or monthly sub-panels in order to reduce the dimensions of the problem, but I doubt this is sound.

Suggestions appreciated!

Best Answer

Which library are you using?

With lfe (https://cran.r-project.org/web/packages/lfe/index.html), I am able to fit a model with 5000 ids and 2000 observations per id fairly easily on my laptop with 4 GB of memory.

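library(lfe)

# Simulate a panel: 5000 ids, 2000 observations per id (10 million rows)
id <- rep(1:5000, times = 2000)
x <- runif(length(id))
y <- rnorm(length(id))
df <- data.frame(y, id, x)

# Pooled OLS: no fixed effects, no instruments, standard errors clustered by id
model <- felm(y ~ x | 0 | 0 | id, data = df)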
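
For what it is worth, the two-way clustered covariance is cheap to compute by hand, which may help when a packaged routine stalls on a panel this size. Below is a minimal base-R sketch, assuming the usual Cameron-Gelbach-Miller construction (cluster by id, plus cluster by time, minus cluster by the id-time intersection) with no small-sample correction. The function name `twoway_cluster_vcov` and the simulated `panel` are made up for illustration and are not part of lfe. With lfe itself, the last part of the felm formula can also list several cluster variables (for example `id + time`).

```r
# Two-way cluster-robust covariance for OLS coefficients.
# Hypothetical helper for illustration; no small-sample correction is applied.
twoway_cluster_vcov <- function(X, resid, g1, g2) {
  bread  <- solve(crossprod(X))   # (X'X)^{-1}
  scores <- X * resid             # row i holds x_i * e_i

  # "Meat" for one clustering: sum the scores within each cluster,
  # then take the cross-product of those cluster totals.
  meat <- function(g) crossprod(rowsum(scores, g, reorder = FALSE))

  # Numeric key for the g1-by-g2 intersection (kept as double to avoid
  # integer overflow when there are many clusters).
  i1  <- match(g1, unique(g1))
  i2  <- match(g2, unique(g2))
  g12 <- (i1 - 1) * max(i2) + i2

  bread %*% (meat(g1) + meat(g2) - meat(g12)) %*% bread
}

# Small simulated panel (assumed sizes, chosen only so the example runs fast)
set.seed(1)
panel   <- expand.grid(id = seq_len(50), time = seq_len(40))
panel$x <- runif(nrow(panel))
panel$y <- rnorm(nrow(panel))

fit <- lm(y ~ x, data = panel)
X   <- model.matrix(fit)
V   <- twoway_cluster_vcov(X, resid(fit), panel$id, panel$time)
se  <- sqrt(diag(V))   # two-way clustered standard errors
```

The cost is dominated by three calls to `rowsum`, each a single pass over the data, so it should scale roughly linearly in the number of rows. Note that this estimator is not guaranteed to be positive semi-definite, so it is worth checking `diag(V)` before taking square roots.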

EDIT: The estimatr library is another fairly efficient option: http://estimatr.declaredesign.org