Solved – Example of multi-way clustering robust standard errors: puzzling results

clustered-standard-errors, robust-standard-error

I am trying to get a grasp on Cameron, Gelbach and Miller (2011) robust inference with multi-way clustering. As I understand it, the bottom line is that ignoring clustering may result in severely underestimated standard errors and, therefore, a considerably inflated type-I error rate. They argue that applied researchers run into this kind of situation very frequently (for example, children clustered in schools, with all the data also clustered in time).

So I decided to run some examples to understand the issues better and to get a sense of how big the problem would be. I am using R and the package multiwayvcov (http://cran.r-project.org/web/packages/multiwayvcov/index.html), which implements the methods proposed by Cameron et al. (2011). Here's my code:

library(multiwayvcov) # multi-way clustering Cameron, Gelbach, Miller, 2011
# Some fake data with two clusters and one covariate x
df <- data.frame(cluster1 = sample(1:4, 1000, replace = TRUE), 
                 cluster2 = sample(1:4, 1000, replace = TRUE), 
                 x = 5*rnorm(1000))
# Dependent variable as a function of the covariate (x) and both clusters
df$y <- with(df, x + 0.5*cluster1 + 0.3*cluster2 + rnorm(1000))
# Linear model ignoring clustering
lm1 <- lm(y ~ x, data = df)
# Linear model including the clusters as regressors
lm2 <- lm(y ~ x + cluster1 + cluster2, data = df)
# Extract standard errors from both models, using the default standard errors
# and the multi-way cluster robust standard errors
# For lm1
sqrt(diag(vcov(lm1)))
sqrt(diag(cluster.vcov(lm1, df[ , c("cluster1", "cluster2")])))
# For lm2
sqrt(diag(vcov(lm2)))
sqrt(diag(cluster.vcov(lm2, df[ , c("cluster1", "cluster2")])))

And here the result:

> sqrt(diag(vcov(lm1)))
(Intercept)           x 
0.037401718 0.007624887 
> sqrt(diag(cluster.vcov(lm1, df[ , c("cluster1", "cluster2")])))
(Intercept)           x 
0.388342350 0.005365563 
> # For lm2
> sqrt(diag(vcov(lm2)))
(Intercept)           x    cluster1    cluster2 
0.102379673 0.006331239 0.028197782 0.027870294 
> sqrt(diag(cluster.vcov(lm2, df[ , c("cluster1", "cluster2")])))
(Intercept)           x    cluster1    cluster2 
 0.09875941  0.00439916  0.03468323  0.03039352 

I am having trouble understanding these results. In particular, I am looking at the standard error of the covariate x: in both models it is smaller with the cluster-robust standard errors, and I expected exactly the opposite, given that the whole argument of the paper is that standard errors would be underestimated had I ignored clustering. So why do these results show larger standard errors when clustering is ignored?

Best Answer

I am not an expert in this, but I see at least two issues you should consider:

1) The authors show in their paper that "The impact [of multi-way clustering] is likely to be the greatest when both the regressor and the error are correlated over two dimensions". In your example the regressor $x$ is exogenous; maybe you should try making x correlated with cluster1 and cluster2.
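For instance, a minimal sketch of this idea, reusing your own setup but letting x load on both cluster dimensions (the 0.5/0.3 loadings are just illustrative choices, not anything from the paper):

```r
library(multiwayvcov)  # CRAN package implementing Cameron, Gelbach, Miller (2011)
set.seed(123)
n <- 1000
df <- data.frame(cluster1 = sample(1:4, n, replace = TRUE),
                 cluster2 = sample(1:4, n, replace = TRUE))
# Regressor now depends on both cluster dimensions, so x and the
# (omitted-cluster) error term are both correlated within clusters
df$x <- with(df, 0.5 * cluster1 + 0.3 * cluster2 + rnorm(n))
df$y <- with(df, x + 0.5 * cluster1 + 0.3 * cluster2 + rnorm(n))
lm1 <- lm(y ~ x, data = df)  # clusters omitted: they end up in the error
sqrt(diag(vcov(lm1)))                                           # default SEs
sqrt(diag(cluster.vcov(lm1, df[, c("cluster1", "cluster2")])))  # clustered SEs
```

With this design the clustered standard error on x should now tend to exceed the default one, which is the pattern the paper warns about.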

2) Also, the authors state that their "methods assume that the numbers of clusters goes to infinity". Your example is clearly far from that: you are simulating only 4 clusters per dimension. Maybe you should also try running your example with more clusters (>50, say).
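Your original simulation with a larger number of clusters would look like this (50 clusters per dimension is an arbitrary but more comfortable choice):

```r
library(multiwayvcov)  # CRAN package implementing Cameron, Gelbach, Miller (2011)
set.seed(123)
n <- 1000
G <- 50  # 50 clusters per dimension instead of 4
df <- data.frame(cluster1 = sample(1:G, n, replace = TRUE),
                 cluster2 = sample(1:G, n, replace = TRUE),
                 x = 5 * rnorm(n))
df$y <- with(df, x + 0.5 * cluster1 + 0.3 * cluster2 + rnorm(n))
lm1 <- lm(y ~ x, data = df)
sqrt(diag(vcov(lm1)))                                           # default SEs
sqrt(diag(cluster.vcov(lm1, df[, c("cluster1", "cluster2")])))  # clustered SEs
```

With many clusters the asymptotics the authors rely on are closer to holding, and the NaN problems mentioned below should also be less likely.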

Finally, I tried to use your code, but NaNs are produced much of the time. I guess this happens because of the limited number of clusters, but I did not look into it. Be aware that it is good practice to set the seed of your pseudo-random number generator when you run simulations like this, so other people can replicate your results. Since I couldn't replicate your results, I got bored and stopped looking into your question. In R, you only have to run set.seed(ANY NUMBER YOU FANCY) at the beginning of your code.
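For example (42 is an arbitrary seed value):

```r
set.seed(42)          # fix the RNG state before any random draws
a <- rnorm(3)
set.seed(42)          # resetting the same seed...
b <- rnorm(3)
identical(a, b)       # ...reproduces exactly the same draws: TRUE
```

Anyone who runs your script with the same seed will then see exactly the numbers you posted.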