Standard error clustering under treatment assignment in groups of varying size

Tags: cluster-sample, clustered-standard-errors, standard error, stata, treatment-effect

Basic setup:

Unit of observation is the individual.
Treatment (binary) is assigned at the city level.
Every state contains 4 cities, of which 2 are randomly chosen for treatment and 2 serve as controls. There are only a few states (strata), e.g. 5 or slightly more. The outcome of interest is likely to be regionally clustered. I only observe one wave of outcomes, not a panel. (Side remark: For other reasons it is desirable to use state fixed effects.)

Question:

How should standard errors be clustered for treatment effect inference?

Cameron and Miller (2014) state that

[If] either the regressors or the errors are likely to be uncorrelated within a potential group, then there is no need to cluster within that group […] If a key regressor is randomly assigned within clusters […] then the within-cluster correlation of the regressor is likely to be zero. Thus there is no need to cluster standard errors, even if the model’s errors are clustered.

Following this logic, it would not be necessary to cluster at the state level, since treatment is randomly assigned to cities within each state. However, varying city sizes introduce within-state correlation of treatment. At the same time, the small number of states makes clustering at the state level unattractive.
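One way to see the city-size issue is to enumerate every possible assignment within a single state and look at the share of individuals who end up treated. The sketch below is only an illustration: the city sizes are made up, and the helper name `treated_shares` is hypothetical.

```python
from itertools import combinations


def treated_shares(city_sizes, n_treated=2):
    """Share of a state's individuals who are treated, for every possible
    way of picking `n_treated` cities out of the state's cities."""
    total = sum(city_sizes)
    return [
        sum(city_sizes[i] for i in chosen) / total
        for chosen in combinations(range(len(city_sizes)), n_treated)
    ]


# Hypothetical city sizes (number of individuals per city).
equal = treated_shares([100, 100, 100, 100])
unequal = treated_shares([400, 100, 50, 50])

print(sorted(set(equal)))    # one value only: half of the state is treated
print(sorted(set(unequal)))  # several values: the share depends on the draw
```

With equal city sizes every draw treats exactly half of the individuals in the state. With unequal sizes the realized treated share differs from draw to draw, so an individual's treatment status is related, ex post, to the state the individual lives in.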

I think that, because the exact nature of the within-cluster correlation of treatment is known (it comes from city size), there must be a more efficient way to correct for it, i.e. to cluster at the city level and deal with the ex post within-state correlation of treatment in some other way.
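To make the city-level option concrete, here is a minimal sketch of what "state fixed effects with standard errors clustered by city" computes for a single treatment dummy. It is not a recommendation. The function name `fe_cluster_se` and the simulated data are hypothetical, and no finite-sample correction is applied, whereas statistical packages typically apply one.

```python
import math
import random
from collections import defaultdict


def fe_cluster_se(rows):
    """rows: iterable of (state, city, treat, y) tuples.
    Returns (beta, se): the state-fixed-effects estimate of the treatment
    coefficient and its city-clustered standard error (uncorrected)."""
    rows = list(rows)
    # Within transformation: state means of treat and y absorb the state FE.
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for state, _, d, y in rows:
        acc = sums[state]
        acc[0] += d
        acc[1] += y
        acc[2] += 1
    xd = [d - sums[s][0] / sums[s][2] for s, _, d, _ in rows]
    yd = [y - sums[s][1] / sums[s][2] for s, _, _, y in rows]
    sxx = sum(x * x for x in xd)
    beta = sum(x * y for x, y in zip(xd, yd)) / sxx
    # Sandwich "meat": add up x * residual within each city, then square.
    scores = defaultdict(float)
    for (state, city, _, _), x, y in zip(rows, xd, yd):
        scores[(state, city)] += x * (y - beta * x)
    var = sum(s * s for s in scores.values()) / sxx ** 2
    return beta, math.sqrt(var)


# Simulated data under assumed parameters: 5 states, 4 cities of varying
# size per state, 2 treated cities per state, true effect of 1.0.
rng = random.Random(0)
sim_rows = []
for state in range(5):
    state_effect = rng.gauss(0, 1)
    treated = set(rng.sample(range(4), 2))
    for city in range(4):
        city_shock = rng.gauss(0, 0.5)
        d = 1 if city in treated else 0
        for _ in range(rng.randint(20, 200)):
            y = state_effect + city_shock + 1.0 * d + rng.gauss(0, 1)
            sim_rows.append((state, city, d, y))

beta_hat, se_hat = fe_cluster_se(sim_rows)
print(beta_hat, se_hat)
```

The sketch gives 20 city clusters instead of 5 state clusters, which is the attraction of clustering at the city level. It does nothing about correlation across cities of the same state.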

Reference:

A. Colin Cameron and Douglas L. Miller (2014), A Practitioner’s Guide to Cluster-Robust Inference, Journal of Human Resources: http://www.econ.ucdavis.edu/faculty/cameron/research/Cameron_Miller_JHR_2014_July_09.pdf

Best Answer

Your quote from Cameron and Miller (2014) is right, though I guess that you have a panel for those cities, meaning that you observe them before and after the treatment. In that case the time component may introduce a clustering problem, as cities from the same state are subject to the same within-state shocks. For that reason it would still make sense to cluster at the state level.

What you are actually asking is a question at the frontier of econometric research on cluster-robust inference with few clusters. The typical reflex nowadays is to point immediately to the paper by Cameron et al. (2008) and their wild cluster bootstrap percentile-t statistic. Even though their method greatly improves on the generic cluster-robust variance estimator commonly found in statistical packages, your number of clusters is too small even for their procedure. A recent paper by Webb (2014) provides simulation evidence that the wild bootstrap percentile-t does not produce point-identified p-values when you have fewer than 11 clusters.
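The usual explanation for this, as far as I understand it, is that the wild cluster bootstrap is typically run with two-point Rademacher weights (one draw of -1 or +1 per cluster), so with G clusters there are only 2^G distinct bootstrap samples. A quick enumeration, with a hypothetical helper name, shows how coarse that is for small G:

```python
from itertools import product


def rademacher_draws(n_clusters):
    """All distinct cluster-level weight vectors with entries -1 or +1."""
    return list(product((-1, 1), repeat=n_clusters))


for g in (5, 10, 11):
    print(g, len(rademacher_draws(g)))
# 5 32
# 10 1024
# 11 2048
```

With 5 clusters the bootstrap distribution has at most 32 support points, so the p-value can only move in coarse steps.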

Webb also develops his own variance estimator by improving on the current wild bootstrap one. He shows that his method works with as few as 5 clusters, which seems to be what you have given the number of states. Using his method might work for you, especially since you have equally sized clusters (wildly different cluster sizes are a problem even when you have a much larger number of clusters; see Webb's 2015 paper with James MacKinnon if you are interested).
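To my knowledge, the improvement in Webb (2014) consists of replacing the two-point weights with a six-point distribution that puts probability 1/6 on each of ±sqrt(1/2), ±1 and ±sqrt(3/2). Please check the paper for the exact definition. The sketch below assumes that form, and the names `WEBB_WEIGHTS` and `draw_cluster_weights` are mine, not from any package:

```python
import math
import random

# Assumed six-point weight distribution, each value with probability 1/6.
WEBB_WEIGHTS = (
    -math.sqrt(1.5), -1.0, -math.sqrt(0.5),
    math.sqrt(0.5), 1.0, math.sqrt(1.5),
)


def draw_cluster_weights(n_clusters, rng):
    """One bootstrap draw: an independent weight for each cluster."""
    return [rng.choice(WEBB_WEIGHTS) for _ in range(n_clusters)]


# The weights have mean 0 and variance 1 (up to floating-point rounding).
mean = sum(WEBB_WEIGHTS) / len(WEBB_WEIGHTS)
var = sum(w * w for w in WEBB_WEIGHTS) / len(WEBB_WEIGHTS)
print(mean, var)

# Number of distinct weight vectors with 5 clusters: 6**5 instead of 2**5.
n_possible = len(WEBB_WEIGHTS) ** 5
print(n_possible)  # 7776

print(draw_cluster_weights(5, random.Random(0)))
```

In a wild cluster bootstrap each cluster's residuals would be multiplied by its drawn weight before re-estimating. The six-point support is what makes the bootstrap distribution much finer with only 5 clusters.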
