When running a linear regression, standard errors may need to be adjusted for clustering. One way to do so is to multiply the "conventional" variance estimate by the Moulton factor, defined for equicorrelated regressors as $1 + [\frac{V(n_g)}{\overline{n}}+\overline{n} -1]*\rho_e$, where $\overline{n}$ represents the average cluster size (average number of observations per cluster), $n_g$ is the number of observations per cluster, $V$ is the variance and $\rho_e$ is the residual correlation.
For a reference see for example Angrist and Pischke's "Mostly Harmless Econometrics" page 311.
To my understanding, however, this Moulton factor, which comes up in every treatment of clustering, is derived for homoscedastic errors. In that case the standard errors estimated assuming homoscedasticity are multiplied by the square root of the Moulton factor. This is also what the Stata "moulton.ado" file provided on the website of "Mostly Harmless Econometrics" does.
Does the same formula hold for heteroscedastic errors? Is it correct to multiply heteroscedasticity robust standard errors by this same factor? If not, how does one adjust for clustering parametrically (using the Moulton factor) while also dealing with heteroscedasticity?
Best Answer
I don't come from econometrics, and I have no idea about this Moulton factor. But, I can answer your last question using tools from my native field. You can adjust for both heteroskedasticity and correlated errors if you formulate an appropriate mixed-effects model. For example, consider
$$Y_{ij} = X_{ij}\beta_i + b_j + \epsilon_{ij}$$
where $j$ indexes group of observations, $i$ indexes observations within a group, $Y_{ij}$ is the response, $X_{ij}$ a vector of covariates, $\beta_i$ is the target of inference, $\epsilon_{ij}$ are independent errors, and $b_j$ is a random effect that induces correlations within groups. It seems like you want something like $\epsilon_{ij} \sim N(0, \sigma^2_j)$, with the error variance depending on the group index, and $b_j \sim N(0, \sigma^2_g)$. The trick is to actually fit this model.
One way is to rephrase the heteroskedastic errors as yet more mixed effects terms, leaving behind homoskedastic errors. These can be dealt with using a mixed-modeling tool such as the R package
nlme
. Let $\tau^2_j = \sigma^2_j - \min_j\{\sigma^2_j\}$ and $\min_j\{\sigma^2_j\} = \sigma^2_\delta$. The following model is equivalent to the one above:$$Y_{ij} = X_{ij}\beta_i + b_j + c_{ij} + \delta_{ij}$$
where $Z_j$ is an identity matrix, $c_{ij}$ is normal with variance $\tau^2_j$, and $\delta_{ij}$ is iid noise with variance $\sigma^2_\delta$. You may notice that for some $j$, $c_{ij}$ has variance zero. When you go to fit the model, you will probably need to leave out that term.
Here is some R code that does this for a simple two-group example (using the REML criterion for fitting).