Solved – Moulton Factor and Clustering of Standard Errors with Heteroscedasticity

Tags: clustered-standard-errors, heteroscedasticity, regression

When running a linear regression, standard errors may need to be adjusted for clustering. One way to do so is to multiply the "conventional" variance estimate by the Moulton factor, defined for regressors that do not vary within clusters as $1 + \left[\frac{V(n_g)}{\overline{n}} + \overline{n} - 1\right]\rho_e$, where $\overline{n}$ is the average cluster size (average number of observations per cluster), $n_g$ is the number of observations in cluster $g$, $V(n_g)$ is the variance of the cluster sizes, and $\rho_e$ is the within-cluster (intraclass) correlation of the residuals.
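As a quick sanity check on the formula: with equal cluster sizes, $V(n_g) = 0$ and the factor reduces to $1 + (\overline{n} - 1)\rho_e$. With, say, $\overline{n} = 10$ observations per cluster and $\rho_e = 0.1$, the variance is inflated by $1 + 9 \times 0.1 = 1.9$, and the standard errors by $\sqrt{1.9} \approx 1.38$.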

For a reference, see for example page 311 of Angrist and Pischke's "Mostly Harmless Econometrics".

To my understanding, however, this Moulton factor, which comes up in every treatment of clustering, is derived for homoscedastic errors. In that case, the standard errors estimated under the assumption of homoscedasticity are multiplied by the square root of the Moulton factor. This is also what the Stata moulton.ado file provided on the website of "Mostly Harmless Econometrics" does.
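To make the homoscedastic version concrete, here is a minimal R sketch of that adjustment on simulated balanced data. This is my own illustration of the formula, not a port of moulton.ado, and the one-way-ANOVA moment estimator of $\rho_e$ is just one common choice.

set.seed(1)
G = 50; n_g = rep(10, G)               # 50 clusters of 10 observations each
cluster = rep(seq_len(G), times = n_g)
x = rnorm(G)[cluster]                  # regressor fixed within clusters
y = 1 + 2 * x + rnorm(G)[cluster] + rnorm(sum(n_g))  # cluster effect + noise
fit = lm(y ~ x)
e = resid(fit)
# One-way-ANOVA moment estimator of the residual intraclass correlation rho_e
n = n_g[1]                             # balanced design
MSB = n * var(tapply(e, cluster, mean))
MSW = sum(tapply(e, cluster, function(r) sum((r - mean(r))^2))) / (sum(n_g) - G)
rho_e = (MSB - MSW) / (MSB + (n - 1) * MSW)
# Moulton factor, then scale the conventional standard error by its square root
nbar = mean(n_g)
moulton = 1 + (var(n_g) / nbar + nbar - 1) * rho_e
se_conv = coef(summary(fit))["x", "Std. Error"]
c(conventional = se_conv, moulton_adjusted = se_conv * sqrt(moulton))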

Does the same formula hold for heteroscedastic errors? Is it correct to multiply heteroscedasticity robust standard errors by this same factor? If not, how does one adjust for clustering parametrically (using the Moulton factor) while also dealing with heteroscedasticity?

Best Answer

I don't come from econometrics, and I have no idea about this Moulton factor. But I can answer your last question using tools from my native field. You can adjust for both heteroskedasticity and correlated errors if you formulate an appropriate mixed-effects model. For example, consider

$$Y_{ij} = X_{ij}\beta + b_j + \epsilon_{ij}$$

where $j$ indexes groups of observations, $i$ indexes observations within a group, $Y_{ij}$ is the response, $X_{ij}$ is a vector of covariates, $\beta$ is the target of inference, $\epsilon_{ij}$ are independent errors, and $b_j$ is a random effect that induces correlation within groups. It seems like you want something like $\epsilon_{ij} \sim N(0, \sigma^2_j)$, with the error variance depending on the group index, and $b_j \sim N(0, \sigma^2_g)$. The trick is to actually fit this model.
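As an aside, nlme can fit this particular specification directly, with a random intercept per group and a group-specific residual variance supplied through its varIdent variance function. Here is a minimal sketch, assuming a data frame dat with columns y, x, and a factor group (the column names are placeholders):

library(nlme)
# b_j: random intercept per group; sigma^2_j: residual variance per group stratum
mod_direct = lme(y ~ x,
                 random = ~ 1 | group,
                 weights = varIdent(form = ~ 1 | group),
                 method = "REML",
                 data = dat)
summary(mod_direct)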

Alternatively, you can build the model by hand: rephrase the heteroskedastic errors as yet more mixed-effects terms, leaving behind homoskedastic errors, and fit the result with a mixed-modeling tool such as the R package nlme. Let $\sigma^2_\delta = \min_j\{\sigma^2_j\}$ and $\tau^2_j = \sigma^2_j - \sigma^2_\delta$. The following model is equivalent to the one above:

$$Y_{ij} = X_{ij}\beta + b_j + c_{ij} + \delta_{ij}$$

where $c_{ij}$ is normal with variance $\tau^2_j$ (one $c_{ij}$ per observation, so its random-effects design matrix within each group is an identity matrix) and $\delta_{ij}$ is iid noise with variance $\sigma^2_\delta$. Then $\mathrm{Var}(c_{ij} + \delta_{ij}) = \tau^2_j + \sigma^2_\delta = \sigma^2_j$, exactly as in the original model. You may notice that for the group attaining the minimum, $c_{ij}$ has variance zero. When you go to fit the model, you will probably need to leave out that term.
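For instance, with two groups where $\sigma^2_1 = 4$ and $\sigma^2_2 = 1$, the re-encoding gives $\sigma^2_\delta = 1$, $\tau^2_1 = 3$, and $\tau^2_2 = 0$, so the $c_{i2}$ term drops out entirely.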

Here is some R code that does this for a simple two-group example (using the REML criterion for fitting).

require(ggplot2)
require(nlme)
# Split the built-in cars data into two groups: extreme speeds vs. the rest
group = cars$speed == 4 | cars$speed >= 24
y1 = cars$speed[group];  n1 = length(y1)   # group 1: n1 = 7
y2 = cars$speed[!group]; n2 = length(y2)   # group 2: n2 = 43
# The next check tells me to omit c_ij for group 2: with a common sigma^2_g
# across groups, the group with the smaller within-group variance has the
# smaller error variance. If I were fitting more than just a mean to the
# data, I'd have to try all the groups or something unpleasant like that.
var(y1) > var(y2)   # TRUE: group 1 has the larger variance
y = c(y1, y2); group = c(rep(1, n1), rep(2, n2))
# To force-feed this into nlme, I encode all the random effects as a matrix
# to be multiplied by iid normal variates. Note the zeroes for the group-2 c's.
z1 = diag(rep(1, n1))                        # one c_ij per group-1 observation
z1 = cbind(z1, 1)                            # column for b_1
zzero = matrix(0, ncol = n1 + 1, nrow = n2)  # group 2 gets no c's and no b_1
z = rbind(z1, zzero)
z = cbind(z, c(rep(0, n1), rep(1, n2)))      # column for b_2
colnames(z) = c(paste0("c", 1:n1), "b1", "b2")
# Visualize the random-effects design matrix Z
ggplot(reshape2::melt(z)) + geom_tile(aes(y = -Var1, x = Var2, fill = value))
const = rep(1, n1 + n2)   # trivial grouping factor: all rows in one "group"
mydata = data.frame(y, group, z, const)
# pdClass = "pdIdent" makes each block of random effects iid normal; each
# formula in the list picks out the columns of my matrix Z that its block
# multiplies. The + 0 gets rid of an intercept term.
mod = nlme::lme(data = mydata,
                fixed = y ~ 1,
                method = "REML",
                random = reStruct(list(~ c1 + c2 + c3 + c4 + c5 + c6 + c7 + 0 | const,
                                       ~ b1 + b2 + 0 | const),
                                  pdClass = "pdIdent"))
summary(mod)
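To see whether the fit recovers the intended structure, you can inspect the estimated variance components and compare them with the raw within-group variances:

nlme::VarCorr(mod)                        # variances of the c's, the b's, and delta
nlme::intervals(mod, which = "var-cov")   # approximate intervals for the components
c(var(y1), var(y2))                       # raw within-group variances, for comparison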