Solved – GEE with exchangeable working covariance vs. assuming independence and using Huber-White standard errors

generalized-estimating-equationsrobust

I'm analyzing a dataset including 13000 students. Students are clustered into schools/grades. The ICC (intraclass correlation coefficient) shows that students in a same school are correlated. Therefore, I'd to take this clustering effect into account. One way is to run a linear regression and run the robust variance estimator on top of that to guard against getting biased estimates. Can we take the clustering effect into account with the sandwich estimator?. I think we cannot since the epsilon^2 matrix is still a diagonal matrix.
What I think we should do is to run a GEE model with exchangeable varcov matrix for students in a same school and then we should run the robust varcov on top of the GEE model.

Anyways, assuming that sandwich estimator by itself can account for the school clustering effect, I've fitted two separate GEE models one with exchangeable varcov matrix and the other one with the robust variance estimator (also known as Huber-White, Sandwich Estimator, or empirical variance estimator). The point is under both models I get the same estimated variance per each covariate, but my GEE exchangeable estimates leads to much larger beta estimates that are also statistically significant whereas similar beta covariates are not significant in GEE with robust varcov estimator. I wonder why it happens?
I use SAS 9.3 to fit my model (proc GENMOD):

Exchange:

repeated subject = SCHIID/type = EXCH;

Empirical Estimator:

repeated subject = SCHIID/ covb; 

Best Answer

One way is to run a linear regression and run the robust variance estimator on top of that to guard against getting biased estimates.

An important point here is that having pockets of correlated data does not bias your estimates in a linear model - it results in having inflated standard errors. In a non-linear model (e.g. logistic regression), you can get biased estimates, since the population average effect is, in general, different from the individual specific effect, which is not the case with a linear model. More information on this distinction is in this answer

Can we take the clustering effect into account with the sandwich estimator?

From the title, I assume you're talking about using Huber White sandwich standard errors for your confidence intervals and $p$-values. These do impose a diagonal covariance matrix but are robust to the diagonal entries possibly being different - for that reason they were originally used when there is possible heteroskedasticity in your errors, which means that the error variance in non-constant. But, using a slight modification of the Huber-White standard errors where the "meat" of the sandwich is replaced with an empirical estimate of the covariance matrix within a cluster (still called Huber-White standard errors) provides inference that is robust to non-independence within a cluster (but not between clusters!) - this modification is described pretty clearly in a 2006 paper in The American Statistician called On The So-Called “Huber Sandwich Estimator” and “Robust Standard Errors” by David Freedman.

This procedure robust to non-independence within a cluster in the sense that they will still give you asymptotically unbiased inference (i.e. the confidence levels and $p$-values will be right) even if there is correlation within a cluster. I suspect this is what your code labeled 'Empirical Estimator' code is doing.

I've fitted two separate GEE models one with exchangeable varcov matrix and the other one with the robust variance estimator (also known as Huber-White, Sandwich Estimator, or empirical variance estimator). The point is under both models I get the same estimated variance per each covariate, but my GEE exchangeable estimates leads to much larger beta estimates that are also statistically significant whereas similar beta covariates are not significant in GEE with robust varcov estimator. I wonder why it happens?

In general, the GEE model solves the equation

$$ \sum_{i=1}^{n} \frac{ \partial \mu_i }{ \partial \beta } V(\alpha)^{-1} (y_i - \mu_i) = 0$$

as a function of the regression coefficients, $\beta$, where $\mu_i = x_i \beta$ is the expected values of the cluster $i$ response, $y_i$, given the predictors $x_i$ under the specified model. $V(\alpha)^{-1}$ is the "working" covariance matrix of the of the elements of cluster $i$. (note that $\mu_i=x_i \beta$ because we're dealing with a linear model but GEEs can more generally use a link function so that $\mu_i=g(x_i \beta))$

A key point here is that when you change the working covariance, you change the estimating equation, therefore the $\beta$ that solves it will be different. For example, if $V$ was $\sigma^2$ down the diagonal and $0$ off the diagonal and $\mu_i = x_i \beta$ as it does here, then the GEE estimator is the least squares estimator, which will not solve that equation in the exchangeable case. So it is no surprise that you're getting different parameter estimates. It may be a coincidence that you're getting the same standard errors.

In your situation, I'd suggest reporting the results that used the exchangeable covariance matrix since. While GEE-based inference in consistent even when you're misspecified the correlation structure, it is known that GEE estimators are more efficient when you use a more appropriate covariance structure and, if you have evidence that there are large intra-class correlations within a school, then the exchangeable correlation will probably provide a much closer approximation to the true association structure.