GEE – Does the Sandwich Estimator Protect Against Correlation Misspecification and Heteroscedasticity?

generalized-estimating-equations, heteroscedasticity, misspecification, panel data, robust-standard-error

The relative merits of GEE with exchangeable correlation or GEE with independence and the sandwich estimate have been discussed, but I couldn't find a post specifically addressing my question.

I have analyzed longitudinal data using GEE with AR(1) correlation structure and the sandwich/robust estimates of the standard errors.

In the case of longitudinal data, the sandwich estimates are used to protect against misspecification of the correlation structure. The AR(1) structure was chosen based on the autocorrelation function of the data, since it should produce a more efficient estimate than assuming independence. However, the univariate distributions of the outcomes are positively skewed, and the residuals from the models exhibit some heteroscedasticity.

For GEE with longitudinal (or, more generally, clustered) data, does the sandwich estimate protect against heteroscedasticity (what it was developed to do) in addition to its intended use of protecting against misspecification of the correlation structure? Does reducing the heteroscedasticity with a log transformation of the outcome offer more efficiency for the estimate, or some other benefit?

I would especially appreciate a reference to a journal article addressing this issue.

Best Answer

Yes. The sandwich estimator of $\mathrm{Cov}(\hat{\beta})$ is robust to misspecification of the assumed covariance, which includes both the variances and the correlations. A good book reference is page 359 of Applied Longitudinal Analysis.

Note that the sandwich estimator is sometimes called "semi-robust", since it is robust only to misspecification of the variance-covariance model, not to misspecification of the mean model.
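To make the robustness concrete, here is a minimal sketch (mine, not taken from the book) of the cluster-robust sandwich variance $B^{-1} M B^{-1}$ in the simplest case I could think of: a single slope, no intercept, identity link, and an independence working correlation. In that case $B = \sum_i \sum_j x_{ij}^2$ and $M = \sum_i \left(\sum_j x_{ij} r_{ij}\right)^2$, where $r_{ij}$ are the residuals for subject $i$ at occasion $j$. The simulated data, the function names, and all parameter values are assumptions for illustration only. The errors are both correlated within cluster and heteroscedastic, while the mean model is correct.

```python
import random


def simulate_clusters(n_clusters=200, n_times=4, beta=2.0, seed=0):
    """Simulate clustered data whose mean is beta * x (hypothetical setup).

    Errors share a cluster-level effect (within-cluster correlation) and have
    a standard deviation that grows with x (heteroscedasticity).
    """
    rng = random.Random(seed)
    clusters = []
    for _ in range(n_clusters):
        u = rng.gauss(0.0, 1.0)  # shared effect induces correlation
        rows = []
        for t in range(1, n_times + 1):
            x = float(t)
            e = rng.gauss(0.0, 0.5 * x)  # error SD increases with x
            rows.append((x, beta * x + u + e))
        clusters.append(rows)
    return clusters


def fit_slope_with_sandwich(clusters):
    """Return (beta_hat, naive_se, robust_se) for the model y = beta * x."""
    sxx = sum(x * x for rows in clusters for x, _ in rows)  # the "bread" B
    sxy = sum(x * y for rows in clusters for x, y in rows)
    beta_hat = sxy / sxx

    # Naive (model-based) variance: independent errors, constant variance.
    n_obs = sum(len(rows) for rows in clusters)
    rss = sum((y - beta_hat * x) ** 2 for rows in clusters for x, y in rows)
    naive_var = (rss / (n_obs - 1)) / sxx

    # The "meat" M: squared within-cluster sums of x * residual.
    meat = sum(
        sum(x * (y - beta_hat * x) for x, y in rows) ** 2 for rows in clusters
    )
    robust_var = meat / sxx ** 2  # B^{-1} M B^{-1}
    return beta_hat, naive_var ** 0.5, robust_var ** 0.5


beta_hat, naive_se, robust_se = fit_slope_with_sandwich(simulate_clusters())
print(f"beta_hat={beta_hat:.3f} naive_se={naive_se:.4f} robust_se={robust_se:.4f}")
```

In this setup the naive standard error, which assumes independent errors with constant variance, should come out noticeably smaller than the robust one, even though the point estimate is the same. If the mean model were wrong, neither standard error would rescue the estimate.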

When exploring the variance-covariance structure, I would suggest removing the effects of the covariates first, i.e., examining the residuals. That way, we can truly assess the variance-covariance structure of the error term without the influence of variability from the covariates.
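As a rough sketch of what examining the residuals could look like, the snippet below computes the residual variance at each occasion and the pooled lag-1 and lag-2 correlations. It assumes a balanced design with residuals stored per subject in time order. The residuals here are simulated AR(1) stand-ins with a made-up correlation of 0.6, and the function names are hypothetical; in practice you would plug in the residuals from your fitted mean model.

```python
import random
from statistics import mean, pvariance


def simulate_ar1_residuals(n_subjects=300, n_times=5, rho=0.6, seed=1):
    """Stand-in residuals: stationary AR(1) with unit variance per subject."""
    rng = random.Random(seed)
    scale = (1.0 - rho ** 2) ** 0.5  # keeps the marginal variance at 1
    residuals = []
    for _ in range(n_subjects):
        r = [rng.gauss(0.0, 1.0)]
        for _ in range(n_times - 1):
            r.append(rho * r[-1] + scale * rng.gauss(0.0, 1.0))
        residuals.append(r)
    return residuals


def variance_by_time(resid):
    """Residual variance at each occasion (a check for heteroscedasticity)."""
    n_times = len(resid[0])
    return [pvariance([r[t] for r in resid]) for t in range(n_times)]


def lag_correlation(resid, lag):
    """Pooled correlation between residuals `lag` occasions apart."""
    pairs = [(r[t], r[t + lag]) for r in resid for t in range(len(r) - lag)]
    a_mean = mean(p[0] for p in pairs)
    b_mean = mean(p[1] for p in pairs)
    cov = sum((a - a_mean) * (b - b_mean) for a, b in pairs)
    a_ss = sum((a - a_mean) ** 2 for a, _ in pairs)
    b_ss = sum((b - b_mean) ** 2 for _, b in pairs)
    return cov / (a_ss * b_ss) ** 0.5


resid = simulate_ar1_residuals()
variances = variance_by_time(resid)
lag1 = lag_correlation(resid, 1)
lag2 = lag_correlation(resid, 2)
print("variance by occasion:", [round(v, 2) for v in variances])
print("lag-1 and lag-2 correlation:", round(lag1, 2), round(lag2, 2))
```

Variances that drift across occasions point to heteroscedasticity, and correlations that decay roughly geometrically with the lag are what an AR(1) working structure assumes.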

Unlike with mixed-effects models, we cannot rigorously compare two nested variance-covariance models with a likelihood ratio test. GEE is based on quasi-likelihood, so no likelihood-based method is available for testing variance models. On the other hand, if you are interested in a subject-specific interpretation, mixed-effects models may help by accounting for subject-level heterogeneity.

Transforming the outcome may reduce the heteroscedasticity. But I would suggest choosing a transformation based on the real relationship between the covariates and the outcome, since a transformation sometimes makes the interpretation harder. If your response is a count, you can use Poisson regression, which uses a log link function.
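One concrete way a log transformation changes the interpretation: a linear model for $\log Y$ describes $E[\log Y]$, whereas a log link describes $\log E[Y]$, and the two are not the same quantity. A small sketch, assuming a lognormal outcome with made-up parameters (for a lognormal, the gap between the two is $\sigma^2/2$):

```python
import math
import random
from statistics import mean

rng = random.Random(2)
sigma = 0.8  # assumed SD of the error on the log scale

# Hypothetical positive, right-skewed outcome: log(y) ~ Normal(1.0, sigma^2).
y = [math.exp(1.0 + rng.gauss(0.0, sigma)) for _ in range(5000)]

mean_of_log = mean(math.log(v) for v in y)  # target of a model for log(y)
log_of_mean = math.log(mean(y))             # target of a log-link model
gap = log_of_mean - mean_of_log             # positive by Jensen's inequality

print(f"E[log Y] ~ {mean_of_log:.3f}, log E[Y] ~ {log_of_mean:.3f}, gap ~ {gap:.3f}")
```

So coefficients from a model fitted to the log-transformed outcome cannot be back-transformed into statements about the mean of the original outcome without further assumptions, which is one reason to prefer a log link when the mean on the original scale is what you care about.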