GEE – Choosing Proper Working Correlation Structure

generalized-estimating-equations

I am an epidemiologist trying to understand GEEs in order to properly analyze a cohort study (using Poisson regression with a log link to estimate relative risk). I have a few questions about the "working correlation" that I would like someone more knowledgeable to clarify:

(1) If I have repeated measurements on the same individual, is it usually most reasonable to assume an exchangeable structure (or an autoregressive one if the measurements show a trend)? What about independence: are there any cases where one could assume independence for measurements on the same individual?

(2) Is there any (reasonably simple) way to assess the proper structure by examining the data?

(3) I noticed that, when choosing an independence structure, I get the same point estimates (but lower standard errors) as when running a simple Poisson regression (in R, using glm() and geeglm() from the geepack package). Why is this happening? I understand that with GEEs you estimate a population-averaged model (in contrast to a subject-specific one), so I expected the point estimates to agree only in the linear regression case.

(4) If my cohort is spread over multiple sites (but with one measurement per individual), should I choose an independence or an exchangeable working correlation, and why? I mean, individuals at each site are still independent of each other, right? Thus for a subject-specific model, for example, I would specify the site as a random effect. With GEE, however, independence and exchangeable give different estimates, and I am not sure which one is better in terms of underlying assumptions.

(5) Can GEE handle a 2-level hierarchical clustering, i.e. a multi-site cohort with repeated measures per individual? If yes, what should I specify as a clustering variable in geeglm() and what should be the working correlation if one assumes for example "independence" for the first level (site) and "exchangeable" or "autoregressive" for the second level (individual)?

I understand these are quite a few questions, and some of them may be fairly basic, but still very difficult for me (and maybe other novices?) to grasp. So, any help is greatly and sincerely appreciated, and to show this I have started a bounty.

Best Answer

  1. Not necessarily. With small clusters, an imbalanced design, and incomplete adjustment for within-cluster confounders, exchangeable GEE can be less efficient and more biased than independence GEE. The assumptions behind the exchangeable structure can be rather strong, too. However, when those assumptions are met, the exchangeable structure gives more efficient inference. I have never found an instance where an AR-1 correlation structure makes sense, since it is uncommon to have measurements that are balanced in time (I work with human-subjects data).

  2. Exploring the correlation is good practice and should be part of the data analysis, but it shouldn't drive the choice of working correlation. You can use variograms and lorelograms (the latter for binary outcomes) to visualize correlation in longitudinal and panel studies. The intracluster correlation coefficient (ICC) is a good summary of the extent of correlation within clusters.

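  3. With an independence working correlation, the GEE estimating equations are identical to the score equations of the ordinary GLM, so glm() and geeglm() return the same point estimates. Only the standard errors differ, because geeglm() by default uses the robust (sandwich) variance estimator, which accounts for the clustering. This holds for any link function, not only the identity link; the link function in the GEE is for the marginal model. More generally, the working correlation structure in GEE, unlike the random-effects structure in a mixed model, does not change what the marginal parameters mean, and the estimates remain consistent even if the structure is misspecified. It mainly affects efficiency and the standard error estimates, although with a non-independence structure the numerical point estimates will differ somewhat, as you noticed in question 4.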

  4. Sites can be sources of unmeasured variation, in the same way that a mouth is for the teeth within it or a school district is for its students. There is potential for cluster-level confounders in such data, such as a genetic propensity to tooth decay or community education funding, so individuals at the same site are not truly independent. For that reason, you will get better standard error estimates by using an exchangeable working correlation structure.

  5. Calculating marginal effects in a GEE is complicated when the clustering levels are not nested, but it can be done. Nested clustering is easy, and you can proceed just as you've described.
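Point 1 contrasts the exchangeable and AR-1 structures. As a minimal illustration (plain Python, standard library only; the function names are hypothetical and not part of geepack or any other package), the sketch below builds the working correlation matrix that each structure assumes for one cluster with n measurements. The AR-1 matrix indexes measurements by occasion 0, 1, ..., n-1, i.e. it assumes equally spaced visits, which is exactly the balance in time that point 1 says is rare in human-subjects data.

```python
def independence(n):
    """n x n independence working correlation: the identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def exchangeable(n, rho):
    """n x n exchangeable working correlation: 1 on the diagonal, rho for
    every pair of measurements in the same cluster."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


def ar1(n, rho):
    """n x n AR(1) working correlation: rho ** |i - j|.

    Assumes (hypothetically) that measurement i was taken at occasion i,
    so correlation decays with the number of equally spaced steps apart.
    """
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    examples = [
        ("independence", independence(4)),
        ("exchangeable, rho = 0.3", exchangeable(4, 0.3)),
        ("AR(1), rho = 0.5", ar1(4, 0.5)),
    ]
    for name, matrix in examples:
        print(name)
        for row in matrix:
            print("  " + " ".join(f"{value:.3f}" for value in row))
```

Under the exchangeable structure every pair of measurements on the same person gets the same correlation (0.3 here); under AR-1 with rho = 0.5, adjacent visits get 0.5 and visits two steps apart get 0.25.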
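For point 2, here is a sketch of one common way to compute the intracluster correlation: the one-way ANOVA estimator, with the usual adjustment for unequal cluster sizes. The choice of estimator, the function name anova_icc, and the toy data are assumptions for illustration. The estimator is designed for roughly continuous outcomes, so for counts or binary outcomes treat it only as a descriptive summary and, as point 2 cautions, use it to describe the data rather than to pick the working correlation.

```python
def anova_icc(clusters):
    """One-way ANOVA estimator of the intracluster correlation coefficient.

    `clusters` is a list of lists; each inner list holds the outcome values
    of one cluster (e.g. one site, or one individual's repeated measures).
    Unequal cluster sizes are handled with the usual n0 adjustment.
    """
    k = len(clusters)
    sizes = [len(c) for c in clusters]
    n_total = sum(sizes)
    grand_mean = sum(sum(c) for c in clusters) / n_total
    means = [sum(c) / len(c) for c in clusters]

    # Between-cluster and within-cluster sums of squares
    ssb = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ssw = sum((y - m) ** 2 for c, m in zip(clusters, means) for y in c)

    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)

    # Effective cluster size; equals the common size when clusters are balanced
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    return (msb - msw) / (msb + (n0 - 1) * msw)


if __name__ == "__main__":
    # Hypothetical outcomes for three sites with three individuals each
    sites = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(round(anova_icc(sites), 3))
```

The estimate can be negative when values vary more within clusters than between them; for example, anova_icc([[1, 3], [1, 3]]) returns -1.0.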
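Point 3 can be checked numerically. The sketch below (plain Python, one binary exposure, hypothetical toy data; the function names are illustrative and this is not geepack's implementation) fits a Poisson log-link model by Newton-Raphson. Because an independence working correlation makes the GEE estimating equations identical to the Poisson GLM score equations, this single fit gives the point estimates of both glm() and an independence geeglm(). The two analyses differ only in the variance: the model-based estimate (inverse Fisher information, as glm() reports) versus the cluster-robust sandwich estimate that GEE reports, assumed here to be the plain sandwich without any small-sample correction.

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def score_and_info(data, beta):
    """Poisson log-link score vector and Fisher information for
    log(mu) = b0 + b1 * x, with `data` a list of (cluster_id, x, y)."""
    score = [0.0, 0.0]
    info = [[0.0, 0.0], [0.0, 0.0]]
    for _, x, y in data:
        xv = (1.0, x)
        mu = math.exp(beta[0] + beta[1] * x)
        for i in range(2):
            score[i] += xv[i] * (y - mu)
            for j in range(2):
                info[i][j] += mu * xv[i] * xv[j]
    return score, info


def fit_poisson(data, iters=50):
    """Newton-Raphson solution of sum x * (y - mu) = 0.

    The cluster id is ignored: with an independence working correlation
    these are also the GEE estimating equations."""
    beta = [0.0, 0.0]
    for _ in range(iters):
        score, info = score_and_info(data, beta)
        step = inv2(info)
        beta = [beta[i] + step[i][0] * score[0] + step[i][1] * score[1] for i in range(2)]
    return beta


def standard_errors(data, beta):
    """Return (model-based SEs, cluster-robust sandwich SEs) at beta."""
    _, info = score_and_info(data, beta)
    bread = inv2(info)
    # "Meat": outer products of each cluster's summed score contribution
    per_cluster = {}
    for cid, x, y in data:
        mu = math.exp(beta[0] + beta[1] * x)
        u = per_cluster.setdefault(cid, [0.0, 0.0])
        u[0] += y - mu
        u[1] += x * (y - mu)
    meat = [[sum(u[i] * u[j] for u in per_cluster.values()) for j in range(2)] for i in range(2)]
    sandwich = matmul2(matmul2(bread, meat), bread)
    model_se = [math.sqrt(bread[i][i]) for i in range(2)]
    robust_se = [math.sqrt(sandwich[i][i]) for i in range(2)]
    return model_se, robust_se


if __name__ == "__main__":
    # Hypothetical toy data: (person id, exposure x, count y), two visits each
    data = [(1, 0, 2), (1, 1, 4), (2, 0, 1), (2, 1, 3), (3, 0, 3), (3, 1, 5)]
    beta = fit_poisson(data)
    model_se, robust_se = standard_errors(data, beta)
    print("log RR:", round(beta[1], 4), "RR:", round(math.exp(beta[1]), 4))
    print("model-based SE (glm):", [round(s, 4) for s in model_se])
    print("sandwich SE (independence GEE):", [round(s, 4) for s in robust_se])
```

In this toy example the fitted rate ratio is 2 (mean count 4 when exposed versus 2 when unexposed), and the sandwich standard errors come out smaller than the model-based ones; with other data they can just as well be larger.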