Statistical Models – When to Use Generalized Estimating Equations vs. Mixed Effects Models

generalized-estimating-equations, mixed-model

I have been quite happily using mixed effects models for a while now with longitudinal data. I wish I could fit AR relationships in lmer (I think I'm right that I can't do this?) but I don't think it's desperately important so I don't worry too much.

I've just come across generalized estimating equations (GEE), and they seem to offer a lot more flexibility than ME models.

At the risk of asking an over-general question, is there any advice as to which is better for different tasks? I've seen some papers comparing them, and they tend to be of the form:

"In this highly specialised area, don't use GEEs for X, don't use ME models for Y".

I haven't found any more general advice. Can anyone enlighten me?

Thank you!

Best Answer

Use GEE when you're interested in the population-average effect of a covariate rather than the individual-specific effect. The two coincide in linear models but not, in general, in non-linear ones (e.g. logistic regression). To see this, take, for example, the random effects logistic model for the $j$'th observation of the $i$'th subject, $Y_{ij}$:

$$ \log \left( \frac{p_{ij}}{1-p_{ij}} \right) = \mu + \eta_{i} $$

where $\eta_{i} \sim N(0,\sigma^{2})$ is a random effect for subject $i$ and $p_{ij} = P(Y_{ij} = 1|\eta_{i})$.

If you fit a random effects model to these data, you would get an estimate of $\mu$, which is the log odds for a subject whose random effect is $\eta_{i} = 0$. Because it is conditional on the mean-zero, normally distributed perturbation applied to each individual, it is individual specific.

If you used GEE on these data, you would estimate the population average log odds. In this case that would be

$$ \nu = \log \left( \frac{ E_{\eta} \left( \frac{1}{1 + e^{-\mu-\eta_{i}}} \right)}{ 1-E_{\eta} \left( \frac{1}{1 + e^{-\mu-\eta_{i}}} \right)} \right) $$

$\nu \neq \mu$, in general. For example, if $\mu = 1$ and $\sigma^{2} = 1$, then $\nu \approx 0.83$. Although the random effects have mean zero on the transformed (or linked) scale, their effect is not mean zero on the original scale of the data. Try simulating some data from a mixed effects logistic regression model and comparing the population-level average (the overall proportion of $Y_{ij} = 1$) with the inverse-logit of the intercept, and you will see that they are not equal, just as in this example. This difference in the interpretation of the coefficients is the fundamental difference between GEE and random effects models.
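The suggested check can be sketched in a few lines of plain Python. This is only an illustration under the setup above ($\mu = 1$, $\sigma^{2} = 1$): the helper names (`expit`, `logit`, `marginal_probability`), the trapezoidal grid, the seed and the sample sizes are all my own choices, not part of any library.

```python
import math
import random

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

def marginal_probability(mu, sigma, n_grid=4001, width=8.0):
    """Approximate E_eta[expit(mu + eta)], eta ~ N(0, sigma^2), by the trapezoidal rule."""
    lo, hi = -width * sigma, width * sigma
    step = (hi - lo) / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        eta = lo + k * step
        density = math.exp(-0.5 * (eta / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        weight = 0.5 if k in (0, n_grid - 1) else 1.0  # half weight at the two endpoints
        total += weight * density * expit(mu + eta)
    return total * step

mu, sigma = 1.0, 1.0

# Population-average log odds: integrate the random effect out on the probability scale.
nu = logit(marginal_probability(mu, sigma))

# Monte Carlo version: simulate subjects, then look at the overall proportion of ones.
rng = random.Random(0)           # arbitrary seed, for reproducibility only
n_subjects, n_obs = 20000, 5     # illustrative sizes
successes = 0
for _ in range(n_subjects):
    p_i = expit(mu + rng.gauss(0.0, sigma))  # subject-specific probability
    successes += sum(rng.random() < p_i for _ in range(n_obs))
sim_mean = successes / (n_subjects * n_obs)

print(f"mu (individual specific)   = {mu:.3f}")
print(f"nu (quadrature)            = {nu:.3f}")   # roughly 0.83, the value quoted above
print(f"logit of simulated average = {logit(sim_mean):.3f}")
print(f"expit(mu) vs. simulated average: {expit(mu):.3f} vs. {sim_mean:.3f}")
```

The simulated overall proportion sits noticeably below `expit(mu)`, which is the gap between the two interpretations.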

Edit: In general, a mixed effects model with no predictors can be written as

$$ \psi \big( E(Y_{ij}|\eta_{i}) \big) = \mu + \eta_{i} $$

where $\psi$ is a link function. Whenever

$$ \psi \Big( E_{\eta} \big( E(Y_{ij}|\eta_{i}) \big) \Big) \neq E_{\eta} \Big( \psi \big( E(Y_{ij}|\eta_{i}) \big) \Big) $$

there will be a difference between the population average coefficients (GEE) and the individual specific coefficients (random effects models). That is, the average depends on the scale on which the random effects are integrated out: averaging on the original scale of the data and then applying the link generally gives a different answer from averaging on the transformed scale. Note that in the linear model (that is, $\psi(x) = x$), the equality does hold, so the two are equivalent.
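As a rough numerical illustration of the same point, the sketch below computes the population average on the link scale, $\psi\big(E_{\eta}(\psi^{-1}(\mu + \eta_{i}))\big)$, for three links and compares it with $\mu$. The helper `average_over_random_effect` and the choice $\mu = 1$, $\sigma = 1$ are assumptions made for the example.

```python
import math

def average_over_random_effect(g, sigma, n_grid=4001, width=8.0):
    """Trapezoidal approximation of E[g(eta)] for eta ~ N(0, sigma^2)."""
    lo, hi = -width * sigma, width * sigma
    step = (hi - lo) / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        eta = lo + k * step
        density = math.exp(-0.5 * (eta / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += weight * density * g(eta)
    return total * step

# Each entry is (psi, psi_inverse).
links = {
    "identity": (lambda m: m, lambda x: x),
    "logit": (lambda m: math.log(m / (1.0 - m)), lambda x: 1.0 / (1.0 + math.exp(-x))),
    "log": (math.log, math.exp),
}

mu, sigma = 1.0, 1.0
population_average = {}
for name, (psi, psi_inv) in links.items():
    # Average the conditional mean over eta on the data scale, then apply the link.
    marginal_mean = average_over_random_effect(lambda eta: psi_inv(mu + eta), sigma)
    population_average[name] = psi(marginal_mean)
    print(f"{name:8s}: population average = {population_average[name]:.3f}, mu = {mu:.3f}")
```

With the identity link the two agree. With the log link the gap is $\sigma^{2}/2$, which follows from the normal moment generating function, and with the logit link the population average is pulled toward zero.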

Edit 2: It is also worth noting that the "robust" sandwich-type standard errors produced by a GEE model provide valid asymptotic confidence intervals (i.e. a nominal 95% interval actually covers about 95% of the time in large samples) even if the correlation structure specified in the model is not correct, provided the mean model is correctly specified.
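To make the sandwich idea concrete, here is a hand-rolled sketch for the intercept-only logistic model above, fitted with an independence working correlation (which is deliberately wrong here, since observations within a subject are correlated). It is not how any particular GEE package computes things internally; the seed, sample sizes and variable names are assumptions for the example.

```python
import math
import random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

rng = random.Random(1)            # arbitrary seed
mu, sigma = 1.0, 1.0
n_subjects, n_obs = 2000, 10      # illustrative sizes

# Simulate clustered binary data from the random intercept logistic model.
clusters = []
for _ in range(n_subjects):
    p_i = expit(mu + rng.gauss(0.0, sigma))
    clusters.append([1 if rng.random() < p_i else 0 for _ in range(n_obs)])

# With an intercept only and working independence, the estimating equation is
# sum_i sum_j (y_ij - p) = 0, so the estimate of p is the overall proportion.
n_total = n_subjects * n_obs
p_hat = sum(sum(c) for c in clusters) / n_total
beta_hat = math.log(p_hat / (1.0 - p_hat))   # population-average log odds

# "Bread": sum_i D_i' V_i^{-1} D_i, with D_i = p(1-p) * 1 and V_i = p(1-p) * I.
bread = n_total * p_hat * (1.0 - p_hat)
# "Meat": sum_i (D_i' V_i^{-1} (y_i - p))^2 = sum_i (sum_j (y_ij - p_hat))^2.
meat = sum((sum(c) - n_obs * p_hat) ** 2 for c in clusters)

naive_se = math.sqrt(1.0 / bread)             # trusts the independence assumption
robust_se = math.sqrt(meat / bread ** 2)      # sandwich: bread^-1 * meat * bread^-1

print(f"beta_hat  = {beta_hat:.3f}")
print(f"naive SE  = {naive_se:.4f}")
print(f"robust SE = {robust_se:.4f}")
print(f"robust 95% CI = ({beta_hat - 1.96 * robust_se:.3f}, {beta_hat + 1.96 * robust_se:.3f})")
```

The naive standard error is too small because it treats all observations as independent; the robust one picks up the within-subject correlation from the cluster-level residual sums.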

Edit 3: If your interest is in understanding the association structure in the data, the GEE estimates of associations are notoriously inefficient (and sometimes inconsistent). I've seen a reference for this but can't place it right now.