For a continuous outcome analyzed using GEE with a linear (identity) link, you have assurance that point estimates and standard errors are consistent for the first-order trend regardless of the distribution of the outcome, heteroscedasticity, and mild non-linearity. Point estimates from the GEE (at least with an independence working correlation) are the same as those obtained from maximum likelihood (OLS), but the standard errors are the HC sandwich-based errors and thus mop up mild violations of the classical model assumptions.
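To make the claim about identical point estimates concrete, here is a minimal sketch using only numpy. It assumes an independence working correlation, in which case the GEE estimating equation for an identity link reduces to the OLS normal equations. The simulated data, the variable names, and the specific form of the cluster-robust sandwich (no small-sample correction) are illustrative assumptions, not something taken from a particular package.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated longitudinal data (illustrative assumption): 200 subjects, 4 visits each.
n_subjects, n_visits = 200, 4
subject = np.repeat(np.arange(n_subjects), n_visits)
time = np.tile(np.arange(n_visits), n_subjects).astype(float)

# A random intercept induces within-subject correlation;
# the noise sd grows with time, so the errors are heteroscedastic.
b = rng.normal(0.0, 1.0, n_subjects)[subject]
eps = rng.normal(0.0, 1.0 + 0.5 * time)
y = 1.0 + 0.5 * time + b + eps

X = np.column_stack([np.ones_like(time), time])

# OLS point estimates.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# GEE with identity link and independence working correlation solves
# sum_i X_i' (y_i - X_i beta) = 0, i.e. the normal equations.
beta_gee = np.linalg.solve(X.T @ X, X.T @ y)

# Cluster-robust sandwich variance: bread * meat * bread.
resid = y - X @ beta_gee
bread = np.linalg.inv(X.T @ X)
meat = np.zeros((X.shape[1], X.shape[1]))
for i in range(n_subjects):
    rows = subject == i
    score_i = X[rows].T @ resid[rows]  # subject i's contribution to the score
    meat += np.outer(score_i, score_i)
se_sandwich = np.sqrt(np.diag(bread @ meat @ bread))

# Classical OLS standard errors, which assume iid errors.
sigma2 = resid @ resid / (len(y) - X.shape[1])
se_naive = np.sqrt(np.diag(sigma2 * bread))

print("beta (OLS):", beta_ols)
print("beta (GEE, independence):", beta_gee)
print("sandwich SE:", se_sandwich)
print("naive SE:", se_naive)
```

The two coefficient vectors agree up to floating point error; only the standard errors differ. With a non-independence working correlation (exchangeable, AR-1, and so on) the point estimates would generally no longer coincide with OLS.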
In longitudinal analyses where attrition depends upon measured variables (e.g. age), you know that the so-called "missing data mechanism" is missing at random (not missing COMPLETELY at random, per Little and Rubin, 2002) and, further, that maximum likelihood estimates "are not biased", because the likelihood factorizes into a contribution from the missing data indicator and a contribution from the observed data.
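The factorization referred to above can be sketched as follows. The notation is an assumption introduced here for illustration: $Y_{obs}$ and $Y_{mis}$ are the observed and missing responses, $R$ is the vector of missingness indicators, $\theta$ parameterizes the outcome model, $\psi$ parameterizes the missingness model, and covariates are suppressed. Under MAR, $f(R \mid Y_{obs}, Y_{mis}, \psi) = f(R \mid Y_{obs}, \psi)$, so

$$
L(\theta, \psi) = f(Y_{obs}, R \mid \theta, \psi)
= \int f(Y_{obs}, Y_{mis} \mid \theta)\, f(R \mid Y_{obs}, Y_{mis}, \psi)\, dY_{mis}
= f(R \mid Y_{obs}, \psi)\, f(Y_{obs} \mid \theta).
$$

If $\theta$ and $\psi$ are also distinct parameters, the first factor does not involve $\theta$, and maximizing $f(Y_{obs} \mid \theta)$ alone gives the same estimate of $\theta$ as maximizing the full likelihood.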
My questions are:
- For ML estimates, are complete case analyses considered efficient?
- For GEE with linear link, are estimates somehow biased even though they're the same as those obtained from ML?
- Is the real problem that SEs from GEE with linear link are not guaranteed to be consistent, beyond what is attributable to effective sample size loss due to complete case analysis?
- Does weighting promise to help remedy the SEs above and beyond effective sample size loss due to complete case analysis if there are other reasons why the GEE would be "wrong" in this case?
Best Answer
You should also note that in longitudinal studies with attrition, dropout can typically depend both on measured covariates and on the response at times you didn't observe, so you can't just say "I collected everything I think to be associated with dropout" and conclude that you have MAR. MAR is a genuine assumption about how the world works, and it cannot be checked from the data. If two people with the same response history and same covariates are on study and one drops out and one does not, MAR essentially states that you can use the guy who stayed on to learn the distribution of the guy who dropped out, and this is a very strong assumption. In longitudinal studies, the consensus among experts is that an analysis of sensitivity to the MAR assumption is ideal, but I don't think this has made it into the software world yet.
Unfortunately, I'm not aware of any software for doing doubly robust estimation, but likelihood-based estimation is easy (IMO the easiest thing to do is use Bayesian software for fitting, but there is also lots of software out there). You can also do inverse probability weighting easily, but it has stability issues.
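As a rough illustration of inverse probability weighting and its stability issues, here is a minimal numpy-only sketch. Everything in it is an assumption made for illustration: a single follow-up outcome `y`, a single baseline covariate `x` that drives dropout (so MAR holds by construction), a correctly specified logistic dropout model fitted by Newton-Raphson, and the normalized (Hajek) form of the weighted mean.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data (illustrative assumption): the true mean of y is 2.0.
n = 5000
x = rng.normal(0.0, 1.0, n)
y = 2.0 + 1.0 * x + rng.normal(0.0, 1.0, n)

# Dropout depends only on the measured covariate x, so MAR holds by construction.
p_obs_true = 1.0 / (1.0 + np.exp(-(0.5 - 1.5 * x)))
observed = rng.random(n) < p_obs_true

# Fit P(observed | x) by logistic regression using Newton-Raphson.
Z = np.column_stack([np.ones(n), x])
gamma = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-(Z @ gamma)))
    grad = Z.T @ (observed - p)
    hess = (Z * (p * (1.0 - p))[:, None]).T @ Z
    gamma += np.linalg.solve(hess, grad)

# Weight each completer by the inverse of the estimated probability of being observed.
p_hat = 1.0 / (1.0 + np.exp(-(Z @ gamma)))
w = 1.0 / p_hat[observed]

cc_mean = y[observed].mean()                     # complete-case estimate
ipw_mean = np.sum(w * y[observed]) / np.sum(w)   # normalized IPW estimate

# A few very large weights can dominate the estimate;
# the Kish effective sample size summarizes how much information is left.
ess = w.sum() ** 2 / np.sum(w ** 2)

print("complete-case mean:", cc_mean)
print("IPW mean:", ipw_mean)
print("largest weight:", w.max())
print("completers:", observed.sum(), "effective sample size:", ess)
```

In this setup the complete-case mean is pulled away from 2.0 because subjects with large `x` tend to drop out, while the weighted mean corrects for that. The stability issue shows up in the largest weight and in an effective sample size well below the number of completers: subjects with a small estimated probability of staying on study carry a lot of the estimate.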