Solved – Mixed Model Versus GEE estimates and which to use

generalized-estimating-equations, mixed-model

I am new to both GEE and mixed modeling, so please bear with me:

Briefly: my exposure is television viewing in childhood (tv), and I am trying to assess change in body mass index (bmisds) over three time points (time), at ages 4, 12 and 13, with adjustment for covariates. I first ran the analysis as a generalized estimating equation using proc genmod.
When I run this analysis as a mixed effects model (with a random intercept and time treated as a random effect), I get nearly identical parameter estimates to the GEE. Am I doing something wrong? This is the code I used:

proc mixed data=test;
class tv(ref="2.00") M_ID mom_gc(ref="1.00") brfed(ref="2.00") 
preec(ref="0.00") gender_merged
firstborn sectioyes mat_ed activityhs4; 
model bmisds=time tv time*tv mom_gc brfed preec gender_merged firstborn 
sectioyes mat_ed activityhs4 GA_weeks MBMIsvkon1
maternal_age_birth weightsds_1/s corrb;
random int time/subject=M_ID;
run;

If it is right, why are the estimates so similar, and which one do you recommend using? As I understand it, the GEE gives you population-averaged effects, while the mixed effects model gives you both population averages and subject-specific effects. So how would I interpret an interaction for tv (>2 hours)*time with an estimate of 0.20, for example, in the mixed effects model above? Instead of saying "children who watch TV for >2 hours had, on average, a 0.20-unit higher body mass index over time", as I might for the GEE, would I say "a child who watches >2 hours of TV has a 0.20-unit higher body mass index over time"? Where do the random effects come in? I am not seeing anything "extra" in the output of the mixed effects model compared with the GEE.

Best Answer

I wish you had mentioned whether your outcome is continuous and, if so, whether it looks roughly normal or has some unusual distribution.

That said, you should not be surprised that both approaches gave you the same results. For a continuous outcome modeled linearly, they should give you essentially the same estimates, except in a few cases that I will leave aside so as not to go off topic.
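A short derivation may help show why the two sets of estimates agree in the linear case. It is a sketch under the usual linear mixed model assumptions (random effects with mean zero, independent of the covariates); the symbols $x_{ij}$, $z_{ij}$, $b_i$, $G$ and $\sigma^2$ are notation introduced here, not taken from the question.

$$
\begin{aligned}
Y_{ij} &= x_{ij}^\top \beta + z_{ij}^\top b_i + \varepsilon_{ij}, \qquad b_i \sim N(0, G), \quad \varepsilon_{ij} \sim N(0, \sigma^2) \\
E[Y_{ij} \mid b_i] &= x_{ij}^\top \beta + z_{ij}^\top b_i \\
E[Y_{ij}] &= x_{ij}^\top \beta + z_{ij}^\top E[b_i] = x_{ij}^\top \beta
\end{aligned}
$$

Because the link is the identity, averaging over the random effects leaves $\beta$ unchanged, so the fixed effects of the mixed model and the GEE coefficients target the same population-averaged quantity. The random effects show up in the covariance of the repeated measurements (and hence in the standard errors), not in the mean.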

I have always liked how Twisk explains GEE and mixed models in his "Applied Longitudinal Data Analysis for Epidemiology". I have slightly modified a few lines (to fit your example); otherwise I am quoting from page 88 of his book.

"The interpretation of the regression coefficients of a predictor variable from a random coefficient analysis is exactly the same as the interpretation of the regression coefficients estimated with GEE analysis, so the interpretation is twofold: (1) the ‘between-subjects’ interpretation indicates that a difference between two subjects of 1 unit in, for instance, the predictor variable X2 is associated with a difference of 0.20 units (this is your beta) in the outcome variable Y; (2) the ‘within-subject’ interpretation indicates that a change within one subject of 1 unit in the predictor variable X2 is associated with a change of 0.20 units in the outcome variable Y. Again, the ‘real’ interpretation is a combination of both relationships."

Hubbard et al. give the following interpretations and, along with several other reasons, argue that GEE is closer to the truth than mixed models.

For a linear mixed effects model: "Change in the mean outcome for a unit change in the associated neighborhood exposure, keeping the random effect (neighborhood) fixed"

For GEE: "Change in the mean outcome for a unit change in the associated neighborhood exposure across all of the neighborhoods observed"

This article might interest you. But please be careful: it all comes down to the assumptions, nothing else.

Hubbard AE, Ahern J, Fleischer NL, Van der Laan M, Lippman SA, Jewell N, et al. To GEE or not to GEE: comparing population average and mixed models for estimating the associations between neighborhood risk factors and health. Epidemiology (Cambridge, Mass). 2010;21(4):467-74.

Related Solutions

Solved – Cluster selection and formula in (longitudinal) GEE models

You will want to use subject (or subject ID) as your cluster. GEE accounts for the repeated measurements within clusters; in this case, the repeated measurements are on individuals over time. So, you'd want to use:

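# subject_id is the subject identifier column (a hyphenated name such as subject-ID would be parsed as a subtraction in R)
gee(QoL ~ education + sex + time, id = subject_id)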

An easy way to determine the cluster is to ask which object the multiple measurements are being taken on. In this case, the multiple measurements are being made on a subject. You aren't making measurements on "a time" or on "an education."

By the way, I would recommend using geeglm as you can control the ordering of the measurements using the waves argument to geeglm, which I find is usually needed.

GEE – Comparing GEE vs Mixed Model for Time-Varying Covariate Analysis

Indeed, for dichotomous outcomes, as you seem to have here, the corresponding mixed effects model, namely a mixed effects logistic regression, gives you fixed effects coefficients that have an interpretation conditional on the random effects. A detailed explanation can be found here. Most often, this is not the interpretation you want. The GEE approach does give you coefficients with a marginal / population-averaged interpretation.
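To make the conditional versus marginal distinction concrete, here is a minimal numerical sketch, assuming a random-intercept logistic model with made-up values (beta0, beta1 and sigma are hypothetical, not estimates from any data set). It averages the subject-specific probabilities over the random-intercept distribution and compares the resulting population-averaged log odds ratio with the conditional coefficient.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Log odds."""
    return math.log(p / (1.0 - p))


def marginal_prob(eta, sigma, n_grid=2001, width=8.0):
    """Average expit(eta + b) over b ~ N(0, sigma^2) using a trapezoid rule."""
    if sigma == 0:
        return expit(eta)
    lo, hi = -width * sigma, width * sigma
    step = (hi - lo) / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        b = lo + k * step
        density = math.exp(-0.5 * (b / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        weight = 0.5 if k in (0, n_grid - 1) else 1.0  # trapezoid end-point weights
        total += weight * expit(eta + b) * density * step
    return total


# Hypothetical conditional coefficients and random-intercept standard deviation.
beta0, beta1, sigma = -1.0, 1.0, 2.0

# Conditional (subject-specific) log odds ratio for x = 1 versus x = 0.
conditional_slope = beta1

# Marginal (population-averaged) log odds ratio for the same contrast.
marginal_slope = (logit(marginal_prob(beta0 + beta1, sigma))
                  - logit(marginal_prob(beta0, sigma)))

print("conditional:", conditional_slope)
print("marginal:   ", marginal_slope)
```

With a nonzero random-intercept variance the marginal log odds ratio comes out smaller in absolute value than the conditional one, which is why the two kinds of coefficients cannot be read interchangeably for binary outcomes. Setting sigma to 0 makes the two coincide.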

However, an additional practical point that you also need to consider is missing data. You have not given us enough details on this point for your application, but almost always we have to deal with incomplete data. Here, mixed models give you valid results under the less stringent missing at random assumption, whereas (standard, unweighted) GEE gives you valid results only under the more restrictive, and less realistic, missing completely at random assumption.
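The following toy simulation illustrates the missing-data point. It is only a sketch under stated assumptions, not an actual GEE or mixed-model fit: two visits per subject, bivariate normal outcomes, and dropout at the second visit that depends only on the observed first visit (missing at random). The available-case mean stands in for what an unweighted analysis of the observed data targets, and the regression-based estimate stands in for a likelihood-based analysis. All numbers are made up.

```python
import random
import statistics

random.seed(1)
n = 20000

# Two visits per subject; y2 is correlated with y1 and has true mean 0.
y1 = [random.gauss(0.0, 1.0) for _ in range(n)]
y2 = [0.8 * a + random.gauss(0.0, 0.6) for a in y1]

# MAR dropout: the second visit is missing whenever the observed first visit is high.
observed = [a < 0.5 for a in y1]

# Available-case mean of y2, using only subjects who were seen at visit 2.
y1_obs = [a for a, keep in zip(y1, observed) if keep]
y2_obs = [b for b, keep in zip(y2, observed) if keep]
available_case_mean = statistics.fmean(y2_obs)

# Likelihood-style estimate: regress y2 on y1 among completers,
# then average the fitted line over everyone's y1 (including dropouts).
m1, m2 = statistics.fmean(y1_obs), statistics.fmean(y2_obs)
slope = (sum((a - m1) * (b - m2) for a, b in zip(y1_obs, y2_obs))
         / sum((a - m1) ** 2 for a in y1_obs))
intercept = m2 - slope * m1
likelihood_mean = intercept + slope * statistics.fmean(y1)

print("true mean of y2:      0.0")
print("available-case mean: ", available_case_mean)
print("likelihood-style mean:", likelihood_mean)
```

The available-case mean is pulled away from the true value because subjects with high first-visit values are systematically missing, while the estimate that uses the first-visit information for everyone stays close to it. Under missing completely at random, both would be close to the truth.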

Taking both points (i.e., interpretation and missing data) into account, you would most often like to fit a mixed model, to be better protected against missing data, but still obtain parameters that have a population-averaged interpretation. An early solution in this direction was the marginalized mixed models proposed by Heagerty but, in general, these are computationally intensive to fit. A more recent approach that seems to solve the problem has been proposed by Hedeker et al. This is implemented in the function marginal_coefs() of the R package GLMMadaptive. You can find an example on how to use this function here.
