From my understanding, glm (not glmer) and GEE can both handle binary outcomes. But GEE is a marginal model, and glmer fits a random effects model (mixed model). So what is the main difference between GLM (not glmer) and GEE? Is GEE a longitudinal-data version of GLM? Does that mean you can apply 'glm' only to cross-sectional data?
Solved – the main difference between GLM and GEE
Tags: generalized-linear-model, generalized-estimating-equations, longitudinal-data-analysis
Related Solutions
In terms of the interpretation of the coefficients, there is a difference in the binary case (among others). What differs between GEE and GLMM is the target of inference: population-average or subject-specific.
Let's consider a simple made-up example related to yours. You want to compare the failure rate between boys and girls in a school. As with most (elementary) schools, the population of students is divided into classrooms. You observe a binary response $Y$ from $n_i$ children in each of $N$ classrooms (i.e. $\sum_{i=1}^{N}n_{i}$ binary responses clustered by classroom), where $Y_{ij}=1$ if student $j$ from classroom $i$ failed and $Y_{ij}=0$ if he/she passed. And $x_{ij} =1$ if student $j$ from classroom $i$ is male and 0 otherwise.
To bring in the terminology I used in the first paragraph, you can think of the school as being the population and the classrooms being the subjects.
First consider the GLMM, which fits a mixed-effects model. The model conditions on the fixed design matrix (which in this case consists of the intercept and an indicator for gender) and on any random effects among classrooms that we include in the model. In our example, let's include a random intercept, $b_i$, which accounts for baseline differences in failure rate among classrooms. So we are modelling
$\log \left(\frac{P(Y_{ij}=1\mid x_{ij}, b_i)}{P(Y_{ij}=0\mid x_{ij}, b_i)}\right)=\beta_0+\beta_1 x_{ij} + b_i $
The odds of failure in the above model depend on the value of $b_i$, which differs among classrooms, and $e^{\beta_1}$ is the odds ratio comparing a boy and a girl within the same classroom (i.e. with the same $b_i$). Thus the estimates are subject-specific.
GEE, on the other hand, fits a marginal model. Marginal models describe population averages: you model the expectation conditional only on your fixed design matrix.
$\log \left(\frac{P(Y_{ij}=1\mid x_{ij})}{P(Y_{ij}=0\mid x_{ij})}\right)=\beta_0+\beta_1 x_{ij} $
This is in contrast to mixed-effects models which, as explained above, condition on both the fixed design matrix and the random effects. So with the marginal model above you're saying, "forget about the differences among classrooms, I just want the population (school-wide) rate of failure and its association with gender." You fit the model and get an odds ratio that is the population-averaged odds ratio of failure associated with gender.
So you may find that the estimates from your GEE model differ from the estimates from your GLMM, and that is because they are not estimating the same thing.
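To make the gap concrete, here is a minimal numerical sketch (not taken from any of the cited references). It assumes a random-intercept logistic model with $b_i \sim N(0, \sigma^2)$, and the values of `beta0`, `beta1` and `sigma` are made up for illustration. It averages the conditional probabilities over the random intercept to get the marginal probabilities, then compares the resulting marginal log-odds ratio with the conditional $\beta_1$.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

def marginal_prob(eta, sigma, n_grid=4001, width=8.0):
    """P(Y=1 | x) = E_b[expit(eta + b)] with b ~ N(0, sigma^2).

    Computed with a plain trapezoid rule on [-width*sigma, width*sigma];
    the grid size and width are arbitrary choices for this sketch.
    """
    if sigma == 0:
        return expit(eta)
    lo = -width * sigma
    step = 2.0 * width * sigma / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        b = lo + k * step
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        density = math.exp(-0.5 * (b / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total += weight * density * expit(eta + b)
    return total * step

def marginal_log_odds_ratio(beta0, beta1, sigma):
    """Log-odds ratio for x=1 vs x=0 after averaging over the random intercept."""
    p1 = marginal_prob(beta0 + beta1, sigma)
    p0 = marginal_prob(beta0, sigma)
    return logit(p1) - logit(p0)

# Hypothetical values, chosen only for illustration.
beta0, beta1, sigma = -1.0, 1.0, 2.0
conditional = beta1
marginal = marginal_log_odds_ratio(beta0, beta1, sigma)
print("conditional (subject-specific) log-odds ratio:", conditional)
print("marginal (population-averaged) log-odds ratio:", round(marginal, 3))
```

Under these assumptions the marginal coefficient comes out closer to zero than the conditional one, and the two coincide when `sigma` is 0, i.e. when there is no between-classroom variation.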
(As for converting from a log-odds ratio to an odds ratio by exponentiating: yes, you do that whether it's a population-level or a subject-specific estimate.)
Some Notes/Literature:
For the linear case, the population-average and subject-specific estimates are the same.
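A short derivation of why this holds, sketched under the assumption of a random-intercept model with $E[b_i]=0$ and $b_i$ independent of $x_{ij}$ (the symbol $g^{-1}$ for the inverse logit is introduced here only for illustration). With an identity link, averaging over the random effect leaves the coefficients unchanged:

$E[Y_{ij}\mid x_{ij}] = E\big[E[Y_{ij}\mid x_{ij}, b_i]\mid x_{ij}\big] = \beta_0+\beta_1 x_{ij} + E[b_i] = \beta_0+\beta_1 x_{ij}$

With a logit link the averaging passes through a nonlinear function, so in general

$E\big[g^{-1}(\beta_0+\beta_1 x_{ij}+b_i)\mid x_{ij}\big] \neq g^{-1}(\beta_0+\beta_1 x_{ij})$

and the marginal coefficients no longer coincide with the conditional ones.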
Zeger, et al. 1988 showed that for logistic regression,
$\beta_M\approx \left[ \left(\frac{16\sqrt{3}}{15\pi }\right)^2 V+1\right]^{-1/2}\beta_{RE}$
where $\beta_M$ are the marginal estimates, $\beta_{RE}$ are the subject-specific estimates and $V$ is the variance of the random effects.
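The approximation above is easy to evaluate directly. The following is a small sketch of it; the function names are made up for illustration, and the result is only as good as the approximation itself.

```python
import math

def attenuation_factor(V):
    """Factor multiplying beta_RE in the approximation above.

    V is the variance of the random effects.
    """
    c = 16.0 * math.sqrt(3.0) / (15.0 * math.pi)
    return (c * c * V + 1.0) ** -0.5

def marginal_from_conditional(beta_re, V):
    """Approximate marginal coefficient from a subject-specific one."""
    return attenuation_factor(V) * beta_re

for V in (0.0, 1.0, 4.0):
    print(V, round(attenuation_factor(V), 3))
```

The factor equals 1 when $V=0$ and shrinks toward 0 as $V$ grows, so the larger the between-subject variability, the more the marginal estimate is pulled toward zero relative to the subject-specific one.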
Molenberghs, Verbeke 2005 has an entire chapter on marginal vs. random effects models.
I learned about this and related material in a course based very much off Diggle, Heagerty, Liang, Zeger 2002, a great reference.
I wish you had mentioned whether your outcome was continuous or not, and whether it looked roughly normal or had some odd-looking distribution.
That said, you should not be surprised that both gave you the same results. They should give the same estimates except in a few cases, which are better left aside here so as not to go off topic.
I have always liked how Twisk explains GEE and mixed models in his "Applied Longitudinal Data Analysis for Epidemiology". I have slightly modified a few lines (to fit your example); otherwise I am quoting from page 88 of his book.
"The interpretation of the regression coefficients of a predictor variables from a random coefficient analysis is exactly the same as the interpretation of the regression coefficients estimated with GEE analysis, so the interpretation is twofold: (1) the ‘between-subjects’ interpretation indicates that a difference between two subjects of 1 unit in, for instance, the predictor variable X2 is associated with a difference of 0.20-units (this is your beta) in the outcome variable Y; (2) the ‘within-subject’ interpretation indicates that a change within one subject of 1 unit in the predictor variable X2 is associated with a change of 0.20-unit in the outcome variable Y. Again, the ‘real’ interpretation is a combination of both relationships."
Hubbard et al. give the following interpretations and, among other reasons, argue that GEE is closer to the truth than mixed models:
For a linear mixed-effects model: "Change in the mean outcome for a unit change in the associated neighborhood exposure, keeping the random effect (neighborhood) fixed"
For GEE: "Change in the mean outcome for a unit change in the associated neighborhood exposure across all of the neighborhoods observed"
This article might interest you, but please be careful: it all comes down to assumptions, nothing else.
Hubbard AE, Ahern J, Fleischer NL, Van der Laan M, Lippman SA, Jewell N, et al. To GEE or not to GEE: comparing population average and mixed models for estimating the associations between neighborhood risk factors and health. Epidemiology (Cambridge, Mass). 2010;21(4):467-74.
Best Answer
Indeed, GLMs do not account for correlations you may have in your outcome data. Hence, they are more suitable for cross-sectional data, because in longitudinal data you expect that measurements over time from the same subject are correlated.
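As a rough illustration of what ignoring the correlation can cost, here is a sketch based on the standard design-effect formula. It is a simplification, not something GEE itself computes: it assumes every subject has the same number of measurements `m`, a common (exchangeable) within-subject correlation `rho`, and a quantity such as a mean or the effect of a covariate that is constant within subject. The function names are hypothetical.

```python
import math

def design_effect(m, rho):
    """Variance inflation relative to independence: 1 + (m - 1) * rho."""
    return 1.0 + (m - 1) * rho

def corrected_se(naive_se, m, rho):
    """Scale a standard error computed under independence by sqrt(design effect)."""
    return naive_se * math.sqrt(design_effect(m, rho))

# Hypothetical numbers: 5 measurements per subject, correlation 0.5.
print(design_effect(5, 0.5))
print(corrected_se(0.1, 5, 0.5))
```

Under these assumptions, a standard error obtained by treating all measurements as independent is too small by a factor of `sqrt(1 + (m - 1) * rho)`, which is why a plain GLM fit to repeated measurements tends to give overconfident inference.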
With regard to the interpretation of the coefficients you obtain, GEEs can be seen as the equivalent of GLMs because their coefficients also have a marginal interpretation. This differs from generalized linear mixed models, in which the fixed-effects coefficients have an interpretation conditional on the random effects (though, based on recent developments, it is possible to get coefficients with a marginal interpretation from a GLMM).
With regard to the estimation, as mentioned in one of the comments above, GEEs are not based on a model that has a specific likelihood. On the one hand, this makes them semi-parametric, and you do not need to specify the distribution of your data. On the other hand, (i) you can only use Wald tests and not likelihood ratio tests, (ii) they are less efficient than a likelihood-based model in which you have appropriately specified the correlation structure, and (iii) in their basic form, and with regard to missing data, they are only valid under the missing completely at random mechanism, whereas a likelihood-based approach is valid under the less restrictive missing at random mechanism.