I prefer to call GEE an estimation method compared to ML or REML, since it combines quasi-likelihood estimation with robust variance estimation to estimate generalized linear marginal models for longitudinal data. Some texts and papers also call "GEE models", e.g. Hedeker, D., & Gibbons, R. D. (2006). Longitudinal data analysis. Wiley-Interscience
. I guess it is to separate it from subject-specific (fixed and random effects) models, since GEE is mainly regarded as or marginal (population average) models.
We have no idea about the distribution function of the outcome, but we know its mean ($\mu$) and variance ($V$). So we cannot do ML but we can turn to the quasi-likelihood,
$$Q(\mu,y)=\int^{\mu}_y(y-t)^TV^{-1}dt,$$
and the quasi-likelihood estimating equations (quasi-score function) is
$$\sum_i\frac{\partial{\mu_i^{'}}}{\partial{\beta}}V_i^{-1}(y_i-\mu_i)=0.$$
Thus the estimating equations are derived without specifying the joint distribution of a outcomes but they reduce to the score equations (marginal distributions). The approach based on maximum likelihood (ML) estimation specifies the joint multivariate normal distribution of outcome variables, while the approach of GEE based on the quasi-likelihood specifies only the marginal distributions.
I have seen GEE was applied in statistical genetics, but I am afraid it is also under the framework of generalized linear models.
Yes. GEE and GLM will indeed have the same coefficients, but different standard errors. To check, run an example in R. I've taken this example from Chapter 25 of Applied Regression Analysis and Other Multivariable Methods, 5th by Kleinbaum, et. al (just because it's on my desk and references GEE and GLM):
library(geepack)
library(lme4)
#get book data from
mydf<-read.table("http://www.hmwu.idv.tw/web/bigdata/rstudio-readData/tab/ch25q04.txt", header=TRUE)
mydf<-data.frame(subj=mydf$subj, week=as.factor(mydf$week), fev=mydf$fev)
#Make 5th level the reference level to match book results
mydf$week<-relevel(mydf$week, ref="5")
#Fit GLM Mixed Model
mixed.model<-summary(lme4::lmer(fev~week+(1|subj),data=mydf))
mixed.model$coefficients
Estimate Std. Error t value
(Intercept) 6.99850 0.2590243 27.01870247
week1 2.81525 0.2439374 11.54087244
week2 -0.15025 0.2439374 -0.61593680
week3 0.00325 0.2439374 0.01332309
week4 -0.04700 0.2439374 -0.19267241
#Fit a gee model with any correlation structure. In this case AR1
gee.model<-summary(geeglm(fev~week, id=subj, waves=week, corstr="ar1", data=mydf))
gee.model$coefficients
[Estimate Std.err Wald Pr(>|W|)
(Intercept) 6.99850 0.2418413 8.374312e+02 0.0000000
week1 2.81525 0.2514376 1.253642e+02 0.0000000
week2 -0.15025 0.2051973 5.361492e-01 0.4640330
week3 0.00325 0.2075914 2.451027e-04 0.9875090
week4 -0.04700 0.2388983 3.870522e-02 0.8440338][1]
UPDATE
As Mark White pointed out in his comment, I did indeed previously fit a "single-level" Mixed Effects GLM. Since you didn't specify whether you wanted a "fixed effects" or "random" effects GLM model, I just picked "random" since that's the model fit in the book I selected from. But indeed, Mark is right that the coefficients do not necessarily agree in multilevel models, and someone provided a nice answer about that question previously. For your reference, I've added a "fixed" effects GLM model below using lm
.
#Fit Traditional GLM Fixed Effect Model (i.e. not Random effects)
glm.fixed<-summary(lm(fev~week, data=mydf))
glm.fixed$coefficients
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.99850 0.2590243 27.01870247 7.696137e-68
week1 2.81525 0.3663157 7.68531179 7.287752e-13
week2 -0.15025 0.3663157 -0.41016538 6.821349e-01
week3 0.00325 0.3663157 0.00887213 9.929302e-01
week4 -0.04700 0.3663157 -0.12830465 8.980401e-01
Note the first and second columns of the output in each model. They coefficients are identity, but standard errors differ.
You also added a comment which asked, "And does this remain the case when we choose a non-linear link function?" Note first that this is a different question since non-linear link functions generally aren't General Linear Models but Generalized Linear models. In this case, the coefficients do not necessarily match. Here's an example again in R:
#Fit Generalized Linear Mixed Effects Model with, say, Binomail Link
nlmixed.model<-summary(lme4::glmer(I(mydf$fev>mean(mydf$fev))~week+(1|subj), family="binomial", data=mydf))
nlmixed.model$coefficients
#Fit GEE model with, say, Binomial Link
nlgee.model<-summary(geeglm(I(mydf$fev>mean(mydf$fev))~week, id=subj, waves=week, family="binomial", data=mydf))
nlgee.model$coefficients
Best Answer