Regression – Do GEE and GLM Estimate the Same Coefficients?

cluster-sampleestimationgeneralized linear modelgeneralized-estimating-equationsregression

In a GLM, the likelihood equations depend on the assumed distribution only through the mean and the variance. The likelihood equations are

$$\sum_i^n (\frac{\partial \mu_i}{\partial \eta_i}) \frac{y_i – \mu_i}{Var(Y_i)}x_{ij} = 0, \quad (j = 1, …, p)$$

and in the quasi-likelihood case, we just let $Var(Y_i) = v(\mu_i)$ be some function of the mean. For GEE, the response is extended to be multivariate, with an assumed correlation structure, with the quasi-likelihood equations.

Does this imply that GEE and GLM will have the same parameter (say $\beta$) estimates (population averaged) with the only difference being correct standard errors in the GEE case (Assuming clustered data?)

If the estimated coefficients are not the same, then what is the difference?

Best Answer

Yes. GEE and GLM will indeed have the same coefficients, but different standard errors. To check, run an example in R. I've taken this example from Chapter 25 of Applied Regression Analysis and Other Multivariable Methods, 5th by Kleinbaum, et. al (just because it's on my desk and references GEE and GLM):

library(geepack)
library(lme4)

#get book data from 
mydf<-read.table("http://www.hmwu.idv.tw/web/bigdata/rstudio-readData/tab/ch25q04.txt", header=TRUE)
mydf<-data.frame(subj=mydf$subj, week=as.factor(mydf$week), fev=mydf$fev)
#Make 5th level the reference level to match book results
mydf$week<-relevel(mydf$week, ref="5")

#Fit GLM Mixed Model
mixed.model<-summary(lme4::lmer(fev~week+(1|subj),data=mydf))
mixed.model$coefficients

                Estimate Std. Error     t value
(Intercept)  6.99850  0.2590243 27.01870247
week1        2.81525  0.2439374 11.54087244
week2       -0.15025  0.2439374 -0.61593680
week3        0.00325  0.2439374  0.01332309
week4       -0.04700  0.2439374 -0.19267241

#Fit a gee model with any correlation structure.  In this case AR1
gee.model<-summary(geeglm(fev~week, id=subj, waves=week, corstr="ar1", data=mydf))
gee.model$coefficients

            [Estimate   Std.err         Wald  Pr(>|W|)
(Intercept)  6.99850 0.2418413 8.374312e+02 0.0000000
week1        2.81525 0.2514376 1.253642e+02 0.0000000
week2       -0.15025 0.2051973 5.361492e-01 0.4640330
week3        0.00325 0.2075914 2.451027e-04 0.9875090
week4       -0.04700 0.2388983 3.870522e-02 0.8440338][1]

UPDATE

As Mark White pointed out in his comment, I did indeed previously fit a "single-level" Mixed Effects GLM. Since you didn't specify whether you wanted a "fixed effects" or "random" effects GLM model, I just picked "random" since that's the model fit in the book I selected from. But indeed, Mark is right that the coefficients do not necessarily agree in multilevel models, and someone provided a nice answer about that question previously. For your reference, I've added a "fixed" effects GLM model below using lm.

#Fit Traditional GLM Fixed Effect Model (i.e. not Random effects)
glm.fixed<-summary(lm(fev~week, data=mydf))
glm.fixed$coefficients
            Estimate Std. Error     t value     Pr(>|t|)
(Intercept)  6.99850  0.2590243 27.01870247 7.696137e-68
week1        2.81525  0.3663157  7.68531179 7.287752e-13
week2       -0.15025  0.3663157 -0.41016538 6.821349e-01
week3        0.00325  0.3663157  0.00887213 9.929302e-01
week4       -0.04700  0.3663157 -0.12830465 8.980401e-01

Note the first and second columns of the output in each model. They coefficients are identity, but standard errors differ.

You also added a comment which asked, "And does this remain the case when we choose a non-linear link function?" Note first that this is a different question since non-linear link functions generally aren't General Linear Models but Generalized Linear models. In this case, the coefficients do not necessarily match. Here's an example again in R:

#Fit Generalized Linear Mixed Effects Model with, say, Binomail Link
nlmixed.model<-summary(lme4::glmer(I(mydf$fev>mean(mydf$fev))~week+(1|subj), family="binomial", data=mydf))
nlmixed.model$coefficients

#Fit GEE model with, say, Binomial Link
nlgee.model<-summary(geeglm(I(mydf$fev>mean(mydf$fev))~week, id=subj, waves=week, family="binomial", data=mydf))
nlgee.model$coefficients
Related Question