Generalized Linear Models vs. Generalized Linear Mixed Models Explained

generalized linear modelgeneralized-estimating-equationsglmmmixed model

I am wondering what the differences are between mixed and unmixed GLMs. For instance, in SPSS the drop down menu allows users to fit either:

  • analyze-> generalized linear models-> generalized linear models &
  • analyze-> mixed models-> generalized linear

Do they deal with missing values differently?

My dependent variable is binary and I have several categorical and continuous independent variables.

Best Answer

The advent of generalized linear models has allowed us to build regression-type models of data when the distribution of the response variable is non-normal--for example, when your DV is binary. (If you would like to know a little more about GLiMs, I wrote a fairly extensive answer here, which may be useful although the context differs.) However, a GLiM, e.g. a logistic regression model, assumes that your data are independent. For instance, imagine a study that looks at whether a child has developed asthma. Each child contributes one data point to the study--they either have asthma or they don't. Sometimes data are not independent, though. Consider another study that looks at whether a child has a cold at various points during the school year. In this case, each child contributes many data points. At one time a child might have a cold, later they might not, and still later they might have another cold. These data are not independent because they came from the same child. In order to appropriately analyze these data, we need to somehow take this non-independence into account. There are two ways: One way is to use the generalized estimating equations (which you don't mention, so we'll skip). The other way is to use a generalized linear mixed model. GLiMMs can account for the non-independence by adding random effects (as @MichaelChernick notes). Thus, the answer is that your second option is for non-normal repeated measures (or otherwise non-independent) data. (I should mention, in keeping with @Macro's comment, that general-ized linear mixed models include linear models as a special case and thus can be used with normally distributed data. However, in typical usage the term connotes non-normal data.)

Update: (The OP has asked about GEE as well, so I will write a little about how all three relate to each other.)

Here's a basic overview:

  • a typical GLiM (I'll use logistic regression as the prototypical case) lets you model an independent binary response as a function of covariates
  • a GLMM lets you model a non-independent (or clustered) binary response conditional on the attributes of each individual cluster as a function of covariates
  • the GEE lets you model the population mean response of non-independent binary data as a function of covariates

Since you have multiple trials per participant, your data are not independent; as you correctly note, "[t]rials within one participant are likely to be more similar than as compared to the whole group". Therefore, you should use either a GLMM or the GEE.

The issue, then, is how to choose whether GLMM or GEE would be more appropriate for your situation. The answer to this question depends on the subject of your research--specifically, the target of the inferences you hope to make. As I stated above, with a GLMM, the betas are telling you about the effect of a one unit change in your covariates on a particular participant, given their individual characteristics. On the other hand with the GEE, the betas are telling you about the effect of a one unit change in your covariates on the average of the responses of the entire population in question. This is a difficult distinction to grasp, especially because there is no such distinction with linear models (in which case the two are the same thing).

One way to try to wrap your head around this is to imagine averaging over your population on both sides of the equals sign in your model. For example, this might be a model: $$ \text{logit}(p_i)=\beta_{0}+\beta_{1}X_1+b_i $$ where: $$ \text{logit}(p)=\ln\left(\frac{p}{1-p}\right),~~~~~\&~~~~~~b\sim\mathcal N(0,\sigma^2_b) $$ There is a parameter that governs the response distribution ($p$, the probability, with binary data) on the left side for each participant. On the right hand side, there are coefficients for the effect of the covariate[s] and the baseline level when the covariate[s] equals 0. The first thing to notice is that the actual intercept for any specific individual is not $\beta_0$, but rather $(\beta_0+b_i)$. But so what? If we are assuming that the $b_i$'s (the random effect) are normally distributed with a mean of 0 (as we've done), certainly we can average over these without difficulty (it would just be $\beta_0$). Moreover, in this case we don't have a corresponding random effect for the slopes and thus their average is just $\beta_1$. So the average of the intercepts plus the average of the slopes must be equal to the logit transformation of the average of the $p_i$'s on the left, mustn't it? Unfortunately, no. The problem is that in between those two is the $\text{logit}$, which is a non-linear transformation. (If the transformation were linear, they would be equivalent, which is why this problem doesn't occur for linear models.) The following plot makes this clear: enter image description here
Imagine that this plot represents the underlying data generating process for the probability that a small class of students will be able to pass a test on some subject with a given number of hours of instruction on that topic. Each of the grey curves represents the probability of passing the test with varying amounts of instruction for one of the students. The bold curve is the average over the whole class. In this case, the effect of an additional hour of teaching conditional on the student's attributes is $\beta_1$--the same for each student (that is, there is not a random slope). Note, though, that the students baseline ability differs amongst them--probably due to differences in things like IQ (that is, there is a random intercept). The average probability for the class as a whole, however, follows a different profile than the students. The strikingly counter-intuitive result is this: an additional hour of instruction can have a sizable effect on the probability of each student passing the test, but have relatively little effect on the probable total proportion of students who pass. This is because some students might already have had a large chance of passing while others might still have little chance.

The question of whether you should use a GLMM or the GEE is the question of which of these functions you want to estimate. If you wanted to know about the probability of a given student passing (if, say, you were the student, or the student's parent), you want to use a GLMM. On the other hand, if you want to know about the effect on the population (if, for example, you were the teacher, or the principal), you would want to use the GEE.

For another, more mathematically detailed, discussion of this material, see this answer by @Macro.