Random Effects – Why Random Effects Assume a Normal Distribution in (G)LMMs

Tags: glmm, mixed-model, normal-distribution

In short, my question is as follows:

  • Why is it common to assume normally distributed random effects (especially in generalized linear mixed models)?

A longer version:

Under some circumstances, an approximately normally distributed random effect makes sense. For example, say we measure individuals' weight ($y$) depending on the type of diet ($x$) they were on, once before and once a month after dieting. If individuals ($\upsilon$) are measured twice, then the following LMM:

$$y_{ij} = \beta_0 + \beta_1 x_{ij} + \upsilon_i + \epsilon_{ij} \\ \upsilon_i \sim \mathcal{N}(0,\,\sigma_\upsilon^2), \; \epsilon_{ij} \sim \mathcal{N}(0,\,\sigma_\epsilon^2)$$

basically assumes that individuals ($\upsilon$) come from some larger population, which causes a random, normally distributed offset in their initial weight. One could argue that whatever (unknown) differences exist between individuals (genetic, environmental, lifestyle) might sum to something approximately normal, as sums of many independent random variables tend to do (a central-limit-theorem argument). In fact, we could use almost the same argument for the errors of the outcome variable ($\epsilon$).
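To make this setup concrete, here is a small numpy simulation of the weight example (all parameter values are made up for illustration). Each individual gets a normally distributed random intercept $\upsilon_i$ that is shared by both measurements, which is what induces within-individual correlation:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 500                                        # number of individuals
sigma_u, sigma_e = 3.0, 1.0                    # SDs of random intercept and residual
beta0, beta1 = 70.0, -2.5                      # hypothetical intercept and diet effect (kg)

u = rng.normal(0.0, sigma_u, size=n)           # random intercept per individual
x = np.repeat([0.0, 1.0], n).reshape(2, n)     # 0 = before diet, 1 = one month after
eps = rng.normal(0.0, sigma_e, size=(2, n))

y = beta0 + beta1 * x + u[None, :] + eps       # y[j, i]: measurement j of individual i

# The two measurements of the same individual share u_i, so they are correlated;
# the correlation should be near sigma_u^2 / (sigma_u^2 + sigma_e^2) = 0.9 here.
print(np.corrcoef(y[0], y[1])[0, 1])
```

The printed correlation is the intraclass correlation implied by the model, which is exactly the role the random intercept plays.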

However, say we count birds ($y$) in different terrain types ($x$) on different islands ($\upsilon$) and use a Poisson GLMM, why, if at all, is the normality assumption still defensible? Surely the sum of random variables differing between two islands can cause a normally distributed offset for an outcome with normally distributed errors, but how can we justify this for a non-normal error structure?

I understand that a GLMM places the random effect in the linear predictor, but the linear predictor itself is not assumed to have a normal error structure, so how does the normality assumption carry over?


Bonus question:

  • Are there any simple examples of non-normal random effects (e.g. Poisson distributed)?

Best Answer

Some points:

  1. The choice of a normal distribution for the random effects in linear mixed models (i.e., models with a normally distributed outcome) is typically made for mathematical convenience. That is, the normal distribution of $[Y \mid b]$ combines nicely with the normal distribution of the random effects $[b]$, and you get a marginal distribution for the outcome $[Y]$ that is multivariate normal.

  2. In that regard, it helps to see a mixed model as a hierarchical Bayesian model. Namely, in the linear mixed model, assuming a normal distribution for the random effects amounts to using a conjugate prior, which gives you back a posterior in closed form. You can do the same for other distributions: if you have binomial outcome data, the conjugate prior for the random effects is a Beta distribution, and you get the Beta-Binomial model; likewise, if you have Poisson outcome data, the conjugate prior for the random effects is a Gamma distribution, and you get the Gamma-Poisson model. To be clear, in these examples the distribution of the random effects is placed on the scale of the mean of the outcome conditional on the random effects, not on the scale of the linear predictor (e.g., in the Gamma-Poisson example, on the linear-predictor scale the assumed distribution would be a log-Gamma distribution).
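The Gamma-Poisson case above is easy to check numerically (shape and scale values below are arbitrary). If $\lambda_i \sim \text{Gamma}(a, b)$ and $Y_i \mid \lambda_i \sim \text{Poisson}(\lambda_i)$, the marginal distribution of $Y$ is negative binomial, with mean $ab$ and variance $ab + ab^2$:

```python
import numpy as np

rng = np.random.default_rng(1)

a, b = 2.0, 3.0                             # Gamma shape and scale (arbitrary choices)
n = 200_000

lam = rng.gamma(shape=a, scale=b, size=n)   # random effect on the mean scale
y = rng.poisson(lam)                        # conditional Poisson outcome

# Marginally, Y is negative binomial: mean a*b = 6, variance a*b + a*b**2 = 24
print(y.mean())   # ~ 6.0
print(y.var())    # ~ 24.0
```

The overdispersion (variance well above the mean) is exactly what the Gamma random effect buys you relative to a plain Poisson model.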

  3. There is nothing stopping you from changing the distribution. For example, in the linear mixed model you could use a Student's t distribution for the random effects, or with categorical outcomes you could use a normal distribution. But then you lose the computational advantage of a closed-form posterior. There is considerable literature on the impact of the random-effects distribution, and many flexible models have been proposed for it, for example using splines or mixtures to capture random-effects distributions that are multi-modal. However, the general consensus has been that the normal distribution works pretty well. Namely, even if you simulate data from a bimodal or skewed random-effects distribution and assume in your mixed model that it is normal, the results (i.e., parameter estimates and standard errors) are almost identical to those from a flexible model that captures the distribution more appropriately.
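A numpy-only sketch can illustrate one facet of this robustness (it is not a substitute for the full mixed-model comparisons in the literature): with a balanced design and covariates independent of the groups, the fixed-effect slope is recovered well even when the random intercepts are strongly bimodal, because the intercepts simply shift groups up and down and average out. All parameter values here are invented:

```python
import numpy as np

rng = np.random.default_rng(2)

n_groups, n_per = 300, 10
beta0, beta1 = 1.0, 0.5                      # true fixed effects

# Bimodal random intercepts: a 50/50 mixture of N(-2, 0.5^2) and N(+2, 0.5^2)
comp = rng.integers(0, 2, size=n_groups)
u = rng.normal(np.where(comp == 0, -2.0, 2.0), 0.5)

x = rng.normal(size=(n_groups, n_per))
y = beta0 + beta1 * x + u[:, None] + rng.normal(0.0, 1.0, size=(n_groups, n_per))

# Pooled least-squares slope, making no use of the (non-normal) grouping structure:
X = np.column_stack([np.ones(x.size), x.ravel()])
slope = np.linalg.lstsq(X, y.ravel(), rcond=None)[0][1]
print(slope)   # close to the true beta1 = 0.5
```

The point mirrors the simulation findings described above: misspecifying (or here, ignoring) the random-effects distribution has little effect on the fixed-effect estimate in this kind of setting.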

  4. Hence, the choice of the normal distribution has dominated, even though other options do exist. With regard to your point on whether the normal distribution is defensible for categorical data: as Ben mentioned, the distribution of the random effects is placed not on the outcome but rather on the transformed mean of the outcome. For example, for Poisson data you assume a normal distribution for the random effects on the scale of $\log(\mu)$, where $\mu = E[Y]$ denotes the expected count and $Y$ the observed count.
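A short numpy sketch of this last point, with invented parameter values: the normal random effect lives on the log-mean scale, so the island-specific means are log-normal, and the marginal mean of the counts is $\exp(\beta_0 + \sigma^2/2)$ by the log-normal mean formula:

```python
import numpy as np

rng = np.random.default_rng(3)

beta0, sigma = 1.0, 0.6              # hypothetical intercept and random-effect SD
n = 200_000

b = rng.normal(0.0, sigma, size=n)   # normal random effect on the log-mean scale
mu = np.exp(beta0 + b)               # island-specific expected count (log-normal)
y = rng.poisson(mu)                  # observed counts, Poisson given the island

# Marginal mean of Y: E[exp(beta0 + b)] = exp(beta0 + sigma^2 / 2)
print(y.mean())   # ~ np.exp(beta0 + sigma**2 / 2), about 3.25 here
```

Note that the counts themselves are never assumed normal; normality applies only to the offset $b$ inside the $\log$ link.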
