Linear mixed effects models are extensions of linear regression models for data that are collected and summarized in groups. The key advantage is that the coefficients can vary with respect to one or more grouping variables.
However, I am struggling with when to use a mixed effects model. I will elaborate on my question using a toy example with extreme cases.
Let's assume we want to model height and weight for animals, and we use species as the grouping variable.
- If the groups/species are really different, say a dog and an elephant, I think there is no point in using a mixed effects model; we should build a model for each group.
- If the groups/species are really similar, say a female dog and a male dog, I think we may want to use gender as a categorical variable in the model.
So, I assume we should use a mixed effects model in the middle cases? Say the groups are cat, dog, and rabbit: they are similarly sized animals, but still different.
Is there any formal argument for when to use a mixed effects model, i.e., how to draw the lines among:
- Building models for each group
- Mixed effect model
- Use group as a categorical variable in regression
My attempt: Method 1 is the most "complex" model (fewer degrees of freedom) and Method 3 is the most "simple" model (more degrees of freedom), with the mixed effects model in the middle. We might consider how much data we have, and how complicated it is, to select the right model according to the bias-variance trade-off.
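The "middle ground" intuition is often described as partial pooling. Estimating every group on its own is "no pooling", ignoring the grouping entirely is "complete pooling", and a random-intercept model shrinks each group's estimate toward the overall mean by an amount that depends on how much data that group has. (Entering species as an ordinary categorical predictor also gives each species its own unpooled intercept.) The sketch below is a minimal illustration for group means only, assuming the between-group variance `tau2` and within-group variance `sigma2` are known. A real mixed-model fit would estimate both from the data. The species weights and the function names are hypothetical.

```python
from statistics import mean

# Hypothetical body weights (kg) grouped by species; the numbers are invented
# purely to illustrate the three estimators, not taken from any real data set.
data = {
    "cat": [4.0, 5.0, 4.5, 3.5],
    "dog": [20.0, 30.0, 25.0],
    "rabbit": [2.0],
}


def complete_pooling(groups):
    """Ignore the grouping: every group gets the mean of all observations."""
    grand = mean(x for ys in groups.values() for x in ys)
    return {g: grand for g in groups}


def no_pooling(groups):
    """Estimate each group separately: every group gets its own sample mean."""
    return {g: mean(ys) for g, ys in groups.items()}


def partial_pooling(groups, tau2, sigma2):
    """Random-intercept shrinkage, ASSUMING tau2 and sigma2 are known.

    tau2   -- between-group variance of the true group means
    sigma2 -- within-group (residual) variance
    """
    means = {g: mean(ys) for g, ys in groups.items()}
    # Each group mean estimates the overall mean mu with variance tau2 + sigma2/n,
    # so mu is the precision-weighted average of the group means.
    prec = {g: 1.0 / (tau2 + sigma2 / len(ys)) for g, ys in groups.items()}
    mu = sum(prec[g] * means[g] for g in groups) / sum(prec.values())
    out = {}
    for g, ys in groups.items():
        # Weight on the group's own mean: near 1 when the group has many
        # observations or tau2 is large relative to sigma2, smaller otherwise.
        w = tau2 / (tau2 + sigma2 / len(ys))
        out[g] = w * means[g] + (1 - w) * mu
    return out


if __name__ == "__main__":
    no = no_pooling(data)
    part = partial_pooling(data, tau2=50.0, sigma2=10.0)
    comp = complete_pooling(data)
    for g in data:
        print(f"{g:7s} no={no[g]:6.2f} partial={part[g]:6.2f} complete={comp[g]:6.2f}")
```

With these made-up numbers the single-observation rabbit group moves furthest toward the overall mean. Letting `tau2` shrink toward 0 reproduces complete pooling, and letting it grow without bound reproduces no pooling, so the variance ratio controls where the model sits between the two extremes.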
Best Answer
I'm afraid I might have the nuanced and perhaps unsatisfying answer that it is a subjective choice by the researcher or data analyst. As mentioned elsewhere in this thread, it isn't enough to simply say the data have a "nested structure." To be fair, though, this is how many books describe when to use multilevel models. For example, I just pulled Joop Hox's book Multilevel Analysis off of my bookshelf, which gives this definition:
Even in a pretty good textbook, the initial definition seems to be circular. I think this is partially due to the subjectivity of determining when to use what kind of model (including a multilevel model).
Another book, West, Welch, & Galecki's Linear Mixed Models says these models are for:
Finch, Bolin, & Kelley's Multilevel Modeling in R also talks about violating the iid assumption and correlated residuals:
I believe that a multilevel model makes sense when there is reason to believe that observations are not necessarily independent of one another. Whatever "cluster" accounts for this non-independence can be modeled.
An obvious example would be children in classrooms—they are all interacting with one another, which might lead their test scores to be non-independent. What if one classroom has someone that asks a question that leads to material being covered in that class that isn't covered in other classes? What if the teacher is more awake for some classes than others? In this case, there would be some non-independence of data; in multilevel words, we could expect some variance in the dependent variable to be due to the cluster (i.e., class).
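The share of variance "due to the cluster" is commonly summarized by the intraclass correlation, $\text{ICC} = \tau^2 / (\tau^2 + \sigma^2)$, where $\tau^2$ is the between-class variance and $\sigma^2$ the within-class variance. Here is a minimal sketch, assuming equal class sizes and using the classical one-way ANOVA (method-of-moments) estimator rather than the (restricted) maximum likelihood a mixed-model routine would normally use. The function name `icc_oneway` and the scores are hypothetical.

```python
from statistics import mean


def icc_oneway(clusters):
    """ANOVA estimate of the intraclass correlation for a balanced design.

    clusters -- list of lists, one inner list per cluster (e.g. classroom),
                every cluster assumed to have the same number of members n.
    """
    k = len(clusters)
    n = len(clusters[0])
    if any(len(c) != n for c in clusters):
        raise ValueError("this sketch assumes equal cluster sizes")
    grand = mean(x for c in clusters for x in c)
    cluster_means = [mean(c) for c in clusters]
    # Between- and within-cluster sums of squares.
    ssb = n * sum((m - grand) ** 2 for m in cluster_means)
    ssw = sum((x - m) ** 2 for c, m in zip(clusters, cluster_means) for x in c)
    msb = ssb / (k - 1)
    msw = ssw / (k * (n - 1))
    # E[MSB] = sigma2 + n * tau2 and E[MSW] = sigma2, so solve for tau2
    # (truncated at 0, since the moment estimate can come out negative).
    tau2 = max((msb - msw) / n, 0.0)
    return tau2 / (tau2 + msw)


if __name__ == "__main__":
    # Hypothetical test scores for three classrooms of three pupils each.
    classrooms = [[70, 72, 74], [80, 82, 84], [90, 92, 94]]
    print(f"ICC = {icc_oneway(classrooms):.3f}")
```

An ICC near 0 suggests the classroom adds little beyond individual variation. A large ICC signals the kind of non-independence described above, which is what a random effect for class is meant to absorb.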
Your example of a dog versus an elephant depends on the independent and dependent variables of interest, I think. For example, let's say we are asking if there is an effect of caffeine on activity level. Animals from all over the zoo are randomly assigned to either get a caffeinated drink or a control drink.
If we are researchers who are interested in caffeine, we might specify a multilevel model, because we really care about the effect of caffeine. This model would be specified as:
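As an illustration, assume a random intercept for species. The answer's own specification may differ, for example by also letting the caffeine effect vary by species. With $i$ indexing animals, $j$ indexing species, and $\text{caffeine}_{ij}$ equal to 1 for the caffeinated drink and 0 for the control, such a model can be written as:

$$
\text{activity}_{ij} = \beta_0 + \beta_1\,\text{caffeine}_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \tau^2),\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
$$

Here $\beta_1$ is the single caffeine effect of interest. The species-level deviations $u_j$ are summarized by one variance $\tau^2$ instead of being estimated as separate fixed parameters. In R's lme4 formula syntax this corresponds to `activity ~ caffeine + (1 | species)`.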
This is particularly helpful if there is a large number of species we are testing this hypothesis over. However, a researcher might be interested in the species-specific effects of caffeine. In that case, they could specify species as a fixed effect:
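As a sketch, assume $S$ species, dummy (treatment) coding with species 1 as the reference, and a caffeine-by-species interaction. Under those assumptions, the fixed-effects version can be written as:

$$
\text{activity}_{ij} = \beta_0 + \beta_1\,\text{caffeine}_{ij}
+ \sum_{s=2}^{S} \gamma_s\,\mathbb{1}[j = s]
+ \sum_{s=2}^{S} \delta_s\,\mathbb{1}[j = s]\,\text{caffeine}_{ij}
+ \varepsilon_{ij}
$$

The mean structure then has $1 + 1 + (S-1) + (S-1) = 2S$ coefficients, one per cell of the $2 \times S$ design. That is 60 coefficients for $S = 30$, compared with the two fixed effects plus the single variance $\tau^2$ of the random-intercept version.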
This obviously is a problem if there are, say, 30 species, creating an unwieldy 2 × 30 design. However, you can get pretty creative with how you model these relationships.
For example, some researchers are arguing for an even wider use of multilevel modeling. Gelman, Hill, & Yajima (2012) argue that multilevel modeling could be used as a correction for multiple comparisons—even in experimental research where the structure of the data is not obviously hierarchical in nature:
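The intuition behind this argument can be sketched under simplifying assumptions: balanced groups of size $n$, with the variances $\tau^2$ and $\sigma^2$ treated as known. Each group estimate is pulled toward the overall mean $\hat\mu$:

$$
\hat\theta_j = w\,\bar y_j + (1-w)\,\hat\mu,
\qquad w = \frac{\tau^2}{\tau^2 + \sigma^2/n},
$$

so every pairwise difference is shrunk by the same factor:

$$
\hat\theta_j - \hat\theta_k = w\,(\bar y_j - \bar y_k), \qquad 0 < w < 1.
$$

When $\tau^2$ is small relative to the sampling noise $\sigma^2/n$, $w$ is small and apparent differences between groups are pulled strongly toward zero. This works against the spurious "significant" comparisons that tend to appear when many groups are compared one pair at a time.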
Problems can be modeled in various ways, and in ambiguous cases, multiple approaches might seem appealing. I think our job is to choose a reasonable, informed approach and do so transparently.