Solved – Interpreting three forms of a “mixed model”

lme4-nlme, mixed model, r

There's a distinction that's tripping me up with mixed models, and I'm wondering if I could get some clarity on it. Let's assume you've got a mixed model of count data. There's a variable you know you want as a fixed effect (A) and another variable for time (T), grouped by say a "Site" variable.

As I understand it:

glmer(counts ~ A + T, data=data, family="poisson") is a fixed effects model.

glmer(counts ~ (A + T | Site), data=data, family="poisson") is a random effect model.

My question is when you have something like:

glmer(counts ~ A + T + (T | Site), data=data, family="poisson") what is T? Is it a random effect? A fixed effect? What's actually being accomplished by putting T in both places?

When should something only appear in the random effects section of the model formula?

Best Answer

This may become clearer by writing out the model formula for each of these three models. Let $Y_{ij}$ be the observation for person $i$ in site $j$ in each model and define $A_{ij}, T_{ij}$ analogously to refer to the variables in your model.

glmer(counts ~ A + T, data=data, family="poisson") is the model

$$ \log \big( E(Y_{ij}) \big) = \beta_0 + \beta_1 A_{ij} + \beta_2 T_{ij} $$

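which is just an ordinary Poisson regression model. (glmer itself stops with an error when the formula contains no random-effects term, so in practice this model would be fit with glm.)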

glmer(counts ~ (A + T|Site), data=data, family="poisson") is the model

$$ \log \big( E(Y_{ij}) \big) = \alpha_0 + \eta_{j0} + \eta_{j1} A_{ij} + \eta_{j2} T_{ij} $$

where $\eta_{j} = (\eta_{j0}, \eta_{j1}, \eta_{j2}) \sim N(0, \Sigma)$ are random effects that are shared by every observation made on individuals from site $j$. In the model you specified, these random effects are allowed to be freely correlated (i.e. no restrictions are placed on $\Sigma$). To impose independence, you have to place them inside different brackets, e.g. (A-1|Site) + (T-1|Site) + (1|Site) would do it. This model assumes a common baseline $\alpha_0$ for $\log \big( E(Y_{ij}) \big)$ across sites, but each site has a random offset ($\eta_{j0}$) and its own random linear relationship with $A_{ij}$ and $T_{ij}$. Because the random slopes have mean zero, neither covariate has an effect on average across sites.
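As a rough check of the difference between the correlated and the independent specification, the sketch below counts the variance-covariance parameters each formula implies, without fitting anything. The data frame `dat` and all of its values are made up for illustration. `glFormula()` is lme4's formula-parsing step, and its `reTrms$theta` component holds one entry per free element of the Cholesky factor of $\Sigma$.

```r
library(lme4)

# Made-up data: 30 sites with 20 observations each (values are arbitrary).
set.seed(1)
dat <- data.frame(
  Site   = factor(rep(1:30, each = 20)),
  A      = rnorm(600),
  T      = rep(seq(0, 1, length.out = 20), times = 30),
  counts = rpois(600, lambda = 2)
)

# Correlated random intercept and slopes: unrestricted 3 x 3 Sigma.
full <- glFormula(counts ~ (A + T | Site), data = dat, family = poisson)

# Independent random effects: one term per bracket, diagonal Sigma.
indep <- glFormula(counts ~ (1 | Site) + (A - 1 | Site) + (T - 1 | Site),
                   data = dat, family = poisson)

n_par_full  <- length(full$reTrms$theta)   # 3 variances + 3 covariances
n_par_indep <- length(indep$reTrms$theta)  # 3 variances only
c(full = n_par_full, independent = n_par_indep)
```

The correlated form estimates six parameters for $\Sigma$ and the independent form three, which is one practical reason to prefer the latter when the number of sites is small.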

glmer(counts ~ A + T + (T|Site), data=data, family="poisson") is the model

$$ \log \big( E(Y_{ij}) \big) = (\theta_0 + \gamma_{j0}) + \theta_1 A_{ij} + (\theta_2 + \gamma_{j1}) T_{ij} $$

So now $\log \big( E(Y_{ij}) \big)$ has some "average" relationship with $A_{ij}, T_{ij}$, given by the fixed effects $\theta_0, \theta_1, \theta_2$, but that relationship is different for each site, and those differences are captured by the random effects $\gamma_{j0}, \gamma_{j1}$. That is, the baseline and the slope of $T$ are randomly shifted (the slope of $A$ is the same at every site), and everyone from the same site shares the same random shifts.
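To make the roles of the fixed and random parts concrete, here is a minimal simulation sketch of this third model, followed by a fit. Every numeric value (the fixed effects `theta`, the covariance `Sigma`, the number of sites) is an assumption chosen purely for illustration, not something taken from your data.

```r
library(lme4)

set.seed(1)
n_sites <- 30
n_per   <- 20
dat <- data.frame(
  Site = factor(rep(seq_len(n_sites), each = n_per)),
  A    = rnorm(n_sites * n_per),
  T    = rep(seq(0, 1, length.out = n_per), times = n_sites)
)

theta <- c(0.5, 0.3, -0.4)             # theta_0, theta_1, theta_2 (made up)
Sigma <- matrix(c(0.25, 0.05,
                  0.05, 0.16), 2, 2)   # covariance of (gamma_j0, gamma_j1)

# One (gamma_j0, gamma_j1) pair per site, drawn from N(0, Sigma).
gamma <- matrix(rnorm(n_sites * 2), n_sites, 2) %*% chol(Sigma)

# Linear predictor: site-specific intercept and T slope, common A slope.
j   <- as.integer(dat$Site)
eta <- (theta[1] + gamma[j, 1]) + theta[2] * dat$A +
       (theta[3] + gamma[j, 2]) * dat$T
dat$counts <- rpois(nrow(dat), lambda = exp(eta))

fit <- glmer(counts ~ A + T + (T | Site), data = dat, family = poisson)

fixef(fit)     # estimates of theta_0, theta_1, theta_2
VarCorr(fit)   # estimate of Sigma

# Site-specific T slopes are the fixed slope plus each site's deviation.
site_slopes <- coef(fit)$Site[, "T"]
slopes_match <- all.equal(site_slopes,
                          fixef(fit)[["T"]] + ranef(fit)$Site[, "T"])
slopes_match
```

`fixef()` returns the "average" relationship, `ranef()` the per-site deviations, and `coef()` their sum, which is the relationship actually implied for each site.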

what is T? Is it a random effect? A fixed effect? What's actually being accomplished by putting T in both places?

$T$ is one of your covariates. It is not a random effect - Site is a random effect. $T$ has a fixed effect, $\theta_2$, and the slope of $T$ at site $j$ deviates from it by the random effect conferred by Site - $\gamma_{j1}$ in the model above. What is accomplished by including this random effect is to allow for heterogeneity between sites in the relationship between $T$ and $\log \big( E(Y_{ij}) \big)$.

When should something only appear in the random effects section of the model formula?

This is a matter of what makes sense in the context of the application.

Regarding the intercept - you should keep the fixed intercept in there for a lot of reasons (see, e.g., here); re: the random intercept, $\gamma_{j0}$, this primarily acts to induce correlation between observations made at the same site. If it doesn't make sense for such correlation to exist, then the random effect should be excluded.

Regarding the random slopes, a model with only random slopes and no fixed slopes reflects a belief that, within each site, there is some relationship between $\log \big( E(Y_{ij}) \big)$ and your covariates, but if you average those effects over all sites, then there is no relationship. For example, if you had a random slope in $T$ but no fixed slope, this would be like saying that time, on average, has no effect (e.g. no secular trends in the data) but each Site is heading in a random direction over time, which could make sense. Again, it depends on the application.
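In the notation used above, the example with a random slope in $T$ but no fixed slope (which I would expect to correspond to a formula such as counts ~ A + (T|Site)) can be written as

$$ \log \big( E(Y_{ij}) \big) = (\theta_0 + \gamma_{j0}) + \theta_1 A_{ij} + \gamma_{j1} T_{ij}, \qquad (\gamma_{j0}, \gamma_{j1}) \sim N(0, \Sigma) $$

Since $E(\gamma_{j1}) = 0$, the slope of $T$ averaged over sites is zero, while the variance of $\gamma_{j1}$ controls how far individual sites drift from that average.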

Note that you can fit the model with and without random effects to check whether this is happening: the fixed-effects-only model should show no effect of the covariate, while the model that adds the random slopes should show substantial between-site variance in them. I must caution you that decisions like this are often better made based on an understanding of the application rather than through model selection.
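One way such a comparison might look is sketched below on simulated data in which $T$ has no average trend but each site has its own slope. All values are invented for illustration, and the likelihood-ratio comparison via `anova()` is my choice of tool here rather than the only option. Keep in mind that tests of a variance component sit on the boundary of the parameter space, so the reported p-value tends to be conservative.

```r
library(lme4)

set.seed(2)
n_sites <- 40
n_per   <- 15
dat <- data.frame(
  Site = factor(rep(seq_len(n_sites), each = n_per)),
  A    = rnorm(n_sites * n_per),
  T    = rep(seq(0, 1, length.out = n_per), times = n_sites)
)

# No average trend in T (fixed slope 0), but every site has its own slope.
g0 <- rnorm(n_sites, sd = 0.3)   # random intercepts gamma_j0 (made up)
g1 <- rnorm(n_sites, sd = 0.8)   # random slopes gamma_j1 (made up)
j  <- as.integer(dat$Site)
dat$counts <- rpois(nrow(dat), exp(1 + g0[j] + 0.3 * dat$A + g1[j] * dat$T))

m_fixed <- glm(counts ~ A + T, data = dat, family = poisson)
m_int   <- glmer(counts ~ A + T + (1 | Site), data = dat, family = poisson)
m_slope <- glmer(counts ~ A + T + (T | Site), data = dat, family = poisson)

summary(m_fixed)$coefficients["T", ]  # T in the fixed-effects-only model
VarCorr(m_slope)                      # between-site variance of the T slope
lrt <- anova(m_int, m_slope)          # random intercept vs. random slope
lrt
```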
