Solved – Residual Diagnostics and Homogeneity of variances in linear mixed model

diagnostic, heteroscedasticity, mixed model, residuals

Before asking this question, I searched the site and found a number of similar questions (like here, here, and here). But I feel those related questions were not answered or discussed well, so I would like to raise the question again. I believe there is a large audience who would like to see these kinds of questions explained more clearly.

For my questions, first consider the linear mixed-effects model, $$
\mathbf{y} = \mathbf{X}\boldsymbol\beta + \mathbf{Z}\boldsymbol\gamma + \boldsymbol\epsilon
$$
where $\mathbf{X}\boldsymbol\beta$ is the linear fixed-effects component, $\mathbf{Z}$ is the additional design matrix corresponding to the random-effects parameters $\boldsymbol\gamma$, and $\boldsymbol\epsilon \sim N(\mathbf{0}, \sigma^2\mathbf{I})$ is the usual error term.

Let's assume the only fixed-effect factor is a categorical variable, Treatment, with 3 levels, and the only random-effect factor is the variable Subject. That is, we have a mixed-effects model with a fixed treatment effect and a random subject effect.

My questions are thus:

  1. Is there a homogeneity of variance assumption in the linear mixed model setting, analogous to the one in traditional linear regression models? If so, what does the assumption specifically mean in the context of the mixed model stated above? What other important assumptions need to be assessed?

My thoughts: YES. The assumptions (zero error mean and equal variance) still come from here: $\boldsymbol\epsilon \sim N(\mathbf{0}, \sigma^2\mathbf{I})$. In the traditional linear regression setting, we would say the assumption is that "the variance of the errors (or just the variance of the dependent variable) is constant across all 3 treatment levels". But I am lost as to how to state this assumption under the mixed model setting. Should we say "the variance is constant across the 3 treatment levels, conditional on the subjects", or not?

  2. The SAS online documentation about residual and influence diagnostics brings up two different residuals, i.e., the marginal residuals, $$
     \mathbf{r}_m = \mathbf{Y} - \mathbf{X} \hat{\boldsymbol\beta}
     $$ and the conditional residuals, $$
     \mathbf{r}_c = \mathbf{Y} - \mathbf{X} \hat{\boldsymbol\beta} - \mathbf{Z} \hat{\boldsymbol\gamma} = \mathbf{r}_m - \mathbf{Z} \hat{\boldsymbol\gamma}.
     $$
     My question is, what are the two residuals used for? How could we use them to check the homogeneity assumption? To me, only the marginal residuals can be used to address the homogeneity issue, since they correspond to the $\boldsymbol\epsilon$ of the model. Is my understanding here correct? (A small sketch of how both residuals can be computed appears after this question list.)

  3. Are there any tests proposed for checking the homogeneity assumption under the linear mixed model? @Kam pointed out Levene's test previously; would this be the right way? If not, what are the directions? I think after we fit the mixed model we can get the residuals and perhaps run some test on them (like a goodness-of-fit test?), but I am not sure how that would go.

  4. I also noticed that there are three types of residuals from Proc Mixed in SAS, namely the raw residual, the studentized residual, and the Pearson residual. I can understand the differences among them in terms of formulas, but they look very similar to me in real data plots. So how should they be used in practice? Are there situations where one type is preferred to the others?

  5. For a real data example, the following two residual plots are from Proc Mixed in SAS. How can the assumption of homogeneity of variances be assessed with them?

[I know I have a couple of questions here. If you could share your thoughts on any of them, that would be great; no need to address all of them. I really wish to discuss them to gain a full understanding. Thanks!]
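To make the two residual definitions in Question 2 concrete, here is a minimal sketch in Python with statsmodels as a rough analogue of what Proc Mixed does; this is not the original SAS analysis. The data frame, the column names y, Treatment, and Subject, and the simulated numbers are all hypothetical; the point is only how $\mathbf{r}_m$ and $\mathbf{r}_c$ fall out of a fitted random-intercept model.

```python
# Hypothetical example: 10 subjects, each measured under 3 treatment levels.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
subjects = np.repeat(np.arange(10), 3)
treatment = np.tile(["A", "B", "C"], 10)
treat_effect = np.array([{"A": 0.0, "B": 1.0, "C": 2.0}[t] for t in treatment])
subj_effect = rng.normal(0.0, 1.0, 10)[subjects]   # random subject effect
y = 5.0 + treat_effect + subj_effect + rng.normal(0.0, 0.5, 30)
df = pd.DataFrame({"y": y, "Treatment": treatment, "Subject": subjects})

# Mixed model: fixed Treatment effect, random intercept per Subject
result = smf.mixedlm("y ~ Treatment", data=df, groups=df["Subject"]).fit()

# Marginal residuals: r_m = y - X * beta_hat (fixed-effects part only)
marginal_resid = df["y"].to_numpy() - result.model.exog @ result.fe_params.to_numpy()

# Conditional residuals: r_c = y - X * beta_hat - Z * gamma_hat
# (statsmodels' .resid also subtracts the predicted random effects)
conditional_resid = np.asarray(result.resid)

print(result.summary())
print(pd.DataFrame({"marginal": marginal_resid,
                    "conditional": conditional_resid}).head())
```

With only a random intercept per subject, $\mathbf{Z}\hat{\boldsymbol\gamma}$ is just each subject's predicted intercept, so the two residual types differ by a per-subject shift.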

Here are the marginal (raw) residual plots.

Here are the conditional (raw) residual plots.

Best Answer

I think Questions 1 and 2 are interconnected. First, the homogeneity of variance assumption comes from here: $\boldsymbol\epsilon \sim N(\mathbf{0}, \sigma^2\mathbf{I})$. But this assumption can be relaxed to more general variance structures, in which homogeneity is no longer required. So it really depends on what distribution is assumed for $\boldsymbol\epsilon$.

Second, the conditional residuals are used to check the distribution of (and thus any assumptions about) $\boldsymbol\epsilon$, whereas the marginal residuals can be used to check the total variance structure.
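As one possible way to act on this (not a definitive prescription), here is a sketch that continues the hypothetical statsmodels example from the question, reusing df, result, and conditional_resid: it inspects the conditional residuals across the three treatment levels and backs the visual check with Levene's test, as raised in Question 3. The Pearson-type scaling shown is only a rough stand-in for Proc Mixed's Pearson residuals, which scale each residual by its own estimated variance.

```python
# Homogeneity check on the conditional residuals (one possible approach).
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Levene's test of equal residual variance across the 3 treatment levels
groups = [conditional_resid[(df["Treatment"] == lvl).to_numpy()]
          for lvl in ["A", "B", "C"]]
print(stats.levene(*groups, center="median"))

# Rough Pearson-type scaling: raw residual / sqrt(estimated error variance)
pearson_like = conditional_resid / np.sqrt(result.scale)
print("SD of Pearson-type residuals:", pearson_like.std())

# Conditional residuals vs. conditional fitted values: look for roughly
# equal vertical spread across the range and across treatment groups
plt.scatter(result.fittedvalues, conditional_resid)
plt.axhline(0.0, linestyle="--")
plt.xlabel("Conditional fitted values")
plt.ylabel("Conditional (raw) residuals")
plt.show()
```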