Solved – Residual pattern in mixed model

lme4-nlmemixed modelresiduals

EDITED

In my master thesis, I'm investigating how the rating of the participants face recognition skill over a period of time of training. There are 15 sessions in total and each session there are 15 trials. Each trial the participant rate pictures of faces. There are correct and incorrect rating. The correct ratings will be summed together for each session.

I have carried out the following mixed model:

mod = lmer(sumright ~ session * IQc + (session|subject), data=df, na.action=na.omit)
  • the dependent variable: sumright (continous scaled) ranges from 1,2,3… to 15. -> the participant can get max 15 correct answers per session
  • independent variables: session (continous scaled) ranges from 1 to 15 -> there are 15 sessions in total; and
    IQc (continous scaled) is centred -> IQc = IQ – mean(IQ)

There are 30 subjects, and 445 observations (rows of the dataframe).

Now I have plotted the residuals of the model against the fitted values:

enter image description here

We can clearly see that there are gaps between the residuals.
I’m not sure whether there is a voilation of the homogenity assumption.
I also checked whether the residuals are normal and they are. also the residual of the random effects (Intercept and session) are normaly distributed. The only thing is the residual plot, which I’m not sure whether it voilates the homogenity assumption. What do you think of the residual pattern above?

I also tried with a linear regression model. but the gaps between the residuals remain. So, very similar to the mixed model one.

UPDATE:

As suggested in the comment, I run a CLMM and the problem described above is solved. The residualplot no longer has any gaps. So, the dependent variable has to be coded as a ordered factor.

Best Answer

The bands observed are almost certainly because there is a discrete response variable (1 to 15) but it is modelled as a continuous one. (Actually it appears as if there are 12 bands so I suspect there is a lack of people with very low and/or very high scores.) In general "banded" residual plots usually suggest either a (potentially accidental) discretisation or the omission of a grouping factor which has very strong importance.

In the current case, as the response is discrete and ordered I would strongly suggest (at least at first instance) to try ordinal regression; it will naturally encode the information that a participant can only get $k+1$ correct answers if already $k$ correct answers have been attained; that cumulative effect will be encapsulated during the modelling. What we model is the cumulative probability that the response (the number of correct answers in this setting) is not greater than a chosen category ($k$ correct answers). A nice introduction on ordinal regression can be found here. Specific to mixed-effects application the R package ordinal allows for mixed-effect models to be fitted through its function clmm.