Solved – Estimating correlation between random slopes and random intercepts using the lme4 package in R

lme4-nlme, mixed-model, r, random-effects-model, repeated-measures

To answer my research question, I am interested in the correlation between the random slopes and random intercepts in a multilevel model, estimated using the R package lme4.

The data I have are: Y (test scores from students), SES (socio-economic status for each student), and schoolid (an ID for each school).

I am using the following syntax to estimate random intercepts and slopes for the schools:

library(lme4)

# random intercept and random SES slope per school, allowed to covary
model3 <- lmer(Y ~ SES + (1 + SES | schoolid), data = data)

The reference I used for this syntax is this PDF:

http://www.bristol.ac.uk/cmm/learning/module-samples/5-concepts-sample.pdf

On page 19, a similar analysis is described. It states that by specifying the random intercepts and slopes together, we indirectly request that they covary, so the correlation between the random slopes and random intercepts is also estimated. That is exactly what I need to answer my research hypothesis.
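For reference, lme4's double-bar syntax makes this contrast explicit. A minimal sketch (assuming the variables live in a data frame called data, as in the plot code further down):

# correlated random effects: the intercept-slope correlation is estimated
m_corr   <- lmer(Y ~ SES + (1 + SES | schoolid), data = data)

# uncorrelated random effects: the covariance is fixed at zero instead
m_uncorr <- lmer(Y ~ SES + (1 + SES || schoolid), data = data)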

However, when I look at the results:

 summary(model3)

I am getting the following output:

Linear mixed model fit by REML ['lmerMod']
Formula: Y ~ SES + (1 + SES | schoolid)

REML criterion at convergence: 8256.4

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.1054 -0.6633 -0.0028  0.6810  3.5606 

Random effects:
 Groups   Name        Variance  Std.Dev. Corr
 schoolid (Intercept) 0.6427924 0.80174      
          SES         0.0009143 0.03024  1.00
 Residual             0.3290902 0.57366      
Number of obs: 4376, groups: schoolid, 179

Fixed effects:
             Estimate Std. Error t value
(Intercept) -0.036532   0.060582  -0.603
SES          0.062491   0.009984   6.259

Correlation of Fixed Effects:
    (Intr)
SES 0.226 

As stated in the output, the correlation between the random slopes and random intercepts equals 1.00. I find this hard to believe.
When I call in R:

VarCorr(model3)$schoolid

I get the following output, which gives the covariance matrix together with the standard deviations and correlations:

            (Intercept)          SES
(Intercept)  0.64279243 0.0242429680
SES          0.02424297 0.0009143255

attr(,"stddev")
(Intercept)         SES 
 0.80174337  0.03023782 

attr(,"correlation")
            (Intercept) SES
(Intercept)           1   1
SES                   1   1

It seems as if the correlation between the slopes and intercepts was set to 1.00 by R. I have not seen this in anyone else's output while searching the internet for references on multilevel modelling.

Does anybody know what could be causing this correlation?
Could it be that the correlation is set to 1.00 because the model would otherwise not be identified?
Or is it because the variance of the random slopes is so small (0.0009) that the correlation cannot be estimated?
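One way to check whether the fit is on the boundary of the parameter space, as a minimal sketch (isSingular() and rePCA() are available in recent versions of lme4):

isSingular(model3)       # TRUE if a variance is ~0 or a correlation is ~ +/- 1
summary(rePCA(model3))   # a near-zero principal component flags a rank-deficient
                         # random-effects covariance matrix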

I have tried to simulate data in order to provide a small reproducible dataset. However, I have not yet been able to reproduce this output with simulated data. As soon as I have the code, I will edit my post and add it.
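In the meantime, here is a minimal simulation sketch (hypothetical data, not the original) showing how a near-zero slope variance can push the estimated correlation to the boundary; the collapse to +/- 1.00 does not happen on every simulated draw:

library(lme4)
set.seed(1)
n_schools <- 179
n_per     <- 25
schoolid  <- factor(rep(seq_len(n_schools), each = n_per))
SES       <- rnorm(n_schools * n_per)
u0 <- rep(rnorm(n_schools, sd = 0.8),  each = n_per)   # school intercept deviations
u1 <- rep(rnorm(n_schools, sd = 0.01), each = n_per)   # (near-zero) slope deviations
Y  <- -0.04 + 0.06 * SES + u0 + u1 * SES + rnorm(n_schools * n_per, sd = 0.57)
sim <- data.frame(Y, SES, schoolid)

m_sim <- lmer(Y ~ SES + (1 + SES | schoolid), data = sim)
VarCorr(m_sim)   # slope variance and/or correlation often sit on the boundary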

Edit:
In response to a comment by Roman Luštrik, here is the plot produced by:

library(ggplot2)
ggplot(data[1:261, ], aes(x = SES, y = Y)) +   # rows 1-261 cover the first 10 schools
  geom_point() + facet_wrap(~ schoolid) +
  geom_smooth(method = lm)

As there are 179 schools in total, the full plot becomes quite chaotic, so I included only the first 10 schools to keep it readable:

[Figure: scatterplots of Y against SES, one panel per schoolid, with fitted regression lines, for the first 10 schools]

Best Answer

Here is the answer by @ben-bolker mentioned in the comments, which links to the R-sig-ME mailing list, posted here for completeness:

Yes, you don't have enough information (or, equivalently, there is too much noise) in your data to uniquely identify the full variance-covariance matrix, so the results are 'singular'; that is, one of the underlying components is estimated as zero. Common symptoms of this situation are variances equal (or nearly equal) to zero (although I see that in your case the variances, while small, are not exactly zero) or correlations equal to +/- 1.

http://rpubs.com/bbolker/4187 shows some simulated examples where the estimated variance collapses to zero (despite the true, simulated model having a non-zero among-group variance). Presumably one could make up similar examples (with randomized-block designs/random-slope models) that would show analogous situations of correlations collapsing to +/- 1.

It's a little bit surprising that you have encountered this problem, since it most commonly arises with small numbers of levels of the grouping variable, which doesn't appear to be the case here.

See here for more information ...
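As a hedged follow-up sketch, one common way to proceed when the estimated correlation sits at +/- 1 is to simplify the random-effects structure and compare the fits (assuming the same data frame, data, as above):

# drop the intercept-slope correlation
model3_uncorr <- lmer(Y ~ SES + (1 + SES || schoolid), data = data)

# drop the random slope entirely
model3_int <- lmer(Y ~ SES + (1 | schoolid), data = data)

# likelihood-ratio comparisons; anova() refits the models with ML
anova(model3, model3_uncorr, model3_int)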
