Solved – Dealing with heteroscedasticity in mixed models

heteroscedasticitylme4-nlmemixed modelr

I collected crop yield data for many years across multiple locations, which are nested under provinces and some associated weather data. I am interested in making a model and then using new weather data to predict yield across these locations.

Here's a model I fitted

library(lme4)


Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: 
log(yield) ~ x1 + I(x1^2) + x2 +  I(x2^2) + x3 + I(x3^2) +  
             x4 + I(x4^2) + x5 + I(x5^2) +  
             x6 + I(x6^2) + x7 + I(x7^2) +  
             x8 + I(x8^2) + x9 + x10 +  
             I(x10^2) + x11 + I(x11^2) + year +  
            (year | province/location)
Data: dat

 AIC      BIC   logLik deviance df.resid 
-11495.4 -11261.8   5777.7 -11555.4    17735 

Scaled residuals: 
  Min      1Q  Median      3Q     Max 
-4.4276 -0.5241  0.1373  0.6644  4.7144 

Random effects:
Groups            Name        Variance  Std.Dev. Corr 
location:province (Intercept) 0.0028622 0.05350       
                   year        0.0005911 0.02431  0.46 
province          (Intercept) 0.0028255 0.05316       
                   year        0.0003357 0.01832  -0.52
Residual                      0.0296151 0.17209       
Number of obs: 17765, groups:  
location:province, 174; province, 14

Fixed effects:
            Estimate  Std. Error t value
(Intercept) 7.77016483  0.01642658 473.024
x1          0.01490593  0.00286908   5.195
I(x1^2)    -0.00034555  0.00070202  -0.492
x2         -0.00093958  0.00300568  -0.313
I(x2^2)    -0.00008501  0.00056272  -0.151
x3          0.03302874  0.00226069  14.610
I(x3^2)     0.00664297  0.00142503   4.662
x4         -0.04088568  0.00252176 -16.213
I(x4^2)    -0.01398973  0.00139918  -9.998
x5          0.01764166  0.00374300   4.713
I(x5^2)    -0.00298363  0.00135861  -2.196
x6          0.00964780  0.00405505   2.379
I(x6^2)    -0.00359899  0.00123643  -2.911
x7          0.01127387  0.00352026   3.203
I(x7^2)    -0.00175782  0.00052768  -3.331
x8         -0.05147980  0.00392094 -13.129
I(x8^2)     0.00166814  0.00077124   2.163
x9          0.00018234  0.00132609   0.138
x10         0.02124123  0.00385797   5.506
I(x10^2)    0.00174181  0.00087155   1.999
x11        -0.02645305  0.00417074  -6.343
I(x12^2)    0.00045353  0.00092663   0.489
year        0.13482956  0.00617583  21.832

plot(mod)

Since the residuals are obs – fit, I understand that as the fitted value increases, the residuals tend to move from positive to negative which means at a higher value of observed y, the model underpredicts and vice versa.

I would like some guidance on what are the possible reasons for this systematic lines in the plot and how to best deal with them?
enter image description here

Best Answer

The shape of the residuals seems to suggest that you have a lower/upper bound in your yield outcome variable. I.e., are some of the yields exactly zero? In this case, you may consider fitting a two-part mixed effects model for semi-continuous data. This model is, for example, available in the GLMMadaptive package. For an example check here.

To check the fit of the model in this case, it would be better to use the simulated scaled residuals provided by the DHARMa package. For an example that uses DHARMa with models fitted by GLMMadaptive you can check here.

Related Question