I collected crop yield data for many years across multiple locations, which are nested under provinces and some associated weather data. I am interested in making a model and then using new weather data to predict yield across these locations.
Here's a model I fitted
library(lme4)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula:
log(yield) ~ x1 + I(x1^2) + x2 + I(x2^2) + x3 + I(x3^2) +
x4 + I(x4^2) + x5 + I(x5^2) +
x6 + I(x6^2) + x7 + I(x7^2) +
x8 + I(x8^2) + x9 + x10 +
I(x10^2) + x11 + I(x11^2) + year +
(year | province/location)
Data: dat
AIC BIC logLik deviance df.resid
-11495.4 -11261.8 5777.7 -11555.4 17735
Scaled residuals:
Min 1Q Median 3Q Max
-4.4276 -0.5241 0.1373 0.6644 4.7144
Random effects:
Groups Name Variance Std.Dev. Corr
location:province (Intercept) 0.0028622 0.05350
year 0.0005911 0.02431 0.46
province (Intercept) 0.0028255 0.05316
year 0.0003357 0.01832 -0.52
Residual 0.0296151 0.17209
Number of obs: 17765, groups:
location:province, 174; province, 14
Fixed effects:
Estimate Std. Error t value
(Intercept) 7.77016483 0.01642658 473.024
x1 0.01490593 0.00286908 5.195
I(x1^2) -0.00034555 0.00070202 -0.492
x2 -0.00093958 0.00300568 -0.313
I(x2^2) -0.00008501 0.00056272 -0.151
x3 0.03302874 0.00226069 14.610
I(x3^2) 0.00664297 0.00142503 4.662
x4 -0.04088568 0.00252176 -16.213
I(x4^2) -0.01398973 0.00139918 -9.998
x5 0.01764166 0.00374300 4.713
I(x5^2) -0.00298363 0.00135861 -2.196
x6 0.00964780 0.00405505 2.379
I(x6^2) -0.00359899 0.00123643 -2.911
x7 0.01127387 0.00352026 3.203
I(x7^2) -0.00175782 0.00052768 -3.331
x8 -0.05147980 0.00392094 -13.129
I(x8^2) 0.00166814 0.00077124 2.163
x9 0.00018234 0.00132609 0.138
x10 0.02124123 0.00385797 5.506
I(x10^2) 0.00174181 0.00087155 1.999
x11 -0.02645305 0.00417074 -6.343
I(x12^2) 0.00045353 0.00092663 0.489
year 0.13482956 0.00617583 21.832
plot(mod)
Since the residuals are obs – fit, I understand that as the fitted value increases, the residuals tend to move from positive to negative which means at a higher value of observed y
, the model underpredicts and vice versa.
I would like some guidance on what are the possible reasons for this systematic lines in the plot and how to best deal with them?
Best Answer
The shape of the residuals seems to suggest that you have a lower/upper bound in your
yield
outcome variable. I.e., are some of the yields exactly zero? In this case, you may consider fitting a two-part mixed effects model for semi-continuous data. This model is, for example, available in the GLMMadaptive package. For an example check here.To check the fit of the model in this case, it would be better to use the simulated scaled residuals provided by the DHARMa package. For an example that uses DHARMa with models fitted by GLMMadaptive you can check here.