Solved – Linear mixed model with skewed residuals

mixed model, residuals

I'd like to analyse data of 11 plots with 15 plant individuals on each plot.

A variable was measured on each plant in 9 different years.

So 11 plots with 15 plants on each plot and measurements in 9 years = 1485 observations.

  year   id plot  val
1 2000 A_01    A 0.70
2 2000 A_02    A 0.90
3 2000 A_03    A 0.79
4 2000 A_04    A 1.04
5 2000 A_05    A 0.84
6 2000 A_06    A 0.84
...
     year   id plot  val
1480 2008 N_10    N 0.35
1481 2008 N_11    N 0.72
1482 2008 N_12    N 0.36
1483 2008 N_13    N 0.20
1484 2008 N_14    N 0.41
1485 2008 N_15    N 0.51

My goal is to find differences on each plot between years as well as differences in each year between the plots.

Thus my analysis in R looks like this so far:

library(lme4)
mod <- lmer(val ~ year * plot + (1|id), data = dat)

> summary(mod)
Linear mixed model fit by REML ['lmerMod']
Formula: val ~ year * plot + (1 | id)
   Data: dat

REML criterion at convergence: 299.6

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.0167 -0.5385 -0.0967  0.4548 14.6926 

Random effects:
 Groups   Name        Variance  Std.Dev. 
 id       (Intercept) 5.596e-17 7.481e-09
 Residual             5.992e-02 2.448e-01

As I want to calculate p-values to find significant differences between years/plots I think I need to make sure I have normally distributed residuals of the random effects (?).

So I'm looking at the residuals ..

mod.res <- ranef(mod)$id$`(Intercept)`

.. and find that they are right-skewed:

So I found that there is the package robustlmm that – I think – can take care of that.

library(robustlmm)
rmod <- rlmer(val ~ year * plot + (1|id), data = dat)

However, now the variance of the random effect is exactly 0:

Robust linear mixed model fit by DAStau 
Formula: val ~ year * plot + (1 | id) 
   Data: dat 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.5596 -0.6193 -0.0762  0.6120 19.8078 

Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 0.00000  0.000   
 Residual             0.03766  0.194

I am stuck now. What does that mean? How can I proceed?

Best Answer

These models are both telling you that there is no variation between plants. The second model estimates the random intercept variance as zero, while the lmer model estimates it as 5.596e-17, which is $5.596 \times 10^{-17}$, i.e. zero for all practical purposes.
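One way to check this in code is to print the variance components and ask lme4 whether the fit is singular. The following is a minimal sketch on simulated data, not the asker's data: `dat_sim`, `mod_sim`, and the simulation settings are assumptions made for illustration, and `year` is treated as a factor.

```r
library(lme4)

set.seed(1)
# Hypothetical stand-in for the real data: 11 plots x 15 plants x 9 years,
# generated WITHOUT any between-plant (id) variation.
dat_sim <- expand.grid(year  = factor(2000:2008),
                       plant = sprintf("%02d", 1:15),
                       plot  = LETTERS[1:11])
dat_sim$id  <- factor(paste(dat_sim$plot, dat_sim$plant, sep = "_"))
dat_sim$val <- 0.7 + rnorm(nrow(dat_sim), sd = 0.25)

mod_sim <- lmer(val ~ year * plot + (1 | id), data = dat_sim)

VarCorr(mod_sim)     # variance components; the id term is expected to be at or near zero
isSingular(mod_sim)  # TRUE when a variance estimate sits on the boundary (zero)
```

If `isSingular()` returns TRUE for the real model, the random intercept contributes nothing to the fit.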

So you don't need to fit random intercepts to this data. You can simply use lm instead.

As for the residuals, these are not particularly skewed. There are 2, possibly 3, outliers and I would not be overly concerned about this. These residuals are plausibly normally distributed. But to be on the "safe side", you could use robust standard errors - see the sandwich package for more details.
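A rough sketch of that suggestion, assuming the sandwich and lmtest packages are installed: fit the model with lm and test the coefficients with heteroskedasticity-consistent (HC3) standard errors. The data here are simulated with right-skewed errors; `dat_sim`, `fit`, and `robust` are hypothetical names, and `year` is treated as a factor so that individual years can be compared.

```r
library(sandwich)  # heteroskedasticity-consistent covariance estimators
library(lmtest)    # coeftest() for tests with a user-supplied covariance

set.seed(2)
# Hypothetical stand-in data; replace dat_sim with the real data frame.
dat_sim <- expand.grid(year  = factor(2000:2008),
                       plant = 1:15,
                       plot  = LETTERS[1:11])
# Right-skewed errors (centred exponential) to mimic a few large positive residuals
dat_sim$val <- 0.7 + (rexp(nrow(dat_sim), rate = 4) - 0.25)

# Ordinary linear model, no random intercept
fit <- lm(val ~ year * plot, data = dat_sim)

# Coefficient tests using HC3 robust standard errors
robust <- coeftest(fit, vcov. = vcovHC(fit, type = "HC3"))
robust[1:5, ]  # first few coefficients with robust SEs and p-values
```

If correlation between repeated measurements on the same plant were a concern after all, sandwich also provides `vcovCL()` for cluster-robust covariances.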

Related Solutions

Solved – Mixed effects model or mixed design ANOVA in R

So I've done a lot of reading and chatting to people and I have a solution.

My experimental design is a split plot design, which is quite different from a nested or hierarchical design. I was originally confusing the terms. As Robert correctly states in his answer, what is needed is a mixed effects model. Thus:

Fixed effects: Year, Treatment1, Treatment2

Random effects: Year, Block, Treatment1

The model is specified thus:

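# Poisson response, so use glmer() with a family (lmer() is for Gaussian models only)
mod <- glmer(Richness ~ Treatment1*Treatment2*Year + (1|Block/Treatment1) + (1|Year), data = dat, family = poisson)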

The random effects are the terms specified in the brackets. Since none of these are continuous (the effect of Year doesn't necessarily increase each year in a linear fashion, so I have classed it as a categorical effect), they are specified as 1|grouping factor, where 1 represents the intercept.

If Block were actually a continuous variable (obviously hypothetical!) then the random effects might be specified +(Block|Treatment1)+(1|Year), which gives Block a random slope within each level of Treatment1.

The model can then be simplified as appropriate.

Several things to note:

1) When specified as a random effect, Year is listed separately from Block and Treatment1, since it doesn't have an intuitive "level" at which to be nested between them (Year isn't any different at any plot size of the experiment: for every block, plot and subplot Year is the same).

2) Treatment 2 does not need to be specified as a random effect since it represents the highest level of replication in the experiment and therefore will not be pseudoreplicated.

3) In mixed effects models it is possible to specify an error distribution if errors are not normal. I have specified poisson here, since my response data are counts - this improved the distribution of the model residuals.

Mixed-Model Analysis – Understanding Random Effects with Zero Variance

You say that each individual had at least two measurements, but from your output there are only 66 observations on 30 individuals. Two measurements each already accounts for 60 observations, so only six individuals (at most) had more than two measurements. Two is the absolute minimum you need to calculate a mean and a standard deviation (the random intercept is assumed to follow a Normal distribution), and estimates from so few points will have a LOT of uncertainty. Looking at the plot, you have at least five individuals with essentially zero variance, and at least five individuals with a HUGE variance (probably caused by only two observations each).

I'd say you have too little data that's too noisy. The "clear" differences you see are mostly illusory because of the lack of data resulting in huge swings.
