Solved – AIC, anova error: Models are not all fitted to the same number of observations, models were not all fitted to the same size of dataset

aicmixed modelr

I have models like this:

require(nlme)

set.seed(123)
n <- 100
k <- 5
cat <- as.factor(rep(1:k, n))
cat_i <- 1:k # intercept per kategorie
x <- rep(1:n, each = k)
sigma <- 0.2
alpha <- 0.001
y <- cat_i[cat] + alpha * x + rnorm(n*k, 0, sigma)
plot(x, y)

m1 <- lm(y ~ x)
summary(m1)

m2 <- lm(y ~ cat + x)
summary(m2)

m3 <- lme(y ~ x, random = ~ 1|cat, na.action = na.omit)
summary(m3)

Now I am trying to assess whether the random effect should be present in the model. So I compare the models using AIC or anova, and I get the following error:

> AIC(m1, m2, m3)
   df       AIC
m1  3 1771.4696
m2  7 -209.1825
m3  4 -154.0245
Warning message:
In AIC.default(m1, m2, m3) :
  models are not all fitted to the same number of observations  
> anova(m2, m3)
Error in anova.lmlist(object, ...) : 
  models were not all fitted to the same size of dataset

As you can see, in both cases I use the same dataset. I have found two remedies, but I don't consider them satisfying:

  1. Adding method = "ML" to the lme() call – not sure if it is good idea to change the method.
  2. Using lmer() instead. Surprisingly, this works, despite the fact that lmer() uses REML method. However I dont like this solution because the lmer() doesn't show p-values for coefficients – I like to use older lme() instead.

Do you have any idea if this is a bug or not and how can we go around that?

Best Answer

A quick search shows that it is possible (although I have to admit that I thought it wasn't) and that it isn't a bug...just another case where methods in R are hidden and result in things that seem 'unexpected', but the RTFM crowd say, "It is in the documentation." Anyway...your solution is to do anova with the lme as the first argument and the lm models as the second (and third if you like) argument(s). If this seems odd, it is because it is a little odd. The reason is that when you call anova, the anova.lme method is called only if the first argument is an lme object. Otherwise, it calls anova.lm (which in turn calls anova.lmlist; if you dig into anova.lm, you'll see why). For details about how you want to be calling anova in this case, pull up help for anova.lme. You'll see that you can compare other models with lme models, but they have to be in a position other than the first argument. Apparently it is also possible to use anova on models fit using the gls function without caring too much about the order of the model arguments. But I don't know enough of the details to determine whether that is a good idea or not, or what exactly it implies (it seems probably fine, but your call). From that link comparing lm to lme appears to be well documented and cited as a method, so I'd err in that direction, were I you.

Good luck.