Given three variables, y and x, which are positive and continuous, and z, which is categorical, I have two candidate models given by:
fit.me <- lmer( y ~ 1 + x + ( 1 + x | factor(z) ) )
and
fit.fe <- lm( y ~ 1 + x )
I hope to compare these models to determine which is more appropriate. It seems to me that, in some sense, fit.fe is nested within fit.me. Typically, when this general scenario holds, a chi-squared test can be performed. In R, we can perform this test with the following command:
anova(fit.fe,fit.me)
When both models contain random effects (generated by lmer from the lme4 package), the anova() command works fine. Because the variance parameters lie on the boundary of the parameter space under the null, it is normally advisable to assess the resulting chi-squared statistic via simulation; nonetheless, we can still use the statistic within the simulation procedure.
When both models contain only fixed effects, this approach, and the associated anova() command, work fine.
However, when one model contains random effects and the reduced model contains only fixed effects, as in the above scenario, the anova() command doesn't work. More specifically, I get the following error:
> anova(fit.fe, fit.me)
Error: $ operator not defined for this S4 class
Is there anything wrong with using the chi-squared approach from above (with simulation)? Or is this simply a problem of anova() not knowing how to deal with linear models generated by different functions?
In other words, would it be appropriate to manually compute the test statistic from the two models? If so, what are the appropriate degrees of freedom for comparing them? By my reckoning:
$$ F = \frac{(SSE_{reduced}-SSE_{full})/(p-k)}{SSE_{full}/(n-p-1)} \sim F_{p-k,\,n-p-1} $$
We are estimating two parameters in the fixed-effects model (slope and intercept) and two more parameters in the mixed-effects model (the variance parameters for the random slope and random intercept). Typically, the intercept isn't counted in the degrees-of-freedom computation, which implies $k=1$ and $p=k+2=3$.
That said, I'm not sure whether the variance parameters for the random effects should be included in the degrees-of-freedom computation. The variance estimates for fixed-effect parameters are not counted, but I believe that is because the fixed-effect parameters are assumed to be unknown constants, whereas the random effects are treated as unobservable random variables. I would appreciate some assistance on this issue.
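For what it's worth, the F statistic above can be computed mechanically along these lines. This is only a sketch: the data are hypothetical, the residuals extracted from the lmer fit are conditional residuals, and the degrees-of-freedom choices ($k=1$, $p=3$) follow the reckoning above, which is itself part of what is in question.

```r
# Sketch of the manual F computation from the formula above.
# Hypothetical data; df choices (k = 1, p = 3) follow the question's reckoning.
library(lme4)

set.seed(1)
n <- 200
z <- factor(rep(1:10, each = 20))
x <- runif(n, 1, 5)
y <- 2 + 0.5 * x + rnorm(10)[z] + rnorm(n)

fit.me <- lmer(y ~ 1 + x + (1 + x | z))
fit.fe <- lm(y ~ 1 + x)

sse.reduced <- sum(residuals(fit.fe)^2)
sse.full    <- sum(residuals(fit.me)^2)  # conditional residuals from lmer
k <- 1
p <- 3

F.stat  <- ((sse.reduced - sse.full) / (p - k)) / (sse.full / (n - p - 1))
p.value <- pf(F.stat, df1 = p - k, df2 = n - p - 1, lower.tail = FALSE)
```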
Finally, does anybody have a more appropriate (R-based) solution for comparing these models?
Best Answer
Technically, you can get it working by just switching the order of the arguments:
anova(fit.me, fit.fe)
will work just fine. If you pass an object generated by lmer first, anova.merMod will be called instead of anova.lm (which does not know how to handle lmer objects).
That said, choosing between a mixed model and a fixed-effects model is a modeling choice that needs to take the experimental design into account; it is not a model selection problem. See @BenBolker's https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#testing-significance-of-random-effects for more details:
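As a self-contained illustration of the dispatch issue (hypothetical data; only the model formulas mirror the question):

```r
# Passing the merMod object first dispatches anova.merMod, which can
# compare a mixed model against an lm fit; the reverse order dispatches
# anova.lm, which raises the "$ operator not defined for this S4 class" error.
library(lme4)

set.seed(1)
n <- 200
z <- factor(rep(1:10, each = 20))
x <- runif(n, 1, 5)
y <- 2 + 0.5 * x + rnorm(10)[z] + rnorm(n)

fit.me <- lmer(y ~ 1 + x + (1 + x | z))
fit.fe <- lm(y ~ 1 + x)

comparison <- anova(fit.me, fit.fe)  # mixed-model object first: works
comparison
```

Note that anova.merMod refits REML fits with maximum likelihood before comparing, which is what makes the likelihood-ratio comparison meaningful here.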