Using maximum likelihood, any of these models can be compared with AIC. If the fixed effects are the same (m1 to m4), either REML or ML is fine, with REML usually preferred; but if the fixed effects differ, only ML can be used. However, interpretation is usually difficult when both the fixed effects and the random effects change, so in practice most people recommend changing only one or the other at a time.
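As a minimal sketch of both cases in lme4 (using its bundled sleepstudy data, so the model names here are illustrative rather than the m1 to m4 above):

```
library(lme4)  # sleepstudy ships with lme4

# Different fixed effects: refit with ML (REML = FALSE) before comparing AIC
f1 <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy, REML = FALSE)
f0 <- lmer(Reaction ~ 1 + (1 | Subject), sleepstudy, REML = FALSE)
AIC(f1, f0)

# Same fixed effects, different random effects: REML fits (the default) are comparable
r1 <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy)
r2 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
AIC(r1, r2)
```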
Using the likelihood ratio test is possible but messy, because the usual chi-squared approximation doesn't hold when testing whether a variance component is zero: the null value sits on the boundary of the parameter space. See Aniko's answer for details. (Kudos to Aniko for both reading the question more carefully than I did and reading my original answer carefully enough to notice that it missed this point. Thanks!)
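A rough sketch of what this looks like, continuing the hypothetical r1 fit above and testing a single variance component (the halving rule in the comments is the usual boundary correction, not something lme4 applies for you):

```
# Add an uncorrelated random slope, so the test involves one variance component
r2b <- lmer(Reaction ~ Days + (Days || Subject), sleepstudy)

# refit = FALSE keeps the REML fits instead of refitting with ML
anova(r1, r2b, refit = FALSE)

# The null (slope variance = 0) is on the boundary, so the reference
# distribution is roughly a 50:50 mixture of chi-squared(0) and chi-squared(1);
# halving the reported p-value is the common correction.
```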
Pinheiro/Bates is the classic reference; it describes the nlme package, but the theory is the same. Well, mostly the same; Doug Bates has changed his recommendations on inference since writing that book, and the new recommendations are reflected in the lme4 package. But that's more than I want to get into here. A more readable reference is Weiss (2005), Modeling Longitudinal Data.
Fixed effects models and random effects models ask different questions of the data. Specifying a set of group-level dummy variables essentially controls for all group-level unobserved heterogeneity in the average response, leaving your estimates to reflect only variability within units. Random effects models start with the assumption that there is a meta-population of (whatever effect), and that your sample reflects many draws from that population. So rather than anchoring your results around heterogeneous intercepts, your data will be used to elucidate the parameters of that (usually normal) distribution from which your data were supposedly drawn.
It is often said that fixed effects models are good for conducting inference on the data that you have, and that random effects models are good for trying to conduct inference on some larger population from which your data is a random sample.
When I learned about fixed effects models, they were motivated using error components and panel data. Take multiple observations of a given unit $i$, with a treatment $T_{it}$ that varies over time $t$.
$$y_{it} = \alpha_i + \beta T_{it} + \epsilon_{it}$$
You can break your error term out into that component of your error term that varies in time, and one that doesn't:
$$y_{it} = \alpha_i + \beta T_{it} + e_i + u_{it}$$
Now subtract the groupwise mean from both sides:
$$y_{it} - \bar y_i = \alpha_i - \bar \alpha_i + \beta \left(T_{it} - \bar T_i\right) + e_i - \bar e_i + u_{it} - \bar u_i$$
Things that aren't subscripted by $t$ come out of the equation by basic subtraction -- which is to say that the average over time is the same as it is at any time if it never changes. This includes your non-time-varying component of your error term. Thus your estimates are unconfounded by time-invariant heterogeneity.
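A small simulation sketch of this point (all names hypothetical): the time-invariant error component is built to correlate with the treatment, so pooled OLS is biased while the within (demeaned) estimator recovers $\beta$:

```
set.seed(1)
n <- 100; t_obs <- 5
id <- rep(1:n, each = t_obs)
e_i <- rnorm(n)[id]                    # time-invariant error component
T_it <- 0.5 * e_i + rnorm(n * t_obs)   # treatment correlated with e_i
y <- 1 + 2 * T_it + e_i + rnorm(n * t_obs)

coef(lm(y ~ T_it))["T_it"]             # pooled OLS: biased away from 2

# Within transformation: subtract groupwise means, then regress
y_dm <- y - ave(y, id)
T_dm <- T_it - ave(T_it, id)
coef(lm(y_dm ~ T_dm - 1))["T_dm"]      # close to the true beta = 2
```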
This doesn't quite work in a random effects model -- the non-$t$-indexed variables won't be sopped up, because that transformation (the "within" transformation) is never applied. As such, you can draw inference on the effects of things that don't vary within group, and in the real world such things have importance. Thus, random effects models are good for "modeling the data", while fixed effects models are good for getting closer to unbiased estimates of particular terms. With a random effects model, you can't make the claim to have removed that $e_i$ entirely.
In this example, the unit $i$ is the grouping variable, with observations indexed by time; in your example, the grouping variable is DID (i.e., the setup generalizes).
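For a side-by-side illustration of the two estimators, here is a sketch using the plm package and its bundled Grunfeld data (the dataset and variables are just for demonstration):

```
library(plm)  # assumes plm is installed; Grunfeld ships with it

data("Grunfeld", package = "plm")
fe <- plm(inv ~ value + capital, data = Grunfeld,
          index = c("firm", "year"), model = "within")  # fixed effects
re <- plm(inv ~ value + capital, data = Grunfeld,
          index = c("firm", "year"), model = "random")  # random effects
phtest(fe, re)  # Hausman test: do the two sets of estimates diverge?
```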
Best Answer
One possible approach was suggested by Nakagawa and Schielzeth (2013), where $R^2$ is decomposed into the variance explained by the fixed effects alone (marginal $R^2_m$) and by the fixed and random effects together (conditional $R^2_c$).
Let's say your model looks like the one below, where $\alpha$ and $\gamma$ are random effects:
$$y_{ijk} = \beta_0 + \sum^M_{m=1} \beta_m X_{mijk} + \alpha_j + \gamma_k + \epsilon_{ijk}$$
If so, you can define the marginal $R^2$ for the fixed effects as:
$$R^2_m = \frac{\sigma^2_f}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\gamma + \sigma^2_\epsilon}$$
where:
$$\sigma^2_f = var\left( \sum^M_{m=1} \beta_m X_{mijk} \right)$$
And the conditional $R^2$, for the fixed and random effects together, as:
$$R^2_c = \frac{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\gamma}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\gamma + \sigma^2_\epsilon}$$
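For a random-intercept model with a single grouping factor (so only an $\alpha$-type term and no $\gamma$), these quantities can be computed by hand from an lme4 fit; a minimal sketch using the bundled sleepstudy data:

```
library(lme4)  # sleepstudy ships with lme4

fit <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy)

# sigma^2_f: variance of the fixed-effects predictions X %*% beta
sigma2_f <- var(as.vector(model.matrix(fit) %*% fixef(fit)))

# Random-intercept variance and residual variance
vc <- as.data.frame(VarCorr(fit))
sigma2_a <- vc$vcov[vc$grp == "Subject"]
sigma2_e <- vc$vcov[vc$grp == "Residual"]

R2m <- sigma2_f / (sigma2_f + sigma2_a + sigma2_e)               # marginal
R2c <- (sigma2_f + sigma2_a) / (sigma2_f + sigma2_a + sigma2_e)  # conditional
c(R2m = R2m, R2c = R2c)
```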
And this approach can be extended to GLMMs by decomposing $\sigma^2_\epsilon$ into additive dispersion and distribution-specific variance (for example, $\pi^2/3$ for binomial models with a logit link; Nakagawa and Schielzeth, 2013, p. 137).
Johnson (2014) showed how this approach can be extended to random-slopes models.
Functions for estimating $R^2_m$ and $R^2_c$ are available in the MuMIn package via the r.squaredGLMM function.
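For example, applied to the hypothetical fit above (assuming MuMIn is installed):

```
library(MuMIn)

r.squaredGLMM(fit)  # returns marginal (R2m) and conditional (R2c) R-squared
```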