Solved – Are $R^2$ values for GLMMs useful for modelers but not necessarily for readers?

generalized linear model, glmm, lme4-nlme, mixed model, variance

The short version:

1) Are there any published critiques of the use of $R^2$ for GLMMs, in particular the popular approach of Nakagawa & Schielzeth (2013), "A general and simple method for obtaining $R^2$ from generalized linear mixed-effects models"?

2) Will most readers benefit from $R^2$ being reported for GLMMs, or do the complexities of this statistic make it not very useful?

The long version:
Reporting $R^2$ for generalized linear mixed models (R2.GLMM) using the methods of Nakagawa & Schielzeth (2013) has become very popular (1774 citations as of spring 2017). They present a method for calculating an $R^2$ associated with the fixed effects ($R^2$ marginal to the random effects) and an $R^2$ for the overall model ($R^2$ conditional on the random effects).

There is, however, significant discussion of the complications in calculating $R^2$ for GLMMs and the contradictions that arise when defining it, especially from lme4 developer Doug Bates. (See here for a quote from Bates pasted into a CV response, and here for a response on a listserv. These issues are also reviewed at the GLMM wiki.) Despite this general discussion, I don't know of any particular critiques of Nakagawa & Schielzeth (2013).

A reviewer has requested that I report marginal $R^2$ values for some hierarchical models. I am of the opinion, however, that $R^2$ is mostly a useful diagnostic to look at while modeling, but that it can have limited value to readers; worse, it might even be misleading. In particular, I am reluctant to report $R^2$ for my GLMMs because the values for the fixed effects marginal to the random effects (R2.M) are small (<0.10) due to the large amount of variation between levels of my random effects (see below for the definition of R2.M). By all other measures I am happy with the model, and I don't think $R^2$ will provide readers with relevant information; the data will be published if anyone wants to calculate it themselves.

I think most readers of my paper will apply the standard interpretation of $R^2$ ("goodness of fit") to these numbers and not appreciate that the structure of my random effects is a major reason why $R^2$ is low, and I worry that my results will be discounted because of these small $R^2$ values. (The tension between $R^2$ and other measures of model "significance" is mentioned briefly by Nakagawa & Schielzeth (2013) but not delved into; see below for an example from their paper. A somewhat similar situation is described on CV here, though in that case $R^2$ is low and the parameters are not significant.)

As I see it, a low marginal $R^2$ for the fixed effects combined with a high conditional $R^2$ for the overall model just tells me that a lot of things varied between the times, individuals, and places represented in my model. Similarly, if this were an experiment with blocking, a low marginal $R^2$ and a high conditional $R^2$ would tell me that blocking was a good idea and did its job.

My questions therefore are 1) are there any published critiques of Nakagawa & Schielzeth (2013) or similar approaches, and 2) what is the most principled approach for responding to the reviewer?

For responding to the reviewer, I see my options as follows (not mutually exclusive):

  1. Add the requested $R^2$ values to the paper, but then explain how they should be interpreted in light of the amount of variance in my random effects.
  2. Add to the paper the AIC from an intercept-only model as a measure of fit, and discuss (at least with the editor) that AIC indicates the model is improved by the inclusion of fixed effects despite the low $R^2$ (see the sketch after this list).
  3. Make the argument to the editor/reviewer that my current approach (p-values, CIs, effect sizes, plots of model predictions vs. raw data) is sufficient, and provide them with links to commentary by Bates and others about the limits of R2.GLMM, as well as the general issues with $R^2$ (e.g., here on CV).
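For option 2, a minimal sketch of that AIC comparison, reusing the m0 (intercept-only) and mF (full) fits defined in the worked example below:

#Compare the intercept-only and full models
AIC(m0, mF)    #lower is better; the example below gives dAIC of ~29 in favour of mF
anova(m0, mF)  #likelihood-ratio test for the added fixed effects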

Below I define $R^2$ for GLMMs more precisely and show a worked example of low marginal $R^2$ from the original Nakagawa & Schielzeth (2013).

Example: low $R^2$ in a binomial model from Nakagawa & Schielzeth (2013)

Marginal $R^2$ (R2.GLMM.M) is defined as

$$R^2_{GLMM(m)} = \frac{\sigma^2_f}{\sigma^2_f + \sum_l \sigma^2_l + \sigma^2_\varepsilon + \sigma^2_d}$$

where $\sigma^2_f$ is the variance of the fixed-effect predictions from the fitted model, $\sum_l \sigma^2_l$ is the summed variance of the random effects, $\sigma^2_\varepsilon$ is the additive residual variance, and $\sigma^2_d$ is a variance term specific to the distribution of the given GLM.

Conditional $R^2$ (R2.GLMM.C) is similarly defined but adds the summed random-effect variance $\sum_l \sigma^2_l$ to the numerator (the denominator is unchanged), giving the overall proportion of variance explained by the fixed and random effects together.
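For concreteness, here is a minimal sketch of computing both quantities by hand for a logit-link binomial glmer fit such as mF in the example below. It assumes intercept-only random effects, uses the logit-link distribution-specific variance $\pi^2/3$ from N&S, and omits the additive overdispersion term (which is absent for binary data modeled this way):

#Hand-rolled N&S R2 for a logit-link binomial GLMM (sketch)
#'mF' is the fitted glmer model from the worked example below
vf <- var(as.vector(model.matrix(mF) %*% fixef(mF)))  #variance of fixed-effect predictions
vl <- sum(sapply(VarCorr(mF), function(v) v[1, 1]))   #summed random-intercept variances
vd <- pi^2 / 3                                        #distribution-specific variance (logit link)
r2m <- vf / (vf + vl + vd)                            #marginal R2
r2c <- (vf + vl) / (vf + vl + vd)                     #conditional R2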

N&S analyze simulated binomial and Poisson data to examine the effects of Habitat and a hypothetical Treatment on the biology of beetles; they report relatively low R2.M values of 0.077 and 0.0976, respectively (results reproduced below). They then note:

"R2.GLMM.M values [variance explained by fixed effects] are relatively minor (8-10%) compared to R2.GLMM.C values …[However] it is important to note that both Treatment and Habitat [fixed] effects were statistically significant… Much of the data variability, however, resided in the random effects along with the residuals…. Note that differences between R2.GLMM.M and R2.GLMM.C values reflect how much variability is in the random effects. Importantly, comparing the different variance components including that of the fixed factors within as well as between models, we believe could help researcher gaining extra insight into there datasets (page 140)"

I agree with them that researchers could gain "extra insight into their datasets," but I am not sure that readers will appreciate the complexity behind "Much of the data variability…resided in the random effects."

library(lme4); library(rptR); library(piecewiseSEM)
data(BeetlesMale)  #simulated beetle dataset shipped with rptR

#Null (intercept-only) model
m0 <- glmer(Colour ~ 1 + (1|Population) + (1|Container),
            family = "binomial", data = BeetlesMale)
#Full model: add the two fixed effects
mF <- update(m0, . ~ . + Treatment + Habitat)

#Marginal and conditional R2.GLMM, plus AIC
#(sem.model.fits() is from older piecewiseSEM releases; later versions provide rsquared())
sem.model.fits(list(m0, mF))[, c(5:8)]

  Marginal Conditional AIC dAIC
1   0.0000       0.223 602 29.2
2   0.0777       0.311 573  0.0
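
As a cross-check, the MuMIn package also implements the Nakagawa & Schielzeth estimators (a sketch; the output format differs across MuMIn versions, with newer releases reporting several variants for binomial models):

#Cross-check with MuMIn's implementation of the N&S estimators
library(MuMIn)
r.squaredGLMM(mF)  #marginal (R2m) and conditional (R2c) R2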

Best Answer

Posting an edited version of my old comment as an answer:

I second this comment by @Kodiologist: "…I think that anything useful to the analyst as a modeling diagnostic will be useful to the reader, too, to help the reader decide if you made good modeling decisions."

Withholding potentially useful information because readers may not be statistically savvy is a bad idea. Additionally, your discussion of the meaning of these metrics does not seem like a weakness at all. It also does not really violate a reasonable interpretation of 'goodness of fit'. So I think your option #1 is by far the best choice: add the $R^2$ values to the paper and explain how to interpret them. Your explanation of the metrics here is a good one, and I think a version of this would fit well within the paper.