Solved – Computing inter-rater reliability in R with variable number of ratings

Tags: agreement-statistics, r, random-effects-model, reliability

Wikipedia suggests that one way to look at inter-rater reliability is to use a random effects model to compute the intraclass correlation. The intraclass correlation article's example looks at

$$\frac{\sigma_\alpha^2}{\sigma_\alpha^2+\sigma_\epsilon^2}$$

from a model

$$Y_{ij} = \mu + \alpha_i + \epsilon_{ij}$$

"where Yij is the jth observation in the ith group, μ is an unobserved overall mean, αi is an unobserved random effect shared by all values in group i, and εij is an unobserved noise term."

This model is especially attractive because in my data no rater has rated all things (although most have rated 20+), and things are rated a variable number of times (usually 3-4).

Question #0: Is "group i" in that example a grouping of the things being rated?

Question #1: If I'm looking for inter-rater reliability, don't I need a random effects model with two terms, one for the rater and one for the thing rated? After all, both are potential sources of variation.

Question #2: How would I best express this model in R?

It looks as if this question offers a promising proposal:

lmer(measurement ~ 1 + (1 | subject) + (1 | site), mydata)

I looked at a couple of questions, and the syntax of the "random" parameter for lme is opaque to me. I read the help page for lme, but its description of "random" is incomprehensible to me without examples.
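To make the syntax concrete for myself, here is a small simulated stand-in for my data (the column names measurement, subject, and site come from the proposal above and are only placeholders):

library(lme4)

# Each row is one rating; raters and things are crossed but incomplete,
# mimicking my unbalanced data.
set.seed(1)
mydata <- data.frame(subject = factor(sample(30, 120, replace = TRUE)),  # thing rated
                     site    = factor(sample(12, 120, replace = TRUE)))  # rater
mydata$measurement <- rnorm(30)[mydata$subject] +
  rnorm(12, sd = 0.5)[mydata$site] + rnorm(120, sd = 0.7)

# "(1 | subject)" adds a random intercept per thing rated and
# "(1 | site)" adds a random intercept per rater: crossed random
# effects, i.e. the two-term model from Question #1.
fit <- lmer(measurement ~ 1 + (1 | subject) + (1 | site), mydata)
summary(fit)  # variance components are listed under "Random effects"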

This question is somewhat similar to a long list of questions, with this one the closest. However, most of them don't address R in detail.

Best Answer

The model you referenced in your question is called the "one-way model." It assumes that random row effects are the only systematic source of variance. In the case of inter-rater reliability, rows correspond to objects of measurement (e.g., subjects).

One-way model: $$ x_{ij} = \mu + r_i + w_{ij} $$ where $\mu$ is the mean for all objects, $r_i$ is the row effect, and $w_{ij}$ is the residual effect.
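As a rough sketch, the one-way ICC can be estimated by fitting this model with lme4 and plugging the variance components into $\sigma_r^2/(\sigma_r^2+\sigma_w^2)$; the data and column names below are purely illustrative:

library(lme4)

# Illustrative one-way data: repeated ratings nested within objects.
set.seed(3)
d <- data.frame(object = factor(sample(25, 80, replace = TRUE)))
d$score <- rnorm(25)[d$object] + rnorm(80, sd = 0.8)

fit <- lmer(score ~ 1 + (1 | object), data = d)
vc  <- as.data.frame(VarCorr(fit))           # variance components
vc$vcov[vc$grp == "object"] / sum(vc$vcov)   # sigma_r^2 / (sigma_r^2 + sigma_w^2)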

However, there are also "two-way models." These assume that there is variance associated with random row effects as well as random or fixed column effects. In the case of inter-rater reliability, columns correspond to sources of measurement (e.g., raters).

Two-way models: $$ x_{ij} = \mu + r_i + c_j + rc_{ij} + e_{ij} $$ $$ x_{ij} = \mu + r_i + c_j + e_{ij} $$ where $\mu$ is the mean for all objects, $r_i$ is the row effect, $c_j$ is the column effect, $rc_{ij}$ is the interaction effect, and $e_{ij}$ is the residual effect. The difference between the two models is the inclusion or exclusion of the interaction effect.

Given a two-way model, you can calculate one of four ICC coefficients: the single score consistency ICC(C,1), the average score consistency ICC(C,k), the single score agreement ICC(A,1), or the average score agreement ICC(A,k). Single score ICCs apply to single measurements $x_{ij}$ (e.g., individual raters), whereas average score ICCs apply to average measurements $\bar{x}_i$ (e.g., the mean of all raters). Consistency ICCs exclude the column variance from the denominator variance (e.g., allowing raters to vary around their own means), whereas agreement ICCs include the column variance in the denominator variance (e.g., requiring raters to vary around the same mean).

Here are the definitions if you assume a random column effect:

Two-way Random-Effects ICC Definitions (with or without interaction effect): $$ ICC(C,1) = \frac{\sigma_r^2}{\sigma_r^2 + (\sigma_{rc}^2 + \sigma_e^2)}\text{ or }\frac{\sigma_r^2}{\sigma_r^2 + \sigma_e^2} $$ $$ ICC(C,k) = \frac{\sigma_r^2}{\sigma_r^2 + (\sigma_{rc}^2 + \sigma_e^2)/k}\text{ or }\frac{\sigma_r^2}{\sigma_r^2 + \sigma_e^2/k} $$ $$ ICC(A,1) = \frac{\sigma_r^2}{\sigma_r^2 + (\sigma_c^2 + \sigma_{rc}^2 + \sigma_e^2)}\text{ or }\frac{\sigma_r^2}{\sigma_r^2 + (\sigma_c^2 + \sigma_e^2)} $$ $$ ICC(A,k) = \frac{\sigma_r^2}{\sigma_r^2 + (\sigma_c^2 + \sigma_{rc}^2 + \sigma_e^2)/k}\text{ or }\frac{\sigma_r^2}{\sigma_r^2 + (\sigma_c^2 + \sigma_e^2)/k} $$
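With unbalanced data like that described in the question, one option is to estimate these variance components directly from a mixed model and plug them into the definitions above. Here is a minimal sketch with lme4, assuming a long-format data frame with one row per rating and using the no-interaction forms (with one rating per object-rater cell, the interaction is not separable from error); all names and the simulated data are illustrative:

library(lme4)

# Illustrative unbalanced ratings in long format (placeholder names).
set.seed(42)
d <- data.frame(object = factor(sample(25, 90, replace = TRUE)),
                rater  = factor(sample(8,  90, replace = TRUE)))
d$score <- rnorm(25)[d$object] + rnorm(8, sd = 0.4)[d$rater] + rnorm(90, sd = 0.6)

# Two-way random-effects model without interaction:
# x_ij = mu + r_i (object) + c_j (rater) + e_ij
fit <- lmer(score ~ 1 + (1 | object) + (1 | rater), data = d)

vc    <- as.data.frame(VarCorr(fit))
var_r <- vc$vcov[vc$grp == "object"]    # sigma_r^2
var_c <- vc$vcov[vc$grp == "rater"]     # sigma_c^2
var_e <- vc$vcov[vc$grp == "Residual"]  # sigma_e^2

var_r / (var_r + var_e)          # ICC(C,1)
var_r / (var_r + var_c + var_e)  # ICC(A,1)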

You can also estimate these values using mean squares from ANOVA:

Two-way ICC Estimations: $$ ICC(C,1) = \frac{MS_R - MS_E}{MS_R + (k-1)MS_E} $$ $$ ICC(C,k) = \frac{MS_R-MS_E}{MS_R} $$ $$ ICC(A,1) = \frac{MS_R-MS_E}{MS_R + (k-1)MS_E + \frac{k}{n}(MS_C-MS_E)} $$ $$ ICC(A,k) = \frac{MS_R-MS_E}{MS_R + (MS_C-MS_E)/n} $$ where $MS_R$, $MS_C$, and $MS_E$ are the mean squares for rows, columns, and error from a two-way ANOVA, $n$ is the number of objects, and $k$ is the number of raters.
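As a sketch of the mean-squares route (which, unlike the mixed-model approach, assumes a complete design in which every rater rates every object; all names and data below are illustrative):

set.seed(2)
n <- 20; k <- 4  # n objects (rows), k raters (columns)
d <- expand.grid(object = factor(1:n), rater = factor(1:k))
d$score <- rnorm(n)[d$object] + rnorm(k, sd = 0.3)[d$rater] + rnorm(n * k, sd = 0.5)

# Mean squares for rows (objects), columns (raters), and error:
ms  <- summary(aov(score ~ object + rater, data = d))[[1]][, "Mean Sq"]
MSR <- ms[1]; MSC <- ms[2]; MSE <- ms[3]

(MSR - MSE) / (MSR + (k - 1) * MSE)                          # ICC(C,1)
(MSR - MSE) / (MSR + (k - 1) * MSE + (k / n) * (MSC - MSE))  # ICC(A,1)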

You can calculate these coefficients in R with the icc function in the irr package, whose signature is:

icc(ratings, model = c("oneway", "twoway"),
    type = c("consistency", "agreement"),
    unit = c("single", "average"), r0 = 0, conf.level = 0.95)

References

McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1), 30–46.

Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420–428.