Solved – Inter-rater reliability using Intra-class correlation with ratings for multiple objects on multiple properties

Tags: agreement-statistics, intraclass-correlation, reliability, spss

Initial Question:

I am trying to calculate inter-rater reliability. Previous researchers in this area have used intraclass correlation. SPSS offers one-way random, two-way random and two-way mixed models. The SPSS help says to choose among them based on whether the 'people effects are random' and the 'item effects are random'. Can anybody please explain to me what these terms mean?

Also, I wanted to know how to structure my data file. I think I need to put each rater's ratings of each item in a separate column (rather than each rater on a separate row); is this right?

UPDATED question:

Thanks to the help I have received, I have visited the websites, which were useful. I have decided that I need to use the mixed model, and I have transposed my files so that each rater is a column.

Additivity:

The websites also mentioned some assumptions, one of which is additivity, tested with Tukey's test of non-additivity. Apparently, this tests the hypothesis that there is no multiplicative interaction between the cases and the items. I think I may have too many variables in my calculation, because the examples others use have far fewer, e.g. each rater making only one rating per case; as described below, my raters have made many more ratings than that.

  • What does the additivity assumption mean?
  • What should be done when the test of non-additivity is statistically significant?
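On the second point: Tukey's test partitions the case-by-item interaction sum of squares into a one-degree-of-freedom multiplicative component and a remainder; a significant result usually suggests trying a transformation of the ratings (e.g. a log or power transform) before fitting the additive model. As a minimal sketch, assuming a complete cases-by-raters table with no missing values (the function name is our own, not an SPSS one):

```python
# Tukey's one-degree-of-freedom test of non-additivity for a complete
# two-way table y (rows = cases, columns = raters/items), pure Python.
# F = SS_nonadd / (SS_balance / df2), with df2 = (r-1)(c-1) - 1.

def tukey_nonadditivity(y):
    r, c = len(y), len(y[0])
    m = sum(sum(row) for row in y) / (r * c)                     # grand mean
    row_m = [sum(row) / c for row in y]                          # case means
    col_m = [sum(y[i][j] for i in range(r)) / r for j in range(c)]  # item means

    # 1-df multiplicative (non-additivity) sum of squares
    num = sum(y[i][j] * (row_m[i] - m) * (col_m[j] - m)
              for i in range(r) for j in range(c))
    den = sum((a - m) ** 2 for a in row_m) * sum((b - m) ** 2 for b in col_m)
    ss_nonadd = num ** 2 / den

    # interaction (residual) sum of squares, then the "balance" remainder
    ss_resid = sum((y[i][j] - row_m[i] - col_m[j] + m) ** 2
                   for i in range(r) for j in range(c))
    ss_balance = ss_resid - ss_nonadd
    df2 = (r - 1) * (c - 1) - 1
    f = float("inf") if ss_balance == 0 else ss_nonadd / (ss_balance / df2)
    return f, ss_nonadd, ss_balance, df2

# Purely multiplicative toy table: all interaction is the 1-df component.
f, ssn, ssb, df2 = tukey_nonadditivity([[1, 2, 3], [2, 4, 6], [3, 6, 9]])
print(ssn, ssb)  # 4.0 0.0
```

The F statistic would then be referred to an F(1, df2) distribution; SPSS reports the corresponding p-value directly in its reliability output.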

Inter-rater reliability with multiple objects and properties

I will explain in more detail what I have done. I have five raters who have rated facial and vocal expressions of participants watching different emotional films. Raters make 18 ratings per film, and there are 16 films, so there is a total of 288 variables per rater, per participant rated. Each rater has rated the same 4 participants, and I have placed these participants in separate files. I thought about making separate files for each of the 16 films, but that would be a lot of files.

  • How should I calculate ICC and organise my files in this specific situation?
  • Should I calculate a mean of individual ICCs or perhaps select a random sample?

Best Answer

These are distinct ways of apportioning rater and item variance within the overall variance, following Shrout and Fleiss (1979) (cases 1 to 3 in their Table 1):

  • One-way random effects model: raters are considered as sampled from a larger pool of potential raters, hence they are treated as random effects, and each subject/item may be rated by a different set of raters; rater variance cannot be separated from residual variance, and the ICC is interpreted as the % of total variance accounted for by subject/item variance (ICC(1,1) in Shrout and Fleiss's notation).
  • Two-way random effects model: both factors, raters and items/subjects, are viewed as random effects, and we have two variance components (or mean squares) in addition to the residual variance; we further assume that raters assess all items/subjects; the ICC in this case is the % of total variance (subjects + raters + residual) attributable to subject/item variance, so systematic rater differences lower the coefficient.
  • Two-way mixed model: contrary to the one-way approach, here raters are considered as fixed effects (no generalization beyond the sample at hand) but items/subjects are treated as random effects; the unit of analysis may be the single rating or the average of the ratings.
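To make the three cases concrete, here is a sketch of the single-rating ICCs computed from the two-way ANOVA mean squares of a complete n-subjects x k-raters table, following Shrout and Fleiss (1979); the function name and the toy data are illustrative, not from the original answer:

```python
# Shrout & Fleiss (1979) single-rating ICCs from ANOVA mean squares.
# y is a complete table: n rows (subjects/items) x k columns (raters).

def icc_shrout_fleiss(y):
    n, k = len(y), len(y[0])
    m = sum(sum(row) for row in y) / (n * k)
    row_m = [sum(row) / k for row in y]
    col_m = [sum(y[i][j] for i in range(n)) / n for j in range(k)]

    ss_total = sum((y[i][j] - m) ** 2 for i in range(n) for j in range(k))
    ss_rows = k * sum((a - m) ** 2 for a in row_m)   # between subjects
    ss_cols = n * sum((b - m) ** 2 for b in col_m)   # between raters
    ss_err = ss_total - ss_rows - ss_cols            # residual
    ss_within = ss_total - ss_rows                   # one-way within-subject

    bms = ss_rows / (n - 1)                          # between-subject MS
    jms = ss_cols / (k - 1)                          # between-rater MS
    ems = ss_err / ((n - 1) * (k - 1))               # residual MS
    wms = ss_within / (n * (k - 1))                  # within-subject MS

    icc1 = (bms - wms) / (bms + (k - 1) * wms)                        # ICC(1,1)
    icc2 = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)  # ICC(2,1)
    icc3 = (bms - ems) / (bms + (k - 1) * ems)                        # ICC(3,1)
    return icc1, icc2, icc3

# Rater 2 is rater 1 shifted by a constant: consistency is perfect
# (ICC(3,1) = 1), but the two-way random ICC penalises the rater offset.
print(icc_shrout_fleiss([[1, 2], [2, 3], [3, 4]]))
```

Note how the rater mean square (JMS) enters only the two-way random formula: that is exactly the "raters as random effects" choice the SPSS dialog is asking about.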

I would say raters have to be entered as columns, although I'm not a specialist in SPSS. Dave Garson's dedicated website is worth looking at for those working with SPSS. There is also a complete on-line tutorial on reliability analysis by Robert A. Yaffee [archived version].
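A minimal sketch of that layout, assuming the raw data start in long format with one row per (rater, subject) pair; all names and values here are hypothetical:

```python
# Reshape "one row per rater" data into the subjects-in-rows,
# raters-in-columns table that reliability procedures expect.
from collections import defaultdict

# long format: (rater, subject, rating)
long_rows = [
    ("r1", "s1", 4), ("r1", "s2", 2),
    ("r2", "s1", 5), ("r2", "s2", 1),
    ("r3", "s1", 4), ("r3", "s2", 3),
]

by_subject = defaultdict(dict)
for rater, subject, rating in long_rows:
    by_subject[subject][rater] = rating

raters = sorted({r for r, _, _ in long_rows})
# one row per subject, one column per rater
wide = [[by_subject[s][r] for r in raters] for s in sorted(by_subject)]
print(raters)  # ['r1', 'r2', 'r3']
print(wide)    # [[4, 5, 4], [2, 1, 3]]
```

In SPSS itself the same effect is achieved with Data > Transpose (or the VARSTOCASES/CASESTOVARS restructuring commands), which is what "transposing the file so that each rater is a column" amounts to.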

For theoretical consideration about the mixed-effect approach, please consider reading my answer to this related question: Reliability in Elicitation Exercise.
