Solved – Fleiss kappa vs Cohen kappa

agreement-statistics, cohens-kappa, metric

Can somebody explain the differences between Fleiss' kappa and Cohen's kappa in detail? And how does each metric work under the hood?

  • When would one use Fleiss kappa over Cohen kappa?
  • What are the advantages/disadvantages of using Fleiss kappa over Cohen kappa?

Best Answer

Fleiss' $\kappa$ works for any number of raters, Cohen's $\kappa$ only works for two raters; in addition, Fleiss' $\kappa$ allows for each rater to be rating different items, while Cohen's $\kappa$ assumes that both raters are rating identical items.
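For illustration, here is a minimal sketch (not part of the original answer) of how one might compute the two statistics in Python, assuming scikit-learn (cohen_kappa_score) and statsmodels (aggregate_raters, fleiss_kappa) are available; the ratings are made up:

```python
# Sketch with made-up data: Cohen's kappa for two raters on the same items,
# Fleiss' kappa for several raters.
import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Two raters scoring the same 8 items into categories 0/1/2 -> Cohen's kappa
rater1 = [0, 1, 2, 2, 0, 1, 1, 2]
rater2 = [0, 1, 2, 1, 0, 1, 0, 2]
print("Cohen's kappa:", cohen_kappa_score(rater1, rater2))

# Four raters scoring 8 items -> Fleiss' kappa
# ratings: one row per item, one column per rater
ratings = np.array([
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [2, 2, 1, 2],
    [2, 1, 2, 2],
    [0, 0, 1, 0],
    [1, 1, 1, 2],
    [1, 0, 1, 1],
    [2, 2, 2, 2],
])
counts, _ = aggregate_raters(ratings)   # items x categories count table
print("Fleiss' kappa:", fleiss_kappa(counts, method='fleiss'))
```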

However, Fleiss' $\kappa$ can lead to paradoxical results (see e.g. Gwet, Handbook of Interrater Reliability); namely, even with nominal categories, reordering the categories can change the result. But Cohen's version has its own problems and can lead to odd results when there are large differences in the prevalence of the possible outcomes (see e.g. Feinstein and Cicchetti, "High Agreement but Low Kappa").
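To make the Feinstein and Cicchetti paradox concrete, here is a small made-up example (a sketch, not taken from their paper): two raters who agree on 85 of 100 items can still get a Cohen's $\kappa$ near zero, because the heavily skewed marginals push the chance-agreement term up to 0.86.

```python
# Sketch of the "high agreement but low kappa" paradox with made-up data.
from sklearn.metrics import cohen_kappa_score

# 85 items both call "A"; 5 items rater1 says "A", rater2 "B"; 10 items the reverse
rater1 = ["A"] * 85 + ["A"] * 5 + ["B"] * 10
rater2 = ["A"] * 85 + ["B"] * 5 + ["A"] * 10

observed_agreement = sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)
print(observed_agreement)                 # 0.85
print(cohen_kappa_score(rater1, rater2))  # about -0.07: chance agreement is 0.86
```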

Gwet's AC1 statistic appears to be immune to these problems. For $n$ items, $R$ raters, and $K$ categories it is given by

$\gamma_1 = \frac{P_a-P_{e|\gamma_1}}{1-P_{e|\gamma_1}},$

where $P_a$ is the observed proportion of agreement,

$P_{e|\gamma_1} = \frac{1}{K-1}\sum_{k=1}^{K}\hat{\pi}_k(1-\hat{\pi}_k),$

and

$\hat{\pi}_k = \frac{1}{n}\sum_{i=1}^{n}\frac{R_{ik}}{R},$

with $R_{ik}$ the number of raters who assign item $i$ to category $k$.
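As a sketch (with made-up counts), these formulas translate directly into Python for the complete-data case, where every item is rated by the same $R$ raters and $P_a$ is the usual average pairwise agreement across items:

```python
# Sketch: Gwet's AC1 computed directly from the formulas above,
# assuming every item is rated by the same number of raters R.
import numpy as np

def gwet_ac1(counts):
    """counts: n x K array, counts[i, k] = raters putting item i in category k."""
    counts = np.asarray(counts, dtype=float)
    n, K = counts.shape
    R = counts[0].sum()                      # raters per item (assumed constant)

    # Observed agreement P_a: average pairwise agreement per item
    p_a = ((counts * (counts - 1)).sum(axis=1) / (R * (R - 1))).mean()

    # pi_k: average proportion of ratings falling in category k
    pi = (counts / R).mean(axis=0)

    # Chance agreement P_{e|gamma_1}
    p_e = (pi * (1 - pi)).sum() / (K - 1)

    return (p_a - p_e) / (1 - p_e)

# Example: 10 items, 4 raters, 3 categories (each row sums to 4)
counts = np.array([
    [4, 0, 0], [0, 4, 0], [0, 0, 4], [3, 1, 0], [2, 2, 0],
    [4, 0, 0], [0, 3, 1], [0, 0, 4], [4, 0, 0], [1, 3, 0],
])
print(gwet_ac1(counts))
```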
