Solved – How can these ratings have a negative Fleiss' kappa

cohens-kappa

I am trying to calculate Fleiss' kappa for the following data, and this is the result I got:

 [[0 0 3]
 [0 1 2]
 [0 0 3]
 [0 0 3]
 [0 0 3]
 [0 0 3]
 [0 0 3]
 [0 0 3]
 [0 0 3]
 [0 1 2]
 [0 0 3]
 [0 0 3]
 [0 0 3]
 [0 0 3]
 [0 0 3]
 [0 0 3]
 [0 0 3]
 [0 1 2]
 [0 0 3]
 [0 0 3]
 [0 0 3]
 [0 0 3]
 [0 0 3]
 [0 0 3]
 [0 0 3]
 [0 0 3]
 [0 0 3]]
3 raters.
27 subjects.
3 categories.
p = [0.0, 0.037037037037037035, 0.9629629629629629]
P = [1.0, 0.3333333333333333, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.3333333333333333, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.3333333333333333, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
Pbar = 0.925925925926
PbarE = 0.928669410151
kappa = -0.0384615384615
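
The numbers above match the standard textbook computation, which looks roughly like this minimal NumPy sketch (the helper name fleiss_kappa here is just illustrative). Run on the table above, it reproduces the p, P, Pbar, PbarE and kappa values printed there.

    import numpy as np

    def fleiss_kappa(counts):
        """Fleiss' kappa for an (N subjects) x (k categories) count matrix.

        counts[i, j] = number of raters who put subject i into category j;
        every row must sum to the same number of raters n.
        """
        counts = np.asarray(counts, dtype=float)
        N, _ = counts.shape
        n = counts[0].sum()              # raters per subject (3 here)

        # p[j]: share of all N*n ratings that fall in category j
        p = counts.sum(axis=0) / (N * n)
        # P[i]: pairwise agreement among raters on subject i
        P = ((counts ** 2).sum(axis=1) - n) / (n * (n - 1))

        Pbar = P.mean()                  # observed agreement
        PbarE = (p ** 2).sum()           # chance-expected agreement
        return (Pbar - PbarE) / (1 - PbarE)

    table = np.array([[0, 0, 3]] * 27)
    table[[1, 9, 17]] = [0, 1, 2]        # the three split subjects above
    print(fleiss_kappa(table))           # -> -0.0384615...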

However, the three raters obviously agree to a very high degree: on 24 of the 27 subjects all three raters chose category 3, and on the remaining three subjects two of the three did.

I am really confused… Can someone please explain this to me?

Best Answer

Fleiss' kappa is reading your data correctly: each row is one subject and each column is a category, so the entries count how many of the three raters put that subject into that category. [0 0 3] means all three raters agreed on category 3, and [0 1 2] means one rater chose category 2 while the other two chose category 3. The negative value comes from the definition kappa = (Pbar - PbarE) / (1 - PbarE). Your observed agreement is high (Pbar = 0.926), but because almost every rating falls into category 3 (p = 0.963 for that category), the agreement expected by chance alone is even higher (PbarE = 0.929). Whenever chance-expected agreement exceeds observed agreement, kappa comes out negative; this is the well-known prevalence paradox of kappa, and with ratings this one-sided kappa is simply not an informative summary of agreement. If you do want to double-check how your data should be laid out for a given tool, you may find these helpful: http://dfreelon.org/utils/recalfront/recal3/ or http://www.real-statistics.com/reliability/fleiss-kappa/
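
To see that it is the skewed prevalence, not the layout, that drives the sign, here is a small sketch using statsmodels' fleiss_kappa (the "balanced" table is a constructed counterexample, not your data): keeping exactly the same per-subject agreement but spreading the unanimous rows over two categories flips kappa from slightly negative to strongly positive.

    import numpy as np
    from statsmodels.stats.inter_rater import fleiss_kappa

    # Your table: 24 unanimous rows in category 3, three rows split 1-vs-2.
    skewed = np.array([[0, 0, 3]] * 24 + [[0, 1, 2]] * 3)
    print(fleiss_kappa(skewed))    # ~ -0.038: PbarE ~ 0.929 exceeds Pbar ~ 0.926

    # Hypothetical table with identical row-level agreement, but the
    # unanimous rows split evenly between categories 1 and 3.
    balanced = np.array([[3, 0, 0]] * 12 + [[0, 0, 3]] * 12 + [[0, 1, 2]] * 3)
    print(fleiss_kappa(balanced))  # ~ 0.86: same Pbar, much lower PbarE

In both tables every subject has the same per-subject agreement values P, so Pbar is identical; only the marginal prevalences p differ, and that alone moves kappa from -0.04 to 0.86.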