Solved – nltk multi_kappa (Davies and Fleiss) or alpha (Krippendorff)

agreement-statistics, cohens-kappa, nltk, python

I'm using inter-rater agreement to evaluate the agreement in my rating dataset. I have a set of N examples distributed among M raters. Not every rater voted on every item, so N x M votes is an upper bound.
So let's say rater i gives the following votes to the N items, for N=5 and M=3, where position j of the array holds the vote for the j-th item:

rater[1] = [1,3,0,5,5]
rater[2] = [0,3,1,5,2]
rater[3] = [1,2,0,5,3]

where 0 means that the rater did not express any opinion about the item in position j.
Now, I cannot use Cohen's kappa, since it is defined for exactly two raters, so I'm thinking of using NLTK's Krippendorff's alpha or multi_kappa.

In my dataset

  • Votes can be sparse, i.e. there can be items with only a few votes; in the worst case an item's votes look like

    rater[i] = [0, 0, ...,j, ..., 0]
    

so item j could have just one vote, from rater i, in the whole dataset.

  • Each item has at least one vote, so no item's vote vector is all zeros.
  • The number of raters M is less than the number of items N (M < N).

Which is the best approach, given the implementations in the NLTK metrics package?
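
For reference, here is a sketch of how I would feed this dataset to NLTK's AnnotationTask (the rater/item names are just illustrative). It takes (coder, item, label) triples, so a missing vote is represented simply by omitting the triple:

from nltk.metrics.agreement import AnnotationTask

# Original vote arrays; 0 means "no vote".
raters = {1: [1, 3, 0, 5, 5],
          2: [0, 3, 1, 5, 2],
          3: [1, 2, 0, 5, 3]}

# Build (coder, item, label) triples, dropping the 0 = "no vote" entries.
triples = [(str(r), str(j), vote)
           for r, votes in raters.items()
           for j, vote in enumerate(votes, start=1)
           if vote != 0]

task = AnnotationTask(data=triples)
print(task.alpha())  # Krippendorff's alpha tolerates the missing votes

As far as I can tell, alpha() simply skips items with fewer than two votes, while the kappa-style metrics (including multi_kappa()) compare coder pairs on every item and so expect complete data. Also note that the default distance is binary_distance (nominal); for graded 1-5 votes, passing distance=nltk.metrics.distance.interval_distance might be more appropriate.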

Best Answer

I found a solution that was useful in my case, so you may want to check it out. Here is a possible approach for your dataset.

import numpy as np
import krippendorff

# Mark missing votes as np.nan: the krippendorff package treats nan as
# "no rating", whereas a 0 would be counted as a real rating value.
data = [[1, 3, np.nan, 5, 5],
        [np.nan, 3, 1, 5, 2],
        [1, 2, np.nan, 5, 3]]

alpha = krippendorff.alpha(data)
print(alpha)

It works with Python 3.4+. Don't forget to install the dependencies:

pip install numpy krippendorff
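
One more note: krippendorff.alpha also takes a level_of_measurement argument that controls how disagreements between ratings are weighted. Since the votes here are ordered 1-5 ratings, it may be worth comparing levels; a minimal sketch, again using np.nan for the missing votes:

import numpy as np
import krippendorff

data = [[1, 3, np.nan, 5, 5],
        [np.nan, 3, 1, 5, 2],
        [1, 2, np.nan, 5, 3]]

# Alpha changes with the assumed scale of the ratings.
for level in ("nominal", "ordinal", "interval"):
    print(level, krippendorff.alpha(reliability_data=data,
                                    level_of_measurement=level))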