Solved – Kappa for Predictive Model

predictive-models

The "standard" way to compute Kappa for a predictive classification model (Witten and Frank page 163) is to construct the random confusion matrix in such a way that the number of predictions for each class is the same as the model predicted.

For a visual, see the image below (the right side is the random confusion matrix):
[image: the model's actual-vs-predicted confusion matrix, with the corresponding "random" confusion matrix on the right]

Does anyone know why this is the case, instead of truly creating a random confusion matrix in which the prior probabilities drive the number of predictions for each class? That seems like the more accurate comparison against a "null model": the number of actual and predicted cases per class would then coincide (in the image uploaded, this would mean that the columns of the random confusion matrix would sum to 100, 60, and 40, respectively).
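
To make the contrast concrete, here is a small R sketch (using the marginals visible in the image: actual class sizes 100/60/40 and predicted totals 120/60/20) of the two candidate random matrices:

# marginals as read off the image (also reproduced in the answer below)
actual    <- c(a = 100, b = 60, c = 40)   # row sums: true class frequencies
predicted <- c(a = 120, b = 60, c = 20)   # column sums: the model's predictions
N <- sum(actual)

# Witten & Frank style: the chance matrix keeps the model's prediction totals
outer(actual, predicted) / N

# alternative raised above: the class priors drive the predicted totals
outer(actual, actual) / N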

Thanks!
BMiner

Best Answer

It might be useful to consider Cohen's $\kappa$ in the context of inter-rater agreement. Suppose you have two raters individually assigning the same set of objects to the same categories. You can then ask for the overall agreement by dividing the sum of the diagonal of the confusion matrix by the total sum. But this does not take into account that the two raters will also, to some extent, agree by chance. $\kappa$ is supposed to be a chance-corrected measure, conditional on the baseline frequencies with which the raters use the categories (the marginal sums).
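
Written out, with $n_{ii}$ the diagonal cell counts, $n_{i\cdot}$ and $n_{\cdot i}$ the marginal sums, and $N$ the total number of objects, this is

$$\kappa = \frac{p_{o} - p_{e}}{1 - p_{e}}, \qquad p_{o} = \frac{\sum_i n_{ii}}{N}, \qquad p_{e} = \frac{\sum_i n_{i\cdot}\, n_{\cdot i}}{N^{2}},$$

which is exactly what the code at the end of this answer computes by hand.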

The expected frequency of each cell under the assumption of independence, given the marginal sums, is then calculated just like in the $\chi^2$ test; this is equivalent to Witten & Frank's description (see mbq's answer). For the chance agreement, we only need the diagonal cells. In R:

# generate the given data
> lvls <- factor(1:3, labels=letters[1:3])
> rtr1 <- rep(lvls, c(100, 60, 40))
> rtr2 <- rep(rep(lvls, nlevels(lvls)), c(88,10,2, 14,40,6, 18,10,12))
> cTab <- table(rtr1, rtr2)
> addmargins(cTab)
     rtr2
rtr1    a   b   c Sum
  a    88  10   2 100
  b    14  40   6  60
  c    18  10  12  40
  Sum 120  60  20 200

> library(irr)       # for kappa2()
> kappa2(cbind(rtr1, rtr2))
 Cohen's Kappa for 2 Raters (Weights: unweighted)
 Subjects = 200 
   Raters = 2 
    Kappa = 0.492 
        z = 9.46 
  p-value = 0 

# observed frequency of agreement (diagonal cells)
> fObs <- sum(diag(cTab)) / sum(cTab)

# frequency of agreement expected by chance (like chi^2)
> fExp <- sum(rowSums(cTab) * colSums(cTab)) / sum(cTab)^2
> (fObs-fExp) / (1-fExp)    # Cohen's kappa
[1] 0.4915254
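
To connect this back to the question: the full chance ("random") confusion matrix implied by the marginal sums can be built directly; a sketch using the cTab object from above:

# expected cell counts under independence, given the marginal sums
> outer(rowSums(cTab), colSums(cTab)) / sum(cTab)

Its diagonal (60 + 18 + 4 = 82) out of 200 objects gives the chance agreement fExp = 0.41 used above, and its column sums are the model's predicted totals 120, 60, 20 rather than the class priors 100, 60, 40.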

Note that $\kappa$ is not universally accepted as doing a good job; see, e.g., here, or here, or the literature cited in the Wikipedia article.
