Solved – Evaluation of binary approach to one vs all multi-class classification

Tags: classification, data mining, machine learning, multi-class

I'm working on a multi-class problem which I have redefined as a series of binary problems (i.e. a one vs all classification problem). However, each observation can belong to more than one class. For example, if my observations were different kinds of fruit, my classes might represent different characteristics such as red and round. In some cases a fruit is both red and round.

My question is: what should I consider when evaluating my binary models? Can one simply use metrics such as accuracy to understand the performance of the model? If I have three different classes (i.e. red, round and sweet), is it acceptable to merely take the mean accuracy of the three binary classification tasks as the accuracy of my model as a whole?

This is a little different from most multi-class classification problems I've seen, where the classes are mutually exclusive (each observation belongs to exactly one class).

Best Answer

Since each sample might belong to several classes, I would break the problem into binary problems not in a one vs. all way, but by looking for membership in each class (e.g., IsRed? IsRound?).
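A minimal sketch in R of that decomposition (often called binary relevance), assuming the labels of each sample are stored as a character vector inside a list. The fruit data and the `membership` object are hypothetical, for illustration only.

```r
# Hypothetical toy data: each fruit can carry several labels at once.
fruit_labels <- list(
  apple  = c("red", "round", "sweet"),
  banana = c("sweet"),
  tomato = c("red", "round"),
  lime   = c("round")
)
classes <- c("red", "round", "sweet")

# One binary membership indicator per class (IsRed?, IsRound?, IsSweet?):
# rows are samples, columns are classes, TRUE means "belongs to this class".
membership <- t(sapply(fruit_labels, function(labs) classes %in% labs))
colnames(membership) <- classes
print(membership)

# Each column is now the target of its own, separate binary classifier.
```

Unlike one vs all, nothing here forces a sample to have exactly one TRUE per row, which is what allows a fruit to be both red and round.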

As for the evaluation, you should use measures that fit your needs. I'm not familiar with your needs, but it is quite rare that the mean of accuracies will fit them.

You can either evaluate each classifier separately (accuracy of IsRed, accuracy of IsRound) or evaluate the accuracy of the full prediction (a sample counts as correct only if all relevant classifiers classified it correctly).
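To make the difference between these two options concrete, here is a small R sketch on hypothetical true and predicted label matrices (the values are invented for illustration). It computes per-classifier accuracy, the mean of those accuracies, and the full-prediction accuracy, which is sometimes called exact-match or subset accuracy.

```r
# Hypothetical true and predicted labels (rows = samples, columns = classes).
truth <- matrix(c(TRUE,  TRUE,  FALSE,
                  FALSE, TRUE,  TRUE,
                  TRUE,  FALSE, FALSE,
                  FALSE, FALSE, TRUE),
                ncol = 3, byrow = TRUE,
                dimnames = list(NULL, c("IsRed", "IsRound", "IsSweet")))
pred  <- matrix(c(TRUE,  TRUE,  FALSE,
                  FALSE, FALSE, TRUE,
                  TRUE,  FALSE, TRUE,
                  FALSE, FALSE, TRUE),
                ncol = 3, byrow = TRUE,
                dimnames = dimnames(truth))

# Accuracy of each binary classifier evaluated separately.
per_class_acc <- colMeans(truth == pred)

# Mean of the per-class accuracies (the quantity asked about in the question).
mean_acc <- mean(per_class_acc)

# Full-prediction (exact-match) accuracy: a sample counts only if
# every classifier got it right.
exact_match_acc <- mean(rowSums(truth != pred) == 0)
```

With these toy values the per-class accuracies are 1, 0.75 and 0.75 and their mean is about 0.83, yet only half of the samples are predicted completely correctly (exact-match accuracy 0.5). This is one reason the mean of accuracies can look more optimistic than what a user of the full prediction experiences.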

Related Solutions

Solved – Multi-class Confusion Matrix to Binary confusion matrix

Welcome to the site; this is a variation of a commonly asked question. You can definitely convert a multi-class confusion matrix to binary confusion matrices.

Below is some R code showing how you can collapse a multi-class confusion matrix into one binary (one-vs-rest) matrix per class. It also calculates Cohen's kappa to get the overall 'rater' agreement between the classifier and the actual class for the matrix cmg.

cmg <- matrix(c(1639, 116, 49, 35, 138, 0, 0, 236,
                 150, 274, 27, 21,  28, 0, 0,  73,
                  22,  24, 58,  9,  94, 0, 0,  30,
                  33,  27, 31, 21, 146, 0, 0,  49,
                  14,   9,  5,  1,  49, 0, 0,  22,
                   1,   0,  1,  1,   7, 0, 0,   6,
                  11,   0,  0,  1,  14, 0, 0,  21,
                 201,  11,  8,  5,  49, 0, 0, 253), 
              ncol=8, dimnames = rep(list(c("T1","T2","T3","T4","T5","T6","T7","T8")), 2))  # matrix() fills column-wise, so each line of values above becomes one column

library(psych)  # provides cohen.kappa()

# Overall agreement
overall_agg <- sum(diag(cmg))/sum(cmg)

# Overall Cohen's Kappa for cmg
unweighted_kappa <- cohen.kappa( cmg, n.obs=sum(cmg) )

# initialise containers
spec_agr_guideline <- list()        
collapsed_mat_guideline <- list() 
unweighted_kappa_psych <- list()

# loop through all treatments    
for (i in seq_len(nrow(cmg))) {
  # Specific (positive) agreement for class i: 2*TP / (row total + column total)
  spec_agr_guideline[[i]] <- 2*cmg[i,i] / (sum(cmg[i,]) + sum(cmg[,i]))
  # Collapsed one-vs-rest (binary) confusion matrix for class i
  collapsed_mat_guideline[[i]] <- matrix(c(cmg[i,i],               sum(cmg[i,])-cmg[i,i],
                                           sum(cmg[,i])-cmg[i,i],  sum(cmg)-sum(cmg[i,])-sum(cmg[,i])+cmg[i,i]),
                                         ncol=2)
  # Unweighted Cohen's kappa for each collapsed (binary) confusion matrix
  unweighted_kappa_psych[[i]] <- cohen.kappa(collapsed_mat_guideline[[i]], n.obs=sum(collapsed_mat_guideline[[i]]))
}
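If you would rather see what the collapse and the kappa computation do without relying on the psych package, here is a base-R sketch. It uses a small hypothetical 3-class matrix (rows assumed to be actual classes, columns predicted), builds the one-vs-rest 2x2 table for each class, and computes unweighted kappa from the standard formula (observed agreement minus chance agreement, divided by one minus chance agreement). The helper names are hypothetical; the results should agree with the unweighted kappa from `cohen.kappa`, but that is worth checking on your own data.

```r
# Hypothetical 3-class confusion matrix (rows = actual, columns = predicted).
cm <- matrix(c(50,  5,  5,
                4, 30,  6,
                6,  4, 40),
             ncol = 3, byrow = TRUE,
             dimnames = list(actual = c("A", "B", "C"),
                             predicted = c("A", "B", "C")))

# Collapse to a one-vs-rest 2x2 table for class i.
collapse_one_vs_rest <- function(cm, i) {
  tp <- cm[i, i]
  fn <- sum(cm[i, ]) - tp   # actually i, predicted something else
  fp <- sum(cm[, i]) - tp   # predicted i, actually something else
  tn <- sum(cm) - tp - fn - fp
  matrix(c(tp, fn, fp, tn), ncol = 2, byrow = TRUE,
         dimnames = list(actual = c("pos", "neg"), predicted = c("pos", "neg")))
}

# Unweighted Cohen's kappa computed directly from a square table.
kappa_from_table <- function(tab) {
  n  <- sum(tab)
  po <- sum(diag(tab)) / n                       # observed agreement
  pe <- sum(rowSums(tab) * colSums(tab)) / n^2   # agreement expected by chance
  (po - pe) / (1 - pe)
}

# Specific (positive) agreement: 2*TP / (2*TP + FP + FN).
specific_agreement <- function(cm, i) 2 * cm[i, i] / (sum(cm[i, ]) + sum(cm[, i]))

# One column per class, with rows spec_agr and kappa.
per_class <- sapply(seq_len(nrow(cm)), function(i) {
  bin <- collapse_one_vs_rest(cm, i)
  c(spec_agr = specific_agreement(cm, i), kappa = kappa_from_table(bin))
})
colnames(per_class) <- rownames(cm)
print(round(per_class, 3))
```

For class A in this toy matrix, TP = 50, FN = FP = 10 and TN = 80, which gives a specific agreement of 100/120 (about 0.833) and a kappa of 13/18 (about 0.722). Kappa is unchanged if the 2x2 table is transposed, so it does not matter whether rows or columns hold the actual classes.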

There are also other useful ways to assess the performance of a multi-class classifier. Some relevant answers on Cross Validated are: link1, link2, link3.

Solved – Multi-class classification with growing number of classes – question

This is not the only way, and it may not work for all problems, but one solution would be to compare the performance of a range of class numbers: the current number and one more, the current number and one either side, or two either side. How many alternatives you explore at each update depends on how much computational effort you can spare. You can then use an information criterion such as the corrected Akaike Information Criterion (AICc) to assess goodness-of-fit for each alternative, penalized for its number of parameters. The model with the lowest AICc is the 'best' fit, although trivial differences (delta-AICc smaller than about 5-10) are not sufficient to conclude that either model is substantially better. You could go one step further and calculate the relative likelihood of each alternative using Akaike weights.
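A minimal R sketch of that comparison, assuming each candidate model (one per number of classes) gives you a maximized log-likelihood and a parameter count, for example from `logLik()` on a fitted model. The log-likelihoods, parameter counts and sample size below are made up for illustration. It uses AICc = AIC + 2k(k+1)/(n - k - 1) and Akaike weights w_i = exp(-delta_i/2) / sum_j exp(-delta_j/2).

```r
# AICc for a model with maximized log-likelihood ll, k parameters, n observations.
aicc <- function(ll, k, n) {
  aic <- -2 * ll + 2 * k
  aic + (2 * k * (k + 1)) / (n - k - 1)
}

# Akaike weights: relative likelihoods exp(-delta/2), normalized to sum to 1.
akaike_weights <- function(aicc_values) {
  delta <- aicc_values - min(aicc_values)
  rel_lik <- exp(-delta / 2)
  rel_lik / sum(rel_lik)
}

# Hypothetical fits with 3, 4 and 5 classes (values invented for illustration).
n      <- 200
loglik <- c(k3 = -520, k4 = -505, k5 = -503)
params <- c(k3 = 10,   k4 = 14,   k5 = 18)

scores  <- aicc(loglik, params, n)
delta   <- scores - min(scores)
weights <- akaike_weights(scores)
print(round(cbind(AICc = scores, delta = delta, weight = weights), 3))
```

With these invented numbers the 4-class model has the lowest AICc, and the 5-class model trails it by about 5.5 units, which falls in the borderline zone of the rule of thumb above. The weights then express how much relative support each number of classes receives.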

I'd recommend taking a look at Burnham and Anderson (2002) "Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach".
