Welcome to the website! This is a variation of a commonly asked question, and yes, you can definitely convert a multi-class confusion matrix to a binary confusion matrix.
Below is some R code showing how you can collapse a multi-class confusion matrix to a binary one. It also calculates Cohen's kappa to get the overall 'rater' agreement between the classifier and the actual class (of cmg).
cmg <- matrix(c(1639, 116, 49, 35, 138, 0, 0, 236,
150, 274, 27, 21, 28, 0, 0, 73,
22, 24, 58, 9, 94, 0, 0, 30,
33, 27, 31, 21, 146, 0, 0, 49,
14, 9, 5, 1, 49, 0, 0, 22,
1, 0, 1, 1, 7, 0, 0, 6,
11, 0, 0, 1, 14, 0, 0, 21,
201, 11, 8, 5, 49, 0, 0, 253),
ncol=8, dimnames = rep(list(c("T1","T2","T3","T4","T5","T6","T7","T8")), 2))
require(psych)
# Overall agreement
overall_agg <- sum(diag(cmg))/sum(cmg)
# Overall Cohen's Kappa for cmg
unweighted_kappa <- cohen.kappa( cmg, n.obs=sum(cmg) )
# initialise containers
spec_agr_guideline <- list()
collapsed_mat_guideline <- list()
unweighted_kappa_psych <- list()
# loop through all treatments
for (i in seq(1,nrow(cmg)) ) {
# Specific agreements
spec_agr_guideline[[i]] <- 2*cmg[i,i] / (sum(cmg[i,]) + sum(cmg[,i]))
# Collapsed positive agreement confusion matrices per treatment
collapsed_mat_guideline[[i]] <- matrix(c(cmg[i,i], sum(cmg[i,])-cmg[i,i],
sum(cmg[,i])-cmg[i,i], sum(cmg)-sum(cmg[i,])-sum(cmg[,i])+cmg[i,i]),
ncol=2)
# Calculate unweighted Cohen's Kappa per collapsed (binary) confusion matrix
unweighted_kappa_psych[[i]] <- cohen.kappa( collapsed_mat_guideline[[i]], n.obs=sum(collapsed_mat_guideline[[i]]) )
}
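As a self-contained sanity check of the collapse logic, here is the same one-vs-rest collapse done by hand for a single class, with unweighted Cohen's kappa computed directly from its definition rather than via psych. The 3x3 matrix cm and the class index are made up purely for illustration:

```r
# Toy 3-class confusion matrix (rows = true class, columns = predicted class);
# the numbers are invented purely for illustration
cm <- matrix(c(5, 1, 0,
               2, 6, 1,
               0, 1, 4), ncol = 3, byrow = TRUE)

i <- 1  # collapse class 1 vs. the rest
tp <- cm[i, i]
fn <- sum(cm[i, ]) - tp
fp <- sum(cm[, i]) - tp
tn <- sum(cm) - tp - fn - fp
bin <- matrix(c(tp, fn,
                fp, tn), ncol = 2, byrow = TRUE)

# Unweighted Cohen's kappa from its definition:
# observed agreement p_o versus chance agreement p_e
n     <- sum(bin)
p_o   <- sum(diag(bin)) / n
p_e   <- sum(rowSums(bin) * colSums(bin)) / n^2
kappa <- (p_o - p_e) / (1 - p_e)
```

For this toy matrix the collapsed table is rbind(c(5, 1), c(2, 12)) and kappa works out to (0.85 - 0.56) / 0.44, about 0.66; running cohen.kappa() on the same 2x2 table should agree with the hand calculation.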
Furthermore, you can do some other cool stuff to assess the performance of a multi-class classifier. Some relevant answers from CrossValidated.com are: link1, link2, link3.
To address both of your questions:
- The discrepancy between rfFit and rfFit$finalModel
I believe it is normal to have some discrepancy between your rfFit and rfFit$finalModel. As you can see in the output from rfFit, there is also an Accuracy SD column. The Accuracy returned there is an average over your repeated cross-validation resamples. The Accuracy returned by rfFit$finalModel comes from a single model fit with the best parameters determined by your CV (which, you may notice, is within 1 SD of your accuracy). As noted by topepo below, it is also a different metric: the former is based on held-out class predictions, the latter on the OOB estimate.
- Why perfect prediction with training samples?
This appears to be a common concern. What you have done here is develop the best model to classify your training samples. Random forest is especially good at classification, but that also means you have trained the model to fit these exact samples. Therefore, it is very likely you will see an inflated accuracy when predicting the same samples (especially with random forest, in my personal experience). What you should do, depending on the size of your initial dataset, is set aside a test set that is entirely independent of your training samples. That way you can apply your newly optimized model to samples that were not part of the tuning process. Ideally you would have a completely separate dataset to evaluate on, but often people don't have that luxury.
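The hold-out idea can be sketched in a few lines of base R. The built-in iris data stands in for your dataset here, and the 70/30 split ratio is an arbitrary choice:

```r
set.seed(42)  # make the random split reproducible
n <- nrow(iris)

# Randomly assign ~70% of the rows to training, the rest to testing
train_idx <- sample(n, size = round(0.7 * n))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]
```

You would then run your repeated-CV tuning on train only, and report accuracy from a single prediction pass over test, which played no part in tuning.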
Best Answer
It is just like when you have only two classes. Assume you have N classes; then the confusion matrix is an NxN matrix where the left axis shows the true class and the top axis shows the class assigned to an item with that true class. This link gives a good example describing a confusion matrix for multiple classes: Computing Precision and Recall for Multi-Class Classification Problems
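Concretely, with rows as the true class and columns as the predicted class, per-class precision and recall fall straight out of the matrix margins. The 3x3 matrix below is invented for illustration:

```r
# Toy confusion matrix: rows = true class, columns = predicted class
cm <- matrix(c(5, 1, 0,
               2, 6, 1,
               0, 1, 4), ncol = 3, byrow = TRUE,
             dimnames = rep(list(c("A", "B", "C")), 2))

precision <- diag(cm) / colSums(cm)  # correct / everything predicted as that class
recall    <- diag(cm) / rowSums(cm)  # correct / everything truly in that class
accuracy  <- sum(diag(cm)) / sum(cm) # overall agreement across all classes
```

Each class gets its own precision and recall, which you can then average (macro) or weight by class frequency (micro/weighted), depending on what matters for your application.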