Solved – Difference between Mean/average accuracy and Overall accuracy

accuracy, computer vision, image processing, machine learning, mean

I got confused while reading the paper "Local Binary Pattern-Based Hyperspectral Image Classification With Superpixel Guidance".

They mention that they repeated each experiment 10 times and calculated both the mean and standard deviation. They also mention that they calculated the overall accuracy. In the results they report the mean and standard deviation of the accuracy for each class, and then the overall accuracy. What is the difference between mean/average accuracy and overall accuracy? Shouldn't they be the same?
Table where mean accuracy of each class is calculated

I found this link that explains different methods for calculating accuracy. Is the sensitivity calculated in that method for each class the same as the mean accuracy?

An example confusion matrix to calculate Class Accuracy and Overall Accuracy:

According to the references given in the answer, the mean accuracy can be calculated as:

Mean accuracy of class N: 1971 / (1971 + 19 + 1 + 8 + 0 + 1) = 1971 / 2000 = 98.55%

Overall accuracy: (1971 + 1940 + 1891 + 1786 + 1958 + 1926) / (2000 + 2000 + 2000 + 2000 + 2000 + 2000) = 11472 / 12000 = 95.60%
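
For my own sanity check, here is a small NumPy sketch of the two calculations above; the layout (rows of the confusion matrix are the true classes, 2000 test samples per class) is my assumption from the screenshot:

```python
import numpy as np

# Row of the confusion matrix for class N: all test samples whose true label is N,
# split by predicted label (2000 samples per class assumed from the screenshot).
row_n = np.array([1971, 19, 1, 8, 0, 1])
correct_n = 1971                                  # class-N samples predicted as class N
class_n_accuracy = correct_n / row_n.sum()        # 1971 / 2000 = 0.9855

# Diagonal of the confusion matrix: correctly classified samples for each of the 6 classes.
diagonal = np.array([1971, 1940, 1891, 1786, 1958, 1926])
overall_accuracy = diagonal.sum() / (6 * 2000)    # 11472 / 12000 = 0.9560

print(f"Class N accuracy: {class_n_accuracy:.2%}")   # 98.55%
print(f"Overall accuracy: {overall_accuracy:.2%}")   # 95.60%
```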

Best Answer

What the paper describes

I trust you are referring to the following quote:

Each experiment is repeated ten times with a different training set to make the comparison fair, and both the mean accuracies and standard deviation are reported. For the evaluation metrics, overall accuracy (OA) and kappa coefficient (κ) are adopted to quantify the classification performance. The OA is computed by the ratio between the number of the correctly classified test samples and the total test samples.

It appears that the authors were using a single iteration of 10-fold cross-validation but avoided using that terminology. The mean accuracy refers to the average accuracy achieved across the ten different training folds. So they build 10 different models using non-overlapping data and test how consistently they perform.

After cross-validation, an overall model is typically built using all the data from the 10 folds, and this is what is used to predict the outcomes in the test set.

Overall accuracy is clearly stated to be the accuracy achieved on the test set. This is not ideal terminology; the term 'predictive accuracy' is perhaps closer to what they have done.
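
To make the distinction concrete, here is a minimal sketch of that workflow on synthetic data (scikit-learn with an SVM as a stand-in classifier; this is not the paper's actual pipeline): the mean and standard deviation come from the cross-validation folds, while the overall accuracy comes from the untouched test set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in data; the paper's LBP features and hyperspectral images are not reproduced here.
X, y = make_classification(n_samples=2000, n_features=30, n_informative=20,
                           n_classes=6, n_clusters_per_class=1, random_state=0)

# Hold out an independent test set, as the answer assumes the paper does.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = SVC()

# Mean/std accuracy across ten different training folds (one run of 10-fold cross-validation).
cv_scores = cross_val_score(model, X_train, y_train, cv=10)
print(f"mean accuracy = {cv_scores.mean():.4f} +/- {cv_scores.std():.4f}")

# 'Overall' model fit on all the training data, scored once on the untouched test set.
overall_accuracy = model.fit(X_train, y_train).score(X_test, y_test)
print(f"overall accuracy (test set) = {overall_accuracy:.4f}")
```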

What it means

In an ideal world, the mean accuracy of the 10 training experiments would be identical to the overall accuracy. Achieving this would require a perfect match in the distribution of samples within each subsample (each of the training folds and the test set) drawn from the parent dataset.

However, each fold contains a distinct set of samples, so we expect variation in the population characteristics of each fold and therefore in the accuracy. This is why the standard deviation is reported alongside the mean accuracy for the training set.

This means that when you move to yet another independent set of samples (the test set), you can hopefully estimate the range of accuracy to expect based on your training folds, but you will get a distinct accuracy value for that population. This is what the paper refers to as the 'overall accuracy'.

**UPDATE for comments**

The methodology states that the authors tested training sets of 7, 10, and 15 samples per class to determine sensitivity to small sample sizes. The results are presented in Fig. 8 and show that the more samples per class, the better the overall accuracy, especially for the Indian Pines data set. The table you copied into your updated question states that the training set had 10 samples per class, so the mean accuracy is simply the average accuracy for each class, but this number is fairly meaningless on its own.

To get a number more meaningful for comparison with the test set, you would need to adjust for the expected distribution of class sizes (see Tables I and II). Table II lists 4 classes with fewer than 150 samples, which makes it impossible to sample 10 independent training sets of 15 samples. I therefore now assume the authors mean that the randomisation for selection was independent but the training sets could overlap. Whether (and how) they were able to retain enough test set samples from any of the shortfall classes (C1,54,10 and 12) is not clear.

The fact remains that the class accuracy is based on the training set and the overall accuracy is based on the test set, so they will never agree. To be honest, the completely different presentation of the training and test set results makes comparison difficult.

I recommend you read the answers to the following question on CV about classification accuracy and class imbalance: Why is accuracy not the best measure for assessing classification models?

See also:

https://machinelearningmastery.com/classification-accuracy-is-not-enough-more-performance-measures-you-can-use/

To answer your updated query about sensitivity: at first I said no, but I later realised you were right. Class accuracy only considers the actual positives for that class, so correct answers are indeed true positives and incorrect answers are false negatives. In other words, the per-class accuracy here is the same as the sensitivity (recall) for that class.
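
A tiny sketch (hypothetical labels, scikit-learn) showing that the per-class accuracy taken from the confusion matrix rows is exactly the per-class recall/sensitivity:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Hypothetical labels for a 3-class problem, purely to illustrate the equivalence.
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 0])

cm = confusion_matrix(y_true, y_pred)           # rows = true class, columns = predicted class
class_accuracy = np.diag(cm) / cm.sum(axis=1)   # TP / (TP + FN) for each class
sensitivity = recall_score(y_true, y_pred, average=None)

print(class_accuracy)   # 2/3, 2/3, 3/4 for classes 0, 1, 2
print(sensitivity)      # identical values: per-class recall (sensitivity)
```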

**Further update:** mean class accuracy is calculated as the mean of the class accuracy across the 10 training sets. So the example in your question shows how the class accuracy is calculated in one iteration. You would calculate this value for each class in each iteration, and then calculate its arithmetic mean (and standard deviation).
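
In code, the reported mean and standard deviation would be built up roughly like this (a sketch with placeholder per-iteration accuracies rather than the paper's actual results):

```python
import numpy as np

# class_accuracy[i, c] = accuracy of class c in training iteration i (10 iterations, 6 classes).
# Random placeholder values for illustration only.
rng = np.random.default_rng(0)
class_accuracy = rng.uniform(0.90, 1.00, size=(10, 6))

mean_per_class = class_accuracy.mean(axis=0)   # mean accuracy of each class over the 10 runs
std_per_class = class_accuracy.std(axis=0)     # the +/- reported next to it

for c, (m, s) in enumerate(zip(mean_per_class, std_per_class), start=1):
    print(f"class {c}: {m:.4f} +/- {s:.4f}")
```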

The paper clearly states that the class accuracy was calculated from the 10 training sets while the overall accuracy was calculated from the test set. This means the two should never be perfectly reconcilable. It also means it is very difficult to compare test set performance to training performance. Since the selection of samples for the training and test sets is not described at all, it is impossible to interpret much from the paper.