Solved – Multi-class classification easier than binary classification

classification, multi-class, references

I have 10 different classes in my classification problem. Each class has about 200 instances, with more than 10,000 features. I performed the classification using a Multinomial Naive Bayes classifier. However, it turned out that I'm only interested in the results for the 10th category.

My first thought was to merge categories 1 to 9 into the negative class and make category 10 the positive class, effectively changing it into a binary problem. However, my advisor tells me that multi-class classification might be better, since there are many features that specifically identify a certain category, which might no longer be the case if all instances of categories 1 through 9 are thrown together.

I find it hard to believe that the results would not improve if we changed it to a binary classification; multi-class classification always seems more difficult than a simple binary classification.

What are the general opinions on this? Are there cases where multi-class classification returned better results for a specific class than binary classification for that class did? I just want to know whether this is possible in general, so I can defend my choices in the paper. If there are any good papers or resources on this problem, please point me to them!

Best Answer

This can indeed happen, as the following simulated example in R shows.

library(mvtnorm)

# Simulate five Gaussian classes with identity covariance, 500 points each.
# No seed is set, so exact numbers will vary slightly between runs.
sigma <- matrix(c(1, 0, 0, 1), ncol = 2)
x1 <- rmvnorm(n = 500, mean = c(0, 0), sigma = sigma, method = "chol")
x2 <- rmvnorm(n = 500, mean = c(3, 0), sigma = sigma, method = "chol")
x3 <- rmvnorm(n = 500, mean = c(1.5, 3), sigma = sigma, method = "chol")
x4 <- rmvnorm(n = 500, mean = c(-2.5, 3), sigma = sigma, method = "chol")
x5 <- rmvnorm(n = 500, mean = c(-4, -2), sigma = sigma, method = "chol")
data <- data.frame(rbind(x1, x2, x3, x4, x5))
data$class <- rep(1:5, each = 500)

Visualize the data

library(ggplot2)
# colour by true class (as a factor, so the colour scale is discrete)
qplot(data[, 1], data[, 2], colour = factor(data$class))

Let's fit the first, multi-class model, check its accuracy, and plot the predicted classes:

library(e1071)

# Multi-class naive Bayes on the five original classes
fit1 <- naiveBayes(factor(class) ~ ., data, laplace = 0)
data$predicted <- predict(fit1, data[, 1:2], type = "class")
sum(data$predicted == data$class) / length(data$predicted)
[1] 0.9228
qplot(data[, 1], data[, 2], colour = data$predicted)
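Since you care about one specific class rather than overall accuracy, it can also be worth looking at per-class results. A minimal sketch (class 2 is picked arbitrarily here as the class of interest):

# Per-class confusion matrix: rows = true class, columns = predicted class
table(true = data$class, predicted = data$predicted)
# Recall for the class of interest, here class 2
mean(data$predicted[data$class == 2] == "2")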

Now relabel the data and repeat the same steps for the second model, a binary classification:

# Copy only the features and labels, dropping the predicted column
data2 <- data[, c("X1", "X2", "class")]
# Merge classes: originals 2 and 5 become class 1, originals 1, 3 and 4 become class 2
data2$class <- c(rep(2, 500), rep(1, 500), rep(2, 1000), rep(1, 500))
qplot(data2[, 1], data2[, 2], colour = factor(data2$class))

# Binary naive Bayes on the merged labels
fit2 <- naiveBayes(factor(class) ~ ., data2, laplace = 0)
data2$predicted <- predict(fit2, data2[, 1:2], type = "class")
sum(data2$predicted == data2$class) / length(data2$predicted)
qplot(data2[, 1], data2[, 2], colour = data2$predicted)
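To compare the two approaches on the same task, one can collapse the multi-class predictions onto the binary labels and score both models against data2$class. A sketch, using the relabelling above (originals 2 and 5 form class 1); the name binary_from_multi is just illustrative:

# Map the multi-class predictions onto the binary labels
binary_from_multi <- ifelse(data$predicted %in% c(2, 5), 1, 2)
mean(binary_from_multi == data2$class)   # collapsed multi-class model
mean(data2$predicted == data2$class)     # direct binary model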

The underlying reason is that fitting a separate distribution to each class adds flexibility: every class-conditional density only has to cover one compact region, so the model can represent regions with different shapes. Merging several classes into one forces a single class-conditional distribution to cover a multimodal region, which naive Bayes cannot represent well.
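In symbols, with equal class sizes as in the simulation, the true density of the merged class 2 is the mixture

$$p(x \mid \text{class } 2) = \tfrac{1}{3} \sum_{k \in \{1,3,4\}} \mathcal{N}(x \mid \mu_k, \Sigma),$$

while Gaussian naive Bayes approximates it with a single Gaussian per feature, so the three separate lobes get smeared into one blob that overlaps the positive class.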