Solved – Multi-class classification easier than binary classification

classification, multi-class, references

I have 10 different classes in my classification problem. Each class has about 200 instances, with more than 10,000 features. I performed the classification using a Multinomial Naive Bayes classifier. However, it turned out that I'm only interested in the results for the 10th category.

My first thought was to merge categories 1 to 9 into the negative class and make category 10 the positive class, effectively changing it into a binary problem. However, my advisor tells me that multi-class classification might be better, since there are many features that specifically identify a certain category, which might no longer be the case if all instances of categories 1 through 9 are thrown together.

I find it hard to believe that the results would not improve if we changed it to a binary classification; multi-class classification always seems more difficult than a simple binary classification.

What are the general opinions on this? Are there cases where multi-class classification returned better results for a specific class than binary classification for that class did? I just want to know whether this is possible in general, so I can defend my choices in the paper. If there are any good papers or resources on this problem, please point me to them!

Best Answer

This can indeed happen, as the following simulated example in R shows.

library(mvtnorm)

# Simulate five Gaussian classes with identity covariance, 500 points each.
# No seed is set, so exact numbers will vary slightly between runs.
sigma <- matrix(c(1, 0, 0, 1), ncol = 2)
x1 <- rmvnorm(n = 500, mean = c(0, 0), sigma = sigma, method = "chol")
x2 <- rmvnorm(n = 500, mean = c(3, 0), sigma = sigma, method = "chol")
x3 <- rmvnorm(n = 500, mean = c(1.5, 3), sigma = sigma, method = "chol")
x4 <- rmvnorm(n = 500, mean = c(-2.5, 3), sigma = sigma, method = "chol")
x5 <- rmvnorm(n = 500, mean = c(-4, -2), sigma = sigma, method = "chol")
data <- data.frame(rbind(x1, x2, x3, x4, x5))
data$class <- rep(1:5, each = 500)

Visualize the data

library(ggplot2)
# colour by true class (as a factor, so the colour scale is discrete)
qplot(data[, 1], data[, 2], colour = factor(data$class))

Let's fit the first, multi-class model, check its accuracy, and plot the predicted classes:

library(e1071)

# Multi-class naive Bayes on the five original classes
fit1 <- naiveBayes(factor(class) ~ ., data, laplace = 0)
data$predicted <- predict(fit1, data[, 1:2], type = "class")
sum(data$predicted == data$class) / length(data$predicted)
[1] 0.9228
qplot(data[, 1], data[, 2], colour = data$predicted)
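Since you care about one specific class rather than overall accuracy, it can also be worth looking at per-class results. A minimal sketch (class 2 is picked arbitrarily here as the class of interest):

# Per-class confusion matrix: rows = true class, columns = predicted class
table(true = data$class, predicted = data$predicted)
# Recall for the class of interest, here class 2
mean(data$predicted[data$class == 2] == "2")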

Now relabel the data and repeat the same steps for the second model, a binary classification:

# Copy only the features and labels, dropping the predicted column
data2 <- data[, c("X1", "X2", "class")]
# Merge classes: originals 2 and 5 become class 1, originals 1, 3 and 4 become class 2
data2$class <- c(rep(2, 500), rep(1, 500), rep(2, 1000), rep(1, 500))
qplot(data2[, 1], data2[, 2], colour = factor(data2$class))

# Binary naive Bayes on the merged labels
fit2 <- naiveBayes(factor(class) ~ ., data2, laplace = 0)
data2$predicted <- predict(fit2, data2[, 1:2], type = "class")
sum(data2$predicted == data2$class) / length(data2$predicted)
qplot(data2[, 1], data2[, 2], colour = data2$predicted)
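To compare the two approaches on the same task, one can collapse the multi-class predictions onto the binary labels and score both models against data2$class. A sketch, using the relabelling above (originals 2 and 5 form class 1); the name binary_from_multi is just illustrative:

# Map the multi-class predictions onto the binary labels
binary_from_multi <- ifelse(data$predicted %in% c(2, 5), 1, 2)
mean(binary_from_multi == data2$class)   # collapsed multi-class model
mean(data2$predicted == data2$class)     # direct binary model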

The underlying reason is that fitting a separate distribution to each class adds flexibility: every class-conditional density only has to cover one compact region, so the model can represent regions with different shapes. Merging several classes into one forces a single class-conditional distribution to cover a multimodal region, which naive Bayes cannot represent well.
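In symbols, with equal class sizes as in the simulation, the true density of the merged class 2 is the mixture

$$p(x \mid \text{class } 2) = \tfrac{1}{3} \sum_{k \in \{1,3,4\}} \mathcal{N}(x \mid \mu_k, \Sigma),$$

while Gaussian naive Bayes approximates it with a single Gaussian per feature, so the three separate lobes get smeared into one blob that overlaps the positive class.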