This is actually true, as the following simulated example in R shows:
library(mvtnorm)
sigma <- matrix(c(1, 0, 0, 1), ncol = 2)
x1 <- rmvnorm(n = 500, mean = c(0, 0), sigma = sigma, method = "chol")
x2 <- rmvnorm(n = 500, mean = c(3, 0), sigma = sigma, method = "chol")
x3 <- rmvnorm(n = 500, mean = c(1.5, 3), sigma = sigma, method = "chol")
x4 <- rmvnorm(n = 500, mean = c(-2.5, 3), sigma = sigma, method = "chol")
x5 <- rmvnorm(n = 500, mean = c(-4, -2), sigma = sigma, method = "chol")
data <- data.frame(rbind(x1, x2, x3, x4, x5))
data$class <- rep(1:5, each = 500)
Visualize the data:
library(ggplot2)
qplot(data[, 1], data[, 2], colour = factor(data$class))
Let's fit the first model, check its accuracy, and plot the predicted classes:
library(e1071)
fit1 <- naiveBayes(factor(class) ~ ., data, laplace = 0)
data$predicted <- predict(fit1, data[, 1:2], type = "class")
sum(data$predicted == data$class) / length(data$predicted)
[1] 0.9228
qplot(data[, 1], data[, 2], colour = data$predicted)
Now change the labels and repeat the same steps for the second model, this time a binary classification:
data2 <- data[, c("X1", "X2", "class")]  # drop predicted so it is not used as a feature
data2$class <- c(rep(2, 500), rep(1, 500), rep(2, 1000), rep(1, 500))
qplot(data2[, 1], data2[, 2], colour = factor(data2$class))
fit2 <- naiveBayes(factor(class) ~ ., data2, laplace = 0)
data2$predicted <- predict(fit2, data2[, 1:2], type = "class")
sum(data2$predicted == data2$class) / length(data2$predicted)
qplot(data2[, 1], data2[, 2], colour = data2$predicted)
The underlying reason is that fitting a separate distribution for each class adds flexibility, so the model can capture decision regions with different shapes.
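To make that point concrete, here is a minimal hand-rolled sketch of the Gaussian naive Bayes decision rule (the helper name manual_nb_predict is made up for illustration, not part of e1071): each class contributes its own per-feature normal density, and the class with the highest log-posterior score wins. Because every class carries its own densities, the induced decision regions can differ in shape from class to class.

```r
# Minimal Gaussian naive Bayes sketch: assumes the first two columns of
# `train` are the features and `train$class` holds the labels.
manual_nb_predict <- function(train, test_x) {
  classes <- unique(train$class)
  # One column of scores per class: sum of per-feature log densities + log prior
  scores <- sapply(classes, function(k) {
    sub <- train[train$class == k, 1:2]
    log_lik <- dnorm(test_x[, 1], mean(sub[, 1]), sd(sub[, 1]), log = TRUE) +
               dnorm(test_x[, 2], mean(sub[, 2]), sd(sub[, 2]), log = TRUE)
    log_lik + log(nrow(sub) / nrow(train))
  })
  classes[max.col(scores)]  # class with the highest score, row by row
}

# Tiny usage example on two well-separated classes
set.seed(1)
train <- data.frame(x1 = c(rnorm(50, -2), rnorm(50, 2)),
                    x2 = c(rnorm(50, 0), rnorm(50, 0)),
                    class = rep(1:2, each = 50))
manual_nb_predict(train, data.frame(x1 = c(-2, 2), x2 = c(0, 0)))
# the first test point should fall in class 1, the second in class 2
```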
Best Answer
Assuming the data are missing completely at random (cf. @whuber's comment), an ensemble learning technique as described in the following paper might be worth trying:
The general idea is to train multiple classifiers, each on a subset of the variables that make up your dataset (as in Random Forests), but to build the classification rule using only the classifiers trained on the non-missing features. Be sure to check what the authors call the "distributed redundancy" assumption (p. 3 in the preprint linked above): there must be some roughly balanced redundancy in your feature set.
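The general idea can be sketched as follows. This is not the paper's exact scheme, just an illustration of the mechanism; the function names train_ensemble and predict_ensemble are made up. One naive Bayes model is trained per feature subset, and at prediction time only the models whose features are all observed cast a vote:

```r
library(e1071)

# Train one naive Bayes model per feature subset (feature_sets is a list
# of column-name vectors, e.g. list("x1", "x2", c("x1", "x2"))).
train_ensemble <- function(train_x, train_y, feature_sets) {
  lapply(feature_sets, function(fs)
    list(features = fs,
         model = naiveBayes(train_x[, fs, drop = FALSE], factor(train_y))))
}

# Predict a single row: models whose features contain an NA are skipped,
# and the remaining models decide by majority vote.
predict_ensemble <- function(ensemble, newrow) {
  votes <- unlist(lapply(ensemble, function(m) {
    if (any(is.na(newrow[, m$features]))) return(NULL)  # feature missing: skip
    as.character(predict(m$model, newrow[, m$features, drop = FALSE]))
  }))
  names(sort(table(votes), decreasing = TRUE))[1]  # majority vote
}

# Usage: with x1 missing, only the model trained on x2 alone votes
set.seed(42)
train <- data.frame(x1 = c(rnorm(100, 0), rnorm(100, 5)),
                    x2 = c(rnorm(100, 0), rnorm(100, 5)))
y <- rep(1:2, each = 100)
ens <- train_ensemble(train, y, list("x1", "x2", c("x1", "x2")))
predict_ensemble(ens, data.frame(x1 = NA, x2 = 0))
```

The "distributed redundancy" assumption matters here: the vote is only trustworthy if the class signal is spread roughly evenly across the feature subsets, so that dropping the models touching a missing feature still leaves informative voters.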