Solved – Comparing multiclass classification algorithms for a particular application

classification, MATLAB, method-comparison, references, supervised learning

I am assessing a number of classification algorithms for a specific application with multiple classes. The algorithms I am considering are listed below (a minimal training sketch follows the list):

  1. Multinomial Logistic Regression (Matlab's 'mnrfit')
  2. Multiclass SVM (K. Crammer and Y. Singer, "On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines," JMLR, 2001). I will use the code provided by the authors, since Matlab's 'svmtrain' only does binary classification.
  3. Neural Networks (Matlab's 'nprtool')
  4. Decision Trees (C4.5 and CART from Matlab's 'classregtree')
  5. k-Nearest Neighbors (Matlab's 'ClassificationKNN')
  6. Naive Bayes Classifier (Matlab's 'NaiveBayes')
  7. Discriminant Analysis (Matlab's 'ClassificationDiscriminant')
  8. Random Forests (Matlab's 'TreeBagger')
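
For reference, here is a minimal sketch (assuming a feature matrix X and an integer label vector y with values 1..K are already loaded, and Xnew holds new samples; these variable names are illustrative) of how a few of these share the same train/predict pattern in Matlab:

    % Fit three of the classifiers above with their stock Matlab interfaces.
    B   = mnrfit(X, y);                                      % 1. multinomial logistic regression
    knn = ClassificationKNN.fit(X, y, 'NumNeighbors', 5);    % 5. k-nearest neighbors
    rf  = TreeBagger(100, X, y, 'Method', 'classification'); % 8. random forest with 100 trees

    % Predicted classes for the new samples Xnew:
    [~, yhatLR] = max(mnrval(B, Xnew), [], 2);   % argmax over per-class probabilities
    yhatKNN     = predict(knn, Xnew);
    yhatRF      = str2double(predict(rf, Xnew)); % TreeBagger returns labels as strings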

I have the following questions:

  1. Have I omitted any "obvious" multiclass classification algorithm that's a must-try? Or are there any binary classifiers that can easily be used for multiclass problems with the one-vs-all method?
  2. Which of these are linear classifiers and which are non-linear classifiers? I know 3, 4 and 5 are non-linear by nature and 2 can be non-linear with the kernel trick.
  3. Which of these are discrete classifiers and which are probabilistic? For example, logistic regression gives a probability for each class, while a decision tree gives exactly one class. On a related note, if I want to use ROC curves for comparison, how do I compare a discrete classifier with a probabilistic one, given that the former gives only a single point on the ROC plot? (See the sketch after this list.)
  4. Which of these are deterministic classifiers and which are stochastic? In other words, which classifiers will yield exactly the same results over multiple runs? This matters to me: for logistic regression a single run is enough, whereas for neural networks I will have to train the network multiple times to avoid biased results.
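
For question 3, a sketch of what the ROC comparison could look like, assuming training data X/y and a held-out set Xtest/ytest exist (variable names are illustrative): the probabilistic classifier traces a full curve via 'perfcurve', while the discrete one contributes a single (FPR, TPR) point for the chosen positive class.

    posClass = 1;                           % class treated as "positive" (one-vs-rest)

    % Probabilistic: multinomial logistic regression -> per-class probabilities
    B = mnrfit(X, y);
    P = mnrval(B, Xtest);                   % n-by-K matrix of class probabilities
    [fpr, tpr] = perfcurve(double(ytest == posClass), P(:, posClass), 1);
    plot(fpr, tpr); hold on

    % Discrete: a classification tree that only returns hard labels
    tree = classregtree(X, y, 'method', 'classification');
    yhat = str2double(eval(tree, Xtest));   % eval returns class names as strings
    tpr1 = sum(yhat == posClass & ytest == posClass) / sum(ytest == posClass);
    fpr1 = sum(yhat == posClass & ytest ~= posClass) / sum(ytest ~= posClass);
    plot(fpr1, tpr1, 'o')                   % the tree is a single point on the ROC plot
    legend('mnrfit (curve)', 'classregtree (point)')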

Finally, are there any good papers that show how to compare various classification algorithms? I found this one, which is quite good.

Best Answer

I am simply copy-pasting the answers I got from Alexandre Passos on Metaoptimize. It would really help if someone here could add more to them.

  1. Any binary classifier can be used for multiclass classification with the one-vs-all reduction or the all-vs-all reduction (see the sketch after this list). This list seems to cover most of the common multiclass algorithms.
  2. Logistic regression and SVMs are linear (though kernelized SVMs are linear only in the kernel-induced feature space). Neural networks, decision trees, and kNN aren't linear. Naive Bayes and discriminant analysis are linear. Random forests aren't linear.
  3. Logistic regression can give you calibrated probabilities. So can many SVM implementations (though this requires slightly different training). Neural networks can do it too if trained with the right loss (softmax). Decision trees and kNN can be probabilistic, though they are not particularly well calibrated. Naive Bayes does not produce well-calibrated probabilities, nor does discriminant analysis. I'm not sure about random forests; it probably depends on the implementation.
  4. All are deterministic except for neural networks and random forests (see the reproducibility sketch below).
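
To make point 1 concrete, a sketch of the one-vs-all reduction. It uses binary logistic regression ('glmfit') as the base classifier only because it returns real-valued scores, which makes the final argmax well defined; the same loop works with any binary learner that exposes scores. X, y (integer class labels), and Xtest are assumed to exist.

    classes = unique(y);
    K = numel(classes);
    scores = zeros(size(Xtest, 1), K);

    for k = 1:K
        yk = double(y == classes(k));              % class k vs. the rest
        bk = glmfit(X, yk, 'binomial', 'link', 'logit');
        scores(:, k) = glmval(bk, Xtest, 'logit'); % score of "is class k" for each sample
    end

    [~, idx] = max(scores, [], 2);                 % the most confident binary model wins
    yhat = classes(idx);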

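And for point 4, the stochastic methods can be made repeatable by fixing the random seed before each training run (X, y, and Xtest assumed to exist):

    rng(42);                                        % fix Matlab's global RNG
    rf1 = TreeBagger(100, X, y, 'Method', 'classification');

    rng(42);                                        % same seed -> same bootstrap/feature draws
    rf2 = TreeBagger(100, X, y, 'Method', 'classification');

    isequal(predict(rf1, Xtest), predict(rf2, Xtest))  % true: identical predictions
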
Why do you want to compare different classification algorithms? Are you trying to decide which one is the best in general, or just for one application?

If the former, it's not worth doing: most such claims are rather sketchy, and there is no method that can give that kind of conclusion. If the latter, it is well accepted that cross-validation, or comparing performance on a fixed test set, gives you unbiased results. For multiclass classification it is not always obvious which metric to use, but accuracy, per-class precision/recall/F1, per-class AUC, and the confusion matrix are commonly used. A minimal evaluation sketch follows.
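
For example, a minimal hold-out evaluation in Matlab, assuming X (features) and y (integer class labels) are loaded; the kNN model is just a placeholder, and any classifier from the question would do:

    cv = cvpartition(y, 'HoldOut', 0.3);            % stratified 70/30 split
    trainIdx = training(cv);
    testIdx  = test(cv);

    mdl  = ClassificationKNN.fit(X(trainIdx, :), y(trainIdx));
    yhat = predict(mdl, X(testIdx, :));

    C = confusionmat(y(testIdx), yhat);             % rows: true class, columns: predicted class
    accuracy  = sum(diag(C)) / sum(C(:));
    precision = diag(C) ./ sum(C, 1)';              % per predicted class
    recall    = diag(C) ./ sum(C, 2);               % per true class
    f1        = 2 * precision .* recall ./ (precision + recall);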