I am assessing a bunch of classification algorithms for a specific application with multiple classes. The classification algorithms that I am considering are:
1. Multinomial Logistic Regression (Matlab's 'mnrfit')
2. Multiclass SVM (K. Crammer and Y. Singer. On the Algorithmic Implementation of Multi-class SVMs, JMLR, 2001.). I will use the code provided by the authors since Matlab's 'svmtrain' only does binary classification.
3. Neural Networks (Matlab's 'nprtool')
4. Decision Trees (C4.5 and CART from Matlab's 'classregtree')
5. k-Nearest Neighbors (Matlab's 'ClassificationKNN')
6. Naive Bayes Classifier (Matlab's 'naiveBayes')
7. Discriminant Analysis (Matlab's 'ClassificationDiscriminant')
8. Random Forests (Matlab's 'TreeBagger')
I have the following questions:
- Have I omitted any "obvious" multiclass classification algorithm that's a must-try? Or are there any binary classifiers that can easily be used for multiclass problems with the one-vs-all method?
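- Which of these are linear classifiers and which are non-linear classifiers? I know neural networks, decision trees and k-nearest neighbors (3, 4 and 5 in the list above) are non-linear by nature, and the multiclass SVM (2) can be made non-linear with the kernel trick.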
- Which of these are discrete classifiers and which are probabilistic? For example, logistic regression gives a probability for each class, while decision trees give exactly one class. On a related note, if I want to use ROC curves for comparison, how do I compare a discrete classifier with a probabilistic one, as the former gives only a point on the ROC plot?
- Which of these are deterministic classifiers and which are stochastic? In other words, which classifiers will yield exactly the same results over multiple runs on the same data? This is important for me: for example, with logistic regression a single run is enough, whereas with neural networks I will have to train the net multiple times to avoid biased results.
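To make the one-vs-all idea from the first question concrete, here is a minimal sketch in plain Python (not Matlab, and not taken from any of the toolboxes listed above). It trains one binary logistic regression per class and predicts the class whose classifier gives the highest score. The function names, the toy data and the learning-rate settings are all made up for illustration, so treat it as one possible reading of one-vs-all rather than a reference implementation.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_binary_logreg(X, y, lr=0.1, epochs=2000):
    """Binary logistic regression via batch gradient descent.

    X is a list of feature lists, y a list of 0/1 labels.
    Zero initialisation means repeated runs give identical results.
    """
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def train_one_vs_rest(X, labels):
    # One binary problem per class: "this class" (1) vs "everything else" (0).
    models = {}
    for c in sorted(set(labels)):
        y_binary = [1 if lab == c else 0 for lab in labels]
        models[c] = train_binary_logreg(X, y_binary)
    return models


def predict_one_vs_rest(models, x):
    # Pick the class whose binary classifier gives the largest linear score.
    scores = {
        c: sum(wj * xj for wj, xj in zip(w, x)) + b
        for c, (w, b) in models.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy data: three well-separated clusters in 2D.
X_toy = [[0, 0], [1, 0], [0, 1],
         [5, 0], [6, 1], [5, 1],
         [0, 5], [1, 6], [1, 5]]
labels_toy = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]

models_toy = train_one_vs_rest(X_toy, labels_toy)
for point in ([0.2, 0.2], [5.5, 0.5], [0.5, 5.5]):
    print(point, predict_one_vs_rest(models_toy, point))
```

Any binary classifier that returns a real-valued score could be dropped in place of `train_binary_logreg`; taking the maximum over the per-class scores is what turns the binary learners into a multiclass one.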
Finally, are there any good papers which show how to compare various classification algorithms? I found this one which is quite good.
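To illustrate the ROC part of the question, here is a small sketch in plain Python, assuming a binary problem (for multiple classes one would presumably look at one class against the rest). A classifier that outputs only hard labels yields a single (FPR, TPR) point, while one that outputs scores or probabilities yields a whole curve as the threshold is swept. One common way to put both on the same footing is to join the single point to (0, 0) and (1, 1) and compare areas. The labels, scores and function names below are invented for illustration.

```python
def roc_point(y_true, y_pred):
    """(FPR, TPR) for hard 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return fp / neg, tp / pos


def roc_curve(y_true, scores):
    """ROC points from sweeping a threshold over every distinct score."""
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= thr else 0 for s in scores]
        points.append(roc_point(y_true, y_pred))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical ground truth and outputs of two classifiers.
y_true = [1, 1, 0, 1, 0, 0]
prob_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]  # probabilistic classifier
hard_labels = [1, 1, 1, 0, 0, 0]              # discrete classifier

curve = roc_curve(y_true, prob_scores)
single = roc_point(y_true, hard_labels)

print("probabilistic AUC:", auc(curve))
print("discrete point (FPR, TPR):", single)
print("discrete AUC:", auc([(0.0, 0.0), single, (1.0, 1.0)]))
```

With the three-point polyline, the area for the discrete classifier reduces to (1 + TPR - FPR) / 2, so the comparison amounts to asking whether the single point lies above or below the other classifier's curve.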
Best Answer
I am simply copy-pasting the answers I got from Alexandre Passos on Metaoptimize. It would really help if someone here could add more to them.