Top five classifiers to try first

classification, machine learning, methodology

Besides obvious classifier characteristics like

  • computational cost,
  • expected data types of features/labels and
  • suitability for certain sizes and dimensions of data sets,

what are the top five (or 10, 20?) classifiers to try first on a new data set one does not know much about yet (e.g. semantics and correlation of individual features)? Usually I try Naive Bayes, Nearest Neighbor, Decision Tree and SVM, though I have no good reason for this selection other than that I know them and mostly understand how they work.

I guess one should choose classifiers which cover the most important general classification approaches. Which selection would you recommend, according to that criterion or for any other reason?


UPDATE: An alternative formulation for this question could be: "Which general approaches to classification exist and which specific methods cover the most important/popular/promising ones?"
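Whatever shortlist is chosen, "trying" the candidates usually means scoring each one the same way on the same data. A minimal sketch of that, assuming plain Python with no libraries: estimate each candidate's accuracy with k-fold cross-validation. Only two of the four methods named in the question (1-nearest-neighbor and Gaussian Naive Bayes) are implemented, to keep it short. The function names, `k=5` and the synthetic two-blob data set are illustrative assumptions, not part of the original post.

```python
import math
import random
from collections import defaultdict


def nearest_neighbor(train_X, train_y, x):
    """1-NN: return the label of the closest training point (squared Euclidean distance)."""
    closest = min(range(len(train_X)),
                  key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[closest]


def gaussian_naive_bayes(train_X, train_y, x):
    """Gaussian NB: one normal density per class and feature, features independent given the class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(train_X, train_y):
        rows_by_class[label].append(row)
    best_label, best_log_p = None, -math.inf
    for label, rows in rows_by_class.items():
        log_p = math.log(len(rows) / len(train_X))  # log prior of the class
        for j, value in enumerate(x):
            column = [r[j] for r in rows]
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9  # avoid zero variance
            log_p += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if log_p > best_log_p:
            best_label, best_log_p = label, log_p
    return best_label


def cross_val_accuracy(predict, X, y, k=5, seed=0):
    """k-fold cross-validated accuracy of a predict(train_X, train_y, x) function."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]  # k disjoint folds covering every row once
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(predict(train_X, train_y, X[i]) == y[i] for i in fold)
    return correct / len(X)


# Synthetic data (made up for illustration): two Gaussian blobs in 2-D.
rng = random.Random(42)
X = ([[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(30)]
     + [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(30)])
y = [0] * 30 + [1] * 30

scores = {name: cross_val_accuracy(clf, X, y)
          for name, clf in [("1-NN", nearest_neighbor), ("Gaussian NB", gaussian_naive_bayes)]}
print(scores)
```

Adding another candidate (a decision tree, an SVM, a random forest) only means passing one more function with the same `predict(train_X, train_y, x)` signature; in practice a library would normally supply both the models and the cross-validation loop.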

Best Answer

Random Forest

Fast, robust, good accuracy, in most cases nothing to tune, requires no feature normalization, largely immune to collinearity, produces a fairly good error estimate (the out-of-bag error) and a useful variable-importance ranking as side effects of training, trivially parallel, predicts in the blink of an eye.
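To make the "error estimate as a side effect", "no normalization" and "trivially parallel" points concrete, here is a minimal pure-Python sketch of a random forest under stated assumptions: CART-style trees with Gini splits grown on bootstrap samples, each node choosing among a random subset of `max_features` features, and the out-of-bag (OOB) error computed only from trees that did not see a given row. It is a teaching sketch, not any library's API or the answerer's code; the function names, `n_trees=25`, `max_depth=5`, `max_features=1` and the two-blob data set are hypothetical choices, and variable importance is omitted.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, idx, features):
    """Return (weighted impurity, feature, threshold) of the best split, or None."""
    best = None
    for f in features:
        values = sorted(set(X[i][f] for i in idx))
        # Thresholds are midpoints between consecutive distinct values, so the
        # chosen partition depends only on the ordering of a feature, not its scale.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in idx if X[i][f] <= t]
            right = [y[i] for i in idx if X[i][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(idx)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, idx, max_features, rng, depth=0, max_depth=5):
    """Grow a tree; each node only considers a random subset of the features."""
    labels = [y[i] for i in idx]
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority  # leaf: predicted class label
    features = rng.sample(range(len(X[0])), max_features)
    split = best_split(X, y, idx, features)
    if split is None:  # the sampled features are constant in this node
        return majority
    _, f, t = split
    left = [i for i in idx if X[i][f] <= t]
    right = [i for i in idx if X[i][f] > t]
    return (f, t,
            build_tree(X, y, left, max_features, rng, depth + 1, max_depth),
            build_tree(X, y, right, max_features, rng, depth + 1, max_depth))


def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def random_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Train trees on bootstrap samples; return (trees, out-of-bag error estimate)."""
    rng = random.Random(seed)
    n = len(X)
    trees, oob_votes = [], [Counter() for _ in range(n)]
    # Each tree depends only on its own bootstrap sample, so these iterations could
    # run in parallel (here they share one RNG and run sequentially).
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        tree = build_tree(X, y, sample, max_features, rng)
        trees.append(tree)
        in_bag = set(sample)
        for i in range(n):
            if i not in in_bag:  # row i was not used to train this tree
                oob_votes[i][predict_tree(tree, X[i])] += 1
    voted = [i for i in range(n) if oob_votes[i]]
    wrong = sum(oob_votes[i].most_common(1)[0][0] != y[i] for i in voted)
    return trees, (wrong / len(voted) if voted else float("nan"))


def predict_forest(trees, x):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, x) for t in trees).most_common(1)[0][0]


# Synthetic data (made up for illustration): two Gaussian blobs in 2-D.
data_rng = random.Random(42)
X = ([[data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)] for _ in range(30)]
     + [[data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)] for _ in range(30)])
y = [0] * 30 + [1] * 30
trees, oob_error = random_forest(X, y)

# Rescale the first feature by a power of two (exact in floating point).
X_scaled = [[a * 1024.0, b] for a, b in X]
_, oob_error_scaled = random_forest(X_scaled, y)
print(oob_error, oob_error_scaled, predict_forest(trees, [2.9, 3.1]))
```

Because every split is a threshold on a single feature, multiplying a feature by a positive constant only moves the thresholds, so with the same seed the trees make the same decisions and the OOB error is unchanged. The OOB estimate comes for free because roughly a third of the rows are left out of each bootstrap sample and can be scored by that tree without a separate validation set.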

Drawbacks: slower to train than trivial methods like kNN or NB, works best with balanced classes, worse accuracy than SVM for problems that desperately require the kernel trick, is a hard black box, does not make coffee.
