Machine Learning – How to Predict Class Probabilities Using Various Classification Models

Tags: classification, logistic, machine learning, naive bayes, probability

I am looking for classifiers that output the probability that an example belongs to one of two classes.

I know of logistic regression and naive Bayes, but can you tell me of others that work in a similar way? That is, classifiers that predict not the class to which an example belongs, but the probability that it belongs to a particular class?

Bonus points for any thoughts you can share on the advantages and disadvantages of these different classifiers (including logistic regression and naive Bayes). For example, are some better for multi-class classification?

Best Answer

SVM is closely related to logistic regression, and it can also be used to predict probabilities, based on the distance to the hyperplane (the score of each point). You do this by mapping scores to probabilities in some way, which is relatively easy because the problem is one-dimensional. One way is to fit an S-curve (e.g. the logistic curve) to the scores, which is known as Platt scaling. Another way is to use isotonic regression to fit a more general cumulative distribution function to the data.
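A minimal sketch of both calibration approaches, assuming scikit-learn and a toy dataset generated with make_classification (neither is part of the original answer). CalibratedClassifierCV wraps an SVM and maps its decision scores to probabilities with either a fitted sigmoid (Platt scaling) or isotonic regression:

```python
# Sketch: turning SVM decision scores into class probabilities.
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Platt scaling: fit a logistic sigmoid to the SVM's one-dimensional scores.
platt = CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=5)
platt.fit(X_train, y_train)
print(platt.predict_proba(X_test[:5]))  # rows of class probabilities summing to 1

# Isotonic regression: fit a more general monotone score -> probability map.
iso = CalibratedClassifierCV(LinearSVC(), method="isotonic", cv=5)
iso.fit(X_train, y_train)
print(iso.predict_proba(X_test[:5]))
```

The sigmoid fit works well with little data; isotonic regression is more flexible but can overfit when the calibration set is small.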

Beyond SVM, any model you can fit with gradient-based methods, such as a deep network, will output probabilities if you train it with a suitable loss function, e.g. cross-entropy on a sigmoid or softmax output.
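For concreteness, here is a hedged sketch of that idea using PyTorch and synthetic data (both are my assumptions, not part of the original answer): a small network trained with binary cross-entropy on its raw logit, so that the sigmoid of the output is directly an estimate of the class-1 probability.

```python
# Sketch: a network whose output is trained as a probability via log loss.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(200, 5)                      # toy features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).float()    # toy binary labels

model = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.BCEWithLogitsLoss()             # binary cross-entropy on the logit
opt = torch.optim.Adam(model.parameters(), lr=0.01)

for _ in range(200):
    opt.zero_grad()
    logits = model(X).squeeze(1)
    loss = loss_fn(logits, y)
    loss.backward()
    opt.step()

probs = torch.sigmoid(model(X[:5]).squeeze(1))  # estimated P(y = 1 | x)
print(probs)
```

The same recipe extends to multi-class problems by using a softmax output with categorical cross-entropy.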

Predicting probabilities is often not taken into consideration when designing classifiers these days. It is treated as an extra that distracts from classification performance, so it gets discarded. You can, however, use any binary classifier to estimate probabilities up to a fixed set of intervals (e.g. p in [0, 1/4], [1/4, 1/2], ...) with the "probing" reduction of Langford and Zadrozny; a simplified sketch of the idea follows.
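This is only an illustrative, simplified sketch of the idea behind probing, not the full algorithm from the Langford and Zadrozny paper; the thresholds, the cost-sensitive weighting, and the toy data are my own choices. For each threshold t, a classifier is trained with importance weights chosen so that its optimal decision rule is "predict positive iff P(y=1|x) > t"; the probability bin for an example is then read off from which of these classifiers vote positive.

```python
# Sketch: coarse probability estimates from cost-sensitive binary classifiers.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
thresholds = np.array([0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875])

classifiers = []
for t in thresholds:
    # False negatives cost (1 - t), false positives cost t, so the
    # Bayes-optimal rule is "positive iff P(y=1|x) > t".
    w = np.where(y == 1, 1.0 - t, t)
    clf = LinearSVC().fit(X, y, sample_weight=w)
    classifiers.append(clf)

def probability_estimate(x):
    """Coarse estimate: the largest threshold whose classifier still
    predicts the positive class (a lower edge of the probability bin)."""
    votes = [t for t, clf in zip(thresholds, classifiers)
             if clf.predict(x.reshape(1, -1))[0] == 1]
    return max(votes) if votes else 0.0

print(probability_estimate(X[0]))
```

The real reduction also handles disagreements between the threshold classifiers (the votes are not guaranteed to be monotone in t), which this sketch ignores.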
