Solved – Random Forest Probability vs Logistic Regression Probability

logistic, probability, random forest

I understand that the probability in the random forest algorithm uses the count of trees that vote for a certain class, while logistic regression uses MLE. What are the implications of using one vs another? And is the RF probability a true probability?

Best Answer

In a nutshell, logistic regression is built to estimate the probability of belonging to a specific class, so it yields a single, well-defined probability estimate. The probability obtained from a random forest, on the other hand, is more of a by-product of having many trees (though this is implementation dependent; more details below), and there are therefore many ways to infer probabilities from a random forest.

Random forest probability

Indeed, it is not a true probability, in the sense that it is just the fraction of trees that vote for the class.
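To make the "implementation dependent" point concrete, here is a minimal sketch of two common ways to turn tree outputs into a score. The per-tree numbers are made up for illustration, and the function names are hypothetical. For reference, scikit-learn's RandomForestClassifier.predict_proba is documented as averaging the per-tree class probabilities rather than counting hard votes.

```python
# Hypothetical outputs of 5 trees for ONE sample in a binary problem.
# Each entry is the class-1 fraction in the leaf the sample falls into.
leaf_class1_fraction = [1.0, 0.8, 0.6, 0.4, 0.0]

def vote_fraction(leaf_fractions, threshold=0.5):
    """Share of trees whose hard vote is class 1."""
    votes = [1 if p > threshold else 0 for p in leaf_fractions]
    return sum(votes) / len(votes)

def mean_leaf_probability(leaf_fractions):
    """Average of the per-tree leaf class fractions."""
    return sum(leaf_fractions) / len(leaf_fractions)

p_vote = vote_fraction(leaf_class1_fraction)          # 3 of 5 trees vote class 1
p_mean = mean_leaf_probability(leaf_class1_fraction)  # roughly 0.56
print(p_vote, p_mean)
```

With fully grown trees every leaf is pure, so each leaf fraction is 0 or 1 and the two scores coincide.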

The implications depend on the penalty function (evaluation metric) that you use. A random forest will usually produce many ties (identical probability values) as well as exact 0s and 1s. The ties are a problem when your metric is the AUC (see the Wikipedia article on AUC if you are not familiar with it), and the 0s and 1s are a problem when you use a logarithmic loss, because a confident wrong prediction of 0 or 1 incurs a very large penalty.
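Both effects can be seen in a small sketch. The scores are invented for illustration: `smooth` stands in for a finely graded score, and `coarse` rounds it to multiples of 0.5, which is what a hypothetical 2-tree vote fraction could look like. The clipping constant `eps` is an assumption; libraries differ in how they guard against log(0).

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean logarithmic loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 0, 0, 0]
smooth = [0.9, 0.7, 0.6, 0.55, 0.4, 0.1]     # finely graded scores
coarse = [round(s * 2) / 2 for s in smooth]  # only 0, 0.5 or 1: many ties

print(auc(y_true, smooth))   # 1.0, every positive outranks every negative
print(auc(y_true, coarse))   # 7/9, the ties throw away ranking information

# A single confident mistake dominates the logarithmic loss.
print(log_loss([1], [0.0]))   # about 34.5 with this clipping
print(log_loss([1], [0.05]))  # about 3.0
```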

However, there are some alternatives that improve the estimation of probabilities, as detailed in the following paper:

H. Boström. Estimating class probabilities in random forests. In Proc. of the International Conference on Machine Learning and Applications, pages 211–216, 2007.
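As one illustration of what such a correction can look like, here is a sketch of Laplace smoothing applied to the class counts in a leaf. This is a generic, widely used technique and is shown only as an example; it is not claimed to reproduce the exact method or results of the paper. The function name and the counts are hypothetical.

```python
def laplace_estimate(class_count, leaf_size, n_classes):
    """Laplace-smoothed class frequency: (k + 1) / (n + C)."""
    return (class_count + 1) / (leaf_size + n_classes)

# A pure leaf holding 3 training samples, all of class 1 (binary problem).
raw = 3 / 3                           # raw frequency: an extreme 1.0
smoothed = laplace_estimate(3, 3, 2)  # (3 + 1) / (3 + 2), pulled away from 1
print(raw, smoothed)
```

Averaging smoothed leaf estimates over the trees keeps the forest from emitting exact 0s and 1s, which is what hurts under a logarithmic loss.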

Logistic regression probability

Logistic regression usually produces a good estimate of the probability. But as opposed to a random forest, it does not take into account possible interactions between the inputs unless you add interaction terms explicitly, which may harm performance as well. I suspect that in most cases, if your penalty is just the accuracy of the model (and some interactions are important), a logistic regression would give poor results compared to a random forest.
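The interaction point can be illustrated with the classic XOR pattern, where the class depends only on the interaction of the two inputs. The sketch below is a bare-bones logistic regression fitted by batch gradient descent; the learning rate, step count and helper names are assumptions chosen for this toy example, not a recommended implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=1.0, steps=5000):
    """Batch gradient descent on the mean log loss (no regularisation)."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def accuracy(w, X, y):
    """Share of samples classified correctly with a 0.5 probability threshold."""
    preds = [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) > 0.5 else 0
             for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)

points = [(0, 0), (1, 0), (0, 1), (1, 1)]
y = [0, 1, 1, 0]  # XOR: the label depends only on the interaction

X_plain = [[1.0, a, b] for a, b in points]         # intercept + main effects
X_inter = [[1.0, a, b, a * b] for a, b in points]  # plus an explicit x1*x2 term

acc_plain = accuracy(fit_logistic(X_plain, y), X_plain, y)
acc_inter = accuracy(fit_logistic(X_inter, y), X_inter, y)
print(acc_plain, acc_inter)
```

Without the product term no linear decision boundary can classify more than three of the four points correctly; with it, the model separates them perfectly. A tree-based model can pick up this kind of structure without the term being added by hand.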