Solved – Random Forest probability

probability, random forest

A random forest is a forest of trees, and the probability we get is the proportion of trees predicting a specific class. So does each tree in the random forest predict the class using a cut-off probability of 0.5, and is the classification from each tree then used to compute the proportion of trees voting for a class as its probability?

Moreover, is there a method that gives the average of the probabilities from each tree in the random forest?

EDIT 1:

Hi @user20160, thanks for your answer. For method 2 in R, you mean we predict with predict(randomForestModel, type="prob"). Since I built just 3 trees in the random forest (two classes, i.e. 0 and 1), the final output of the predict function above is one of 0, 0.33, 0.5, 0.66, 1, NA for all the records. This means it has taken the proportion of the three trees voting for class 1, e.g. 0 if no tree votes for class 1, 0.33 if one tree votes for it, and so on. It could be NA because a record may not come up in the OOB sample of any of the three trees. So it is actually not averaging the probabilities but only taking the proportion of trees as the probability. Please help me understand this.
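The values listed above, including 0.5 and NA, are what one would expect if only the trees for which a record is out-of-bag (OOB) are allowed to vote on it. The following is a minimal sketch of that counting argument, not the R implementation itself. It assumes that predict() is called without new data and therefore returns OOB vote fractions; the function name possible_oob_vote_fractions is hypothetical.

```python
from fractions import Fraction

N_TREES = 3  # the forest in the question has three trees


def possible_oob_vote_fractions(n_trees):
    """Enumerate the class-1 vote fractions a record can receive when only
    the trees for which it is out-of-bag (OOB) are allowed to vote."""
    values = set()
    for n_oob in range(n_trees + 1):  # number of trees where the record is OOB
        if n_oob == 0:
            values.add(None)  # no tree can vote, so the result is missing (NA)
            continue
        for votes_for_1 in range(n_oob + 1):
            values.add(Fraction(votes_for_1, n_oob))
    return values


values = possible_oob_vote_fractions(N_TREES)
numeric = sorted(v for v in values if v is not None)
print([str(v) for v in numeric])   # ['0', '1/3', '1/2', '2/3', '1']
print("NA possible:", None in values)  # NA possible: True
```

Under this assumption, 0.5 arises when a record is OOB for exactly two trees that disagree, and NA when it is OOB for none of them.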

Best Answer

There are a couple of ways to implement random forest classifiers. Suppose $x$ is some data point and there are $k$ classes.

Method 1: Each tree predicts the class of $x$ according to the leaf node $x$ falls within. The leaf node output is the majority class of the training points it contains. The predictions of all trees are considered as votes, and the class with the most votes is taken as the output of the forest. This is the original formulation of random forests proposed by Breiman (2001).

Method 2: Each tree outputs a vector $[p_1, \dots, p_k]$ representing the predicted probability of each class given $x$. This may be estimated as the relative class frequencies of training points in the leaf node $x$ falls within. The forest output is the average of these vectors across trees, representing a conditional distribution over classes given $x$.
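The difference between the two methods can be seen in a small sketch. The leaf class counts below are made-up illustrative numbers, and the helper names (leaf_probabilities, hard_vote_fractions, averaged_probabilities) are hypothetical, not part of any library.

```python
from collections import Counter

classes = [0, 1]

# Hypothetical example: for each of 3 trees, the class counts of the
# training points in the leaf node that x falls within.
leaf_counts = [
    {0: 6, 1: 4},   # tree 1: leaf is 60% class 0
    {0: 6, 1: 4},   # tree 2: leaf is 60% class 0
    {0: 0, 1: 10},  # tree 3: leaf is pure class 1
]


def leaf_probabilities(counts):
    """Relative class frequencies of the training points in one leaf."""
    total = sum(counts.values())
    return [counts.get(c, 0) / total for c in classes]


def hard_vote_fractions(leaves):
    """Method 1: each tree votes for the majority class of its leaf
    (ties go to the first class listed); return the share of votes per class."""
    votes = Counter(max(classes, key=lambda c: counts.get(c, 0)) for counts in leaves)
    return [votes[c] / len(leaves) for c in classes]


def averaged_probabilities(leaves):
    """Method 2: average the per-tree probability vectors across trees."""
    per_tree = [leaf_probabilities(counts) for counts in leaves]
    return [sum(p[i] for p in per_tree) / len(per_tree) for i in range(len(classes))]


vote_fractions = hard_vote_fractions(leaf_counts)
mean_probs = averaged_probabilities(leaf_counts)
print("Method 1 vote fractions:", vote_fractions)
print("Method 2 averaged probabilities:", mean_probs)
```

In this toy case the two methods disagree: class 0 wins the vote two trees to one, while the averaged probability of class 1 is about 0.6, because the third tree is far more confident than the other two.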

Method 2 is nice because it gives probabilistic predictions. But, for some problems, further steps may be needed to ensure that they're well calibrated. For example, see:

Niculescu-Mizil and Caruana (2005). Predicting good probabilities with supervised learning.
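As a rough illustration of what "well calibrated" means, predicted probabilities can be grouped into bins and compared with the observed class frequency in each bin. This is only a sketch of a reliability check, not a calibration method from the paper above. The data are hypothetical toy values and the function name reliability_table is an assumption.

```python
# Hypothetical toy data: forest-predicted P(class 1) and the true labels.
predicted = [0.1, 0.2, 0.15, 0.4, 0.45, 0.6, 0.65, 0.9, 0.85, 0.95]
actual = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]


def reliability_table(probs, labels, n_bins=2):
    """Group predictions into equal-width bins and compare the mean predicted
    probability in each bin with the observed fraction of class 1."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))
    rows = []
    for members in bins:
        if not members:
            continue  # skip empty bins
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        rows.append((len(members), mean_pred, observed))
    return rows


rows = reliability_table(predicted, actual, n_bins=2)
for count, mean_pred, observed in rows:
    print(f"n={count}  mean predicted={mean_pred:.2f}  observed frequency={observed:.2f}")
```

If the predictions were well calibrated, the mean predicted probability and the observed frequency would be close in every bin, given enough data per bin.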