Solved – Posterior probabilities with decision trees or decision forests

bayesian, boosting, calibration, cart, posterior

Is there a way to get posterior probabilities $P(C | \vec{x})$ (the probability that a data item $\vec{x}$ belongs to one of the given classes) in a multiclass classification problem using decision trees or forests?

I found some hints about using calibration methods (e.g. Platt's method or isotonic regression) in combination with boosted or bagged trees. However, as I'm not experienced in this field, I can't find a good explanation of how this works. It would be very helpful if one of you could explain the general idea of how to get posterior probabilities with decision trees, or point me to a good link or paper where these things are explained.
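For reference, the calibration idea mentioned above can be sketched with scikit-learn (not mentioned in the thread; the data and all parameter values here are illustrative assumptions): a bagged-tree classifier is wrapped in `CalibratedClassifierCV`, which supports both Platt scaling (`method="sigmoid"`) and isotonic regression (`method="isotonic"`).

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy 3-class problem (synthetic data, for illustration only).
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)

# Wrap a bagged-tree model in a calibrator; internally the raw scores of
# the forest are mapped to calibrated probabilities via cross-validation.
forest = RandomForestClassifier(n_estimators=50, random_state=0)
calibrated = CalibratedClassifierCV(forest, method="isotonic", cv=3)
calibrated.fit(X, y)

# Each row is an estimate of P(C | x) over the 3 classes.
proba = calibrated.predict_proba(X[:5])
print(proba)
```

With `method="sigmoid"` instead, a logistic (Platt-style) mapping is fitted per class; isotonic regression is non-parametric and usually needs more data.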

Best Answer

This is explained in section 2.2.4 of the paper Decision Forests for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning.

In the case of classification, each leaf may store the empirical distribution over the classes, computed from the subset of training data that reached that leaf.

During testing, each tree leaf yields a distribution over classes and the forest output is the average of these leaf distributions.
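The two steps above can be sketched in a few lines of plain NumPy. The data, split thresholds, and depth-1 "trees" below are all made-up toys standing in for the randomized trees of a real forest; the point is only the mechanics: each leaf stores normalized class counts, and the forest posterior is the average of the per-tree leaf distributions.

```python
import numpy as np

def leaf_distribution(y_leaf, n_classes):
    """Empirical class distribution at a leaf: normalized class counts."""
    counts = np.bincount(y_leaf, minlength=n_classes)
    return counts / counts.sum()

# Toy training set: one feature, three classes (hypothetical numbers).
X = np.array([0.1, 0.2, 0.3, 0.6, 0.7, 0.8, 0.9])
y = np.array([0,   0,   1,   1,   2,   2,   2])
n_classes = 3

# Two hypothetical depth-1 trees ("stumps") with different split
# thresholds, standing in for the randomized trees of a forest.
thresholds = [0.5, 0.65]

def tree_posterior(x, threshold):
    # Route x to the left or right leaf of this stump and return
    # that leaf's empirical class distribution.
    leaf_mask = X < threshold if x < threshold else X >= threshold
    return leaf_distribution(y[leaf_mask], n_classes)

def forest_posterior(x):
    # Forest posterior P(C | x): average of the leaf distributions
    # reached in each tree.
    return np.mean([tree_posterior(x, t) for t in thresholds], axis=0)

p = forest_posterior(0.75)
print(p)  # a proper distribution over the 3 classes (sums to 1)
```

For `x = 0.75`, the first stump's right leaf holds labels `[1, 2, 2, 2]` (distribution `[0, 0.25, 0.75]`) and the second stump's right leaf holds `[2, 2, 2]` (distribution `[0, 0, 1]`), so the averaged posterior is `[0, 0.125, 0.875]`.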

Related Question