Solved – MCMC sampling of decision tree space vs. random forest

cart, markov-chain-montecarlo, monte-carlo, random-forest

A random forest is a collection of decision trees, each built by randomly restricting the features considered when splitting (and usually by bagging the training data). They apparently learn and generalize well. Has anybody done MCMC sampling of the decision tree space, or compared such samplers to random forests? I know it might be computationally more expensive to run the MCMC and save all the sampled trees, but I am interested in the theoretical features of this model, not the computational costs. What I mean is something like this:

  1. Construct a random decision tree (it would probably perform horribly).
  2. Compute the (unnormalized) posterior of the tree, $P(Tree \mid Data) \propto P(Data \mid Tree)$, or perhaps with a prior term, $P(Data \mid Tree)\,P_{prior}(Tree)$.
  3. Propose a random change to the tree and accept or reject it based on the posterior $P(Tree \mid Data)$.
  4. Every N steps, save a copy of the current tree.
  5. Go back to step 3, for some large $N \times M$ total steps.
  6. Use the collection of M saved trees to make predictions (a code sketch follows the list).
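
Here is a minimal sketch of steps 1–6 in Python, under loud assumptions: fixed-depth regression trees, a Gaussian likelihood for $P(Data \mid Tree)$, a flat prior $P_{prior}(Tree)$, and only two symmetric proposal moves (jitter one threshold or one leaf value). All names (`Node`, `mcmc_trees`, ...) are illustrative; real Bayesian tree samplers also use grow/prune/swap moves.

```python
# Minimal Metropolis sampler over a fixed-shape decision-tree space.
# Assumptions (not from the question): Gaussian likelihood, flat prior,
# symmetric proposals, fixed tree depth. Illustrative only.
import copy
import math
import random

import numpy as np


class Node:
    """Binary regression-tree node; a leaf carries a predicted mean."""

    def __init__(self, feature=None, threshold=None, left=None, right=None, value=0.0):
        self.feature, self.threshold = feature, threshold
        self.left, self.right, self.value = left, right, value

    def is_leaf(self):
        return self.left is None

    def predict(self, x):
        if self.is_leaf():
            return self.value
        child = self.left if x[self.feature] <= self.threshold else self.right
        return child.predict(x)


def random_tree(X, depth=2):
    """Step 1: a random tree (it will typically fit the data horribly)."""
    if depth == 0:
        return Node(value=random.gauss(0.0, 1.0))
    f = random.randrange(X.shape[1])
    t = random.choice(X[:, f])  # random split point drawn from the data
    return Node(f, t, random_tree(X, depth - 1), random_tree(X, depth - 1))


def log_likelihood(tree, X, y, sigma=1.0):
    """Step 2: log P(Data | Tree) under an assumed Gaussian noise model."""
    preds = np.array([tree.predict(x) for x in X])
    return -0.5 * np.sum((y - preds) ** 2) / sigma**2


def all_nodes(node):
    """Collect every node so a proposal can pick one uniformly."""
    if node.is_leaf():
        return [node]
    return [node] + all_nodes(node.left) + all_nodes(node.right)


def propose(tree, X):
    """Step 3 (proposal): randomly jitter one threshold or one leaf value.
    Both moves are symmetric, so the Metropolis ratio needs no correction."""
    new = copy.deepcopy(tree)
    node = random.choice(all_nodes(new))
    if node.is_leaf():
        node.value += random.gauss(0.0, 0.5)
    else:
        node.threshold = random.choice(X[:, node.feature])
    return new


def mcmc_trees(X, y, n_steps=5000, thin=100):
    """Steps 3-5: Metropolis accept/reject; save every `thin`-th tree."""
    tree = random_tree(X)
    ll = log_likelihood(tree, X, y)
    saved = []
    for step in range(n_steps):
        cand = propose(tree, X)
        cand_ll = log_likelihood(cand, X, y)
        # Flat prior assumed, so the acceptance probability is just the
        # likelihood ratio min(1, P(Data|cand) / P(Data|tree)).
        if random.random() < math.exp(min(0.0, cand_ll - ll)):
            tree, ll = cand, cand_ll
        if (step + 1) % thin == 0:
            saved.append(copy.deepcopy(tree))
    return saved


def predict(saved, x):
    """Step 6: average the predictions of the M saved trees."""
    return float(np.mean([t.predict(x) for t in saved]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-2.0, 2.0, size=(200, 1))
    y = np.sin(X[:, 0]) + rng.normal(0.0, 0.1, size=200)
    trees = mcmc_trees(X, y)
    print(predict(trees, np.array([1.0])))  # posterior-mean prediction at x = 1
```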

Would this give performance similar to random forests? Note that, unlike random forests, here we are not throwing away good data or features at any step. A toy empirical comparison is sketched below.
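
One way to probe this empirically (a toy check, not an answer to the general question) is to fit both methods on the same simulated data, reusing `mcmc_trees` and `predict` from the sketch above; scikit-learn's `RandomForestRegressor` and the sin-curve data set are assumed choices:

```python
# Toy comparison on simulated data; reuses mcmc_trees/predict from the
# sketch above. Results on one small data set say nothing general.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0.0, 0.1, size=300)
X_test = rng.uniform(-2.0, 2.0, size=(100, 1))
y_test = np.sin(X_test[:, 0])

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
rf_mse = np.mean((forest.predict(X_test) - y_test) ** 2)

saved = mcmc_trees(X, y, n_steps=5000, thin=100)
mcmc_mse = np.mean([(predict(saved, x) - t) ** 2 for x, t in zip(X_test, y_test)])

print(f"random forest test MSE: {rf_mse:.4f}")
print(f"MCMC tree ensemble test MSE: {mcmc_mse:.4f}")
```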

Best Answer

This was done some 13 years ago by Chipman, George and McCulloch (1998, JASA). Of course, a huge literature on Bayesian regression trees has grown out of this idea.