Solved – Practical questions on tuning Random Forests

cart, random-forest

My questions are about Random Forests. The concept of this beautiful classifier is clear to me, but there are still a lot of practical usage questions. Unfortunately, I failed to find any practical guide to RF (I've been searching for something like "A Practical Guide to Training Restricted Boltzmann Machines" by Geoffrey Hinton, but for Random Forests!).

How can one tune RF in practice?

Is it true that a bigger number of trees is always better? Is there a reasonable limit (apart from computational capacity, of course) on increasing the number of trees, and how can it be estimated for a given dataset?

What about the depth of the trees? How does one choose a reasonable value? Does it make sense to experiment with trees of different depths within one forest, and if so, what guidance is there?

Are there any other parameters worth looking at when training an RF? Perhaps the algorithms for building the individual trees?

When people say that RFs are resistant to overfitting, how true is that?

I'd appreciate any answers and/or links to guides or articles that I might have missed during my search.

Best Answer

I'm not an authoritative figure, so consider these brief practitioner notes:

More trees are always better, with diminishing returns. Deeper trees are almost always better, subject to requiring more trees for similar performance.

The above two points are a direct result of the bias-variance tradeoff. Deeper trees reduce the bias; more trees reduce the variance.
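To see the diminishing returns from adding trees, here is a minimal sketch (my own illustration, not part of the original answer) using scikit-learn's `RandomForestClassifier` with `warm_start` so trees are grown incrementally on a synthetic dataset; all dataset sizes and parameter values are arbitrary choices:

```python
# Sketch: watch out-of-bag (OOB) error flatten as trees are added.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=25,
                           n_informative=5, random_state=0)

clf = RandomForestClassifier(warm_start=True, oob_score=True,
                             random_state=0, n_jobs=-1)

for n_trees in [25, 50, 100, 250, 500, 1000]:
    clf.set_params(n_estimators=n_trees)  # grow additional trees only
    clf.fit(X, y)
    print(f"{n_trees:>5} trees  OOB error = {1 - clf.oob_score_:.4f}")

# Typically the error drops quickly and then flattens: extra trees don't
# hurt (beyond compute time), but the gains shrink.
```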

The most important hyper-parameter is how many features to try at each split. The more useless features there are, the more features you should try. This needs to be tuned. You can tune it roughly via OOB estimates if you just want to know your performance on your training data and there is no twinning (~repeated measures). Even though this is the most important parameter, its optimum is still usually fairly close to the originally suggested defaults (sqrt(p) or p/3 for classification/regression, respectively).
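A small sketch of tuning that parameter via OOB estimates (again my own illustration; the grid values and dataset are made up, and in scikit-learn the parameter is called `max_features`, `mtry` in R's randomForest):

```python
# Sketch: pick the number of features tried per split by comparing OOB scores.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=40,
                           n_informative=8, random_state=0)

candidates = sorted({2, 4, int(np.sqrt(X.shape[1])), 10, 20, 40})
for m in candidates:
    clf = RandomForestClassifier(n_estimators=500, max_features=m,
                                 oob_score=True, random_state=0, n_jobs=-1)
    clf.fit(X, y)
    print(f"max_features={m:>2}  OOB accuracy = {clf.oob_score_:.4f}")

# The OOB estimate is a cheap stand-in for cross-validation here, with the
# caveat above: it is only trustworthy when there are no twinned /
# repeated-measure rows in the training data.
```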

Fairly recent research shows you don't even need to do exhaustive split searches within a feature to get good performance. Just try a few cut points for each selected feature and move on. This makes training even faster (~Extremely Randomized Trees).
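For comparison, scikit-learn ships an implementation of this idea as `ExtraTreesClassifier`, which draws split thresholds at random instead of searching each feature exhaustively. A rough, illustrative side-by-side (my own sketch, synthetic data, arbitrary settings):

```python
# Sketch: compare a standard random forest with Extra-Trees on accuracy and
# wall-clock training time.
from time import perf_counter
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=5000, n_features=30,
                           n_informative=10, random_state=0)

for Model in (RandomForestClassifier, ExtraTreesClassifier):
    start = perf_counter()
    score = cross_val_score(Model(n_estimators=300, random_state=0,
                                  n_jobs=-1), X, y, cv=5).mean()
    print(f"{Model.__name__:<24} accuracy={score:.3f} "
          f"time={perf_counter() - start:.1f}s")
```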
