I'm trying to choose Random Forest parameters for classification. My dataset contains 26 features and 6300 instances. How can I decide the values of the number of trees, the number of features, the size of the bootstrap sample, and the minimum node size? Are there any formulas I can use to help decide these values?
Thank you
Best Answer
Read the first RF article on the implementation of RF in the R project ("R News"). I commonly use 500 to 5000 trees, but Breiman recommends using more in his original Machine Learning paper (he said: "Don't be stingy").
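One practical way to see whether more trees help is to watch the out-of-bag (OOB) error as the forest grows. The sketch below is only an illustration, not the method from the article above: it assumes scikit-learn is available, uses synthetic data with 26 features as a stand-in for the real dataset, and the helper name `oob_error_by_ntree` is hypothetical. In scikit-learn, `n_estimators` plays the role of the number of trees and `max_features` the number of features tried at each split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def oob_error_by_ntree(X, y, ntree_values, jtry="sqrt", seed=0):
    """Return {ntree: OOB error}, growing a single forest incrementally."""
    forest = RandomForestClassifier(
        warm_start=True,     # keep the trees already grown, only add new ones
        oob_score=True,      # score each sample with trees that did not see it
        max_features=jtry,   # number of features tried at each node split
        random_state=seed,
        n_jobs=-1,
    )
    errors = {}
    for ntree in sorted(ntree_values):
        forest.set_params(n_estimators=ntree)
        forest.fit(X, y)
        errors[ntree] = 1.0 - forest.oob_score_
    return errors


if __name__ == "__main__":
    # Synthetic stand-in data (assumption): 26 features, fewer rows than the
    # 6300 in the question so the demo runs quickly.
    X, y = make_classification(
        n_samples=1500, n_features=26, n_informative=8, random_state=0
    )
    for ntree, err in oob_error_by_ntree(X, y, [50, 100, 200, 400]).items():
        print(f"ntree={ntree:4d}  OOB error={err:.3f}")
```

If the OOB error has flattened out by a given forest size, adding trees beyond that point mostly costs time; if it is still falling, a larger forest is probably worth trying.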
I have written the following for my own code:
Number of features used for training at each node split, `jtry`. A distinguishing characteristic of RF is that only a small number of randomly selected features are considered at each node split. The parameter that specifies this number is `jtry`, and a typical value is `jtry` $=\sqrt{p}$, where $p$ is the total number of features. For the 26 features in the question, this gives $\sqrt{26} \approx 5$. Values of `jtry` as small as 1 and 2 have been shown to yield high performance. If there are a large number of features to filter, then larger values of `jtry` will be required. During tree node splitting, `jtry` features are randomly selected from the $p$ features and used to create the node split.
Number of trees, `ntree`. The number of trees is specified by `ntree`. As few as 50 trees have been shown to give reliable results (Cutler et al., 2007), but most applications use 500, 1000, 2000, or more trees. Breiman has recommended using at least 1000 trees (Breiman, 2001); however, it is common to use 500 trees unless more trees result in better performance.
Number of objects in terminal nodes, `nodesize`. The parameter `nodesize` controls the size of terminal nodes during node splitting while training a tree. Nodes with fewer than `nodesize` objects are not split and therefore become terminal nodes. The ability of a tree to generate terminal nodes with class purity depends on the quality of the `jtry` features randomly selected for each split as well as on the value of `nodesize`. If `nodesize` $=1$, however, then every terminal node will have class purity (provided no two objects with identical feature values carry different labels).
Number of total nodes, `nnode`. The total number of nodes grown in a tree can also be limited with the parameter `nnode`. Features that are less informative for separating objects by class label result in larger trees, so setting the upper bound to 500 or 1000 is not unreasonable if there are hundreds of samples.
L. Breiman. Random Forests. Machine Learning, 45:5-32, 2001.
D.R. Cutler, T.C. Edwards, Jr., K.H. Beard, A. Cutler, K.T. Hess, J. Gibson, J.J. Lawler. Random forests for classification in ecology. Ecology, 88(11):2783-2792, 2007.
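To make the rules of thumb concrete for the dataset in the question (26 features, 6300 instances), here is a minimal sketch using only the Python standard library. It is not taken from any RF package: all function names are hypothetical, the candidate values for the number of features are just one plausible grid around $\sqrt{p}$, and the bootstrap sample is assumed to have the usual default size of $n$ rows drawn with replacement, which the answer above does not discuss.

```python
import math
import random


def default_jtry(p):
    """Rule of thumb: number of features per split = sqrt(p), rounded down."""
    return max(1, math.isqrt(p))


def jtry_candidates(p):
    """A small, hypothetical grid of values to compare (e.g. by OOB error)."""
    base = default_jtry(p)
    values = {1, 2, max(1, base // 2), base, min(p, 2 * base), p}
    return sorted(v for v in values if 1 <= v <= p)


def bootstrap_sample(n, rng):
    """Draw n row indices with replacement; rows never drawn are out-of-bag."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag


def features_for_split(p, jtry, rng):
    """Pick jtry distinct feature indices to consider at one node split."""
    return rng.sample(range(p), jtry)


if __name__ == "__main__":
    p, n = 26, 6300
    rng = random.Random(0)
    print("default jtry:", default_jtry(p))        # 5
    print("jtry candidates:", jtry_candidates(p))  # [1, 2, 5, 10, 26]
    in_bag, oob = bootstrap_sample(n, rng)
    # Expected to be near 1/e, i.e. about 0.37, for a sample of size n.
    print("out-of-bag fraction:", round(len(oob) / n, 3))
    print("features tried at one split:", features_for_split(p, default_jtry(p), rng))
```

Because roughly a third of the rows are left out of each tree's bootstrap sample, they can serve as a built-in validation set, which is why candidate parameter values are often compared by their out-of-bag error rather than by a formula.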