# Solved – How to reduce error rate of Random Forest in R


I want to build a prediction model on a dataset with ~1.6M rows with the following structure:

And here is my code to make a random forest out of it:

library(randomForest)

fitFactor <- randomForest(as.factor(classLabel) ~ ., data = d,
                          ntree = 300, importance = TRUE)


and summary of my data:

  fromCluster       start_day        start_time        gender       age          classLabel
Min.   : 1.000   Min.   :0.0000   Min.   :0.000   Min.   :1   Min.   :0.000   Min.   : 1.000
1st Qu.: 4.000   1st Qu.:1.0000   1st Qu.:1.000   1st Qu.:1   1st Qu.:0.000   1st Qu.: 4.000
Median : 6.000   Median :1.0000   Median :3.000   Median :1   Median :1.000   Median : 6.000
Mean   : 6.544   Mean   :0.7979   Mean   :2.485   Mean   :1   Mean   :1.183   Mean   : 6.537
3rd Qu.:10.000   3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:1   3rd Qu.:2.000   3rd Qu.:10.000
Max.   :10.000   Max.   :1.0000   Max.   :6.000   Max.   :1   Max.   :6.000   Max.   :10.000


But I don't understand why my error rate is so high!

What am I doing wrong?

The hyperparameters you can tune include ntree, mtry, and tree depth (via maxnodes, nodesize, or both). By far the most important is mtry. For classification with $p$ features, the default mtry is $\lfloor\sqrt{p}\rfloor$; increasing it may improve performance. I recommend searching a grid over the range $\sqrt{p}/2$ to $3\sqrt{p}$ in increments of $\sqrt{p}/2$.
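A runnable sketch of that grid search, using the built-in iris data as a stand-in for your `d` (the sample data, ntree value, and OOB comparison are illustrative choices, not part of the original question):

```r
library(randomForest)
set.seed(42)

p    <- ncol(iris) - 1                       # number of predictor columns
# Grid from sqrt(p)/2 to 3*sqrt(p) in steps of sqrt(p)/2, clamped to [1, p]
grid <- unique(pmin(p, pmax(1, round(seq(sqrt(p) / 2, 3 * sqrt(p),
                                         by = sqrt(p) / 2)))))

# OOB error after the final tree, for each candidate mtry
oob <- sapply(grid, function(m) {
  fit <- randomForest(Species ~ ., data = iris, ntree = 200, mtry = m)
  fit$err.rate[200, "OOB"]
})

best_mtry <- grid[which.min(oob)]
```

On your 1.6M-row data you would likely run this on a subsample to keep tuning affordable, then refit with the winning mtry on the full data.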
Tuning ntree is basically an exercise in picking a number of trees large enough that the error rate stabilizes. Because the trees are grown independently (i.i.d. given the training data), you can train a generous number of trees once and then pick the smallest ntree at which the OOB error curve is essentially flat.
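One way to do this, again sketched on iris (the 0.1% flatness threshold is my own choice, not a randomForest default):

```r
library(randomForest)
set.seed(42)

# Train once with a generous ntree, then inspect where OOB error flattens
fit <- randomForest(Species ~ ., data = iris, ntree = 500)
plot(fit)                                   # OOB and per-class error vs. trees

oob_curve <- fit$err.rate[, "OOB"]          # OOB error after tree 1, 2, ..., 500
final_oob <- oob_curve[length(oob_curve)]

# Smallest ntree whose OOB error is within 0.1% (absolute) of the final value
n_enough <- which(oob_curve <= final_oob + 0.001)[1]
```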
By default, randomForest grows classification trees with a minimum node size of 1, which can be computationally expensive with this many observations. Tuning node size/tree depth might be useful for you, if only to reduce training time. In *The Elements of Statistical Learning*, the authors write that they have observed only modest performance gains from tuning trees in this way.
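For example, raising nodesize and capping maxnodes yields smaller trees and faster training (the specific values below are illustrative; on 1.6M rows you might start much larger, e.g. nodesize in the hundreds):

```r
library(randomForest)
set.seed(42)

# Larger terminal nodes and a cap on total nodes -> shallower, faster trees
fit_small <- randomForest(Species ~ ., data = iris, ntree = 200,
                          nodesize = 10, maxnodes = 16)
```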