Recently I've been working with random forest algorithms because they're easy to use. I always divide my dataset into train and test subsets, and the out-of-bag (OOB) error for a forest built on the train set is usually higher (by more than 10%) than the error on the test set. Does this indicate overfitting, or is it natural? Should the two errors be roughly equal? If so, I suppose I should tune the forest's parameters (such as maximum depth or the maximum number of observations in a terminal node) to obtain similar error values.
Solved – Out-of-bag error and error on test dataset for random forest
overfitting, random forest
Best Answer
I understand your question to be (correct me if I'm wrong): should the out-of-bag error and the error on a held-out test set be roughly equal, and does an OOB error that is noticeably higher (by more than 10%) indicate overfitting?

The following points are worth noting:
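To make the comparison concrete, here is a minimal sketch of how the two error estimates can be computed side by side, assuming scikit-learn's `RandomForestClassifier` with `oob_score=True` (the dataset and all parameter values here are illustrative, not taken from your setup):

```python
# Hypothetical sketch: compare out-of-bag (OOB) error with held-out test error.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data stands in for your dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

forest = RandomForestClassifier(
    n_estimators=500,   # more trees -> a more stable OOB estimate
    oob_score=True,     # compute OOB accuracy during fitting
    random_state=0,
)
forest.fit(X_train, y_train)

# Each training row's OOB prediction uses only the trees whose bootstrap
# sample did not contain that row (roughly a third of the forest).
oob_error = 1.0 - forest.oob_score_

# The test error uses every tree in the forest on unseen rows.
test_error = 1.0 - forest.score(X_test, y_test)

print(f"OOB error:  {oob_error:.3f}")
print(f"Test error: {test_error:.3f}")
```

Because each OOB prediction is averaged over only the subset of trees that did not see that observation, the OOB estimate tends to be slightly pessimistic compared with the full-forest test error, particularly for small forests.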