Should the OOB (out-of-bag) error be lower than the test-set error in random forests?

accuracy, cart, cross-validation, random forest

I am using the book "An Introduction to Statistical Learning with Applications in R" and reading the section on using the OOB error to estimate the model error for random forests. The graph seems to suggest that the OOB error will be a lot lower than the test-set error. However, I cannot find any rationale for this. To the best of my understanding, the OOB error should be roughly equal to the test error. Why are these two errors different?


Best Answer

To my knowledge, no.

There are other strange things in this plot, e.g. why does bagging outperform the random forest with respect to the OOB error? It's hard to explain the observed behavior without more information on the data: how many samples were used in training and testing, and how were training and testing performed?

If the model was trained and tested on only a small set of samples, the observed difference in error rate might not be significant. Further, if the problem has a rather steep learning curve, and testing was performed by holding out a portion of the data while the OOB error was estimated on the entire data set, then the model evaluated on the test set was trained on fewer samples, so under-fitting might be another explanation.
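To make the sample-size point concrete, here is a minimal sketch of a two-proportion z-test on two error rates. The function name `error_rate_difference_z` and all the numbers are hypothetical; none of them come from the book's figure. The test also treats the OOB and test-set estimates as independent binomial proportions, which is only a rough approximation for a random forest.

```python
import math


def error_rate_difference_z(err_a, n_a, err_b, n_b):
    """Two-proportion z-test for the difference between two error rates.

    err_a, err_b: observed error rates (fractions between 0 and 1)
    n_a, n_b:     number of samples each rate was estimated on
    Assumes the two estimates are independent, which is an approximation.
    """
    # Pooled error rate under the null hypothesis that both rates are equal.
    pooled = (err_a * n_a + err_b * n_b) / (n_a + n_b)
    # Standard error of the difference between the two rates.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (err_a - err_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: OOB error 20%, test error 25%.
# Small data set: 100 training samples (OOB), 50 held-out test samples.
z_small, p_small = error_rate_difference_z(0.20, 100, 0.25, 50)
# Same error rates, but estimated on 100x more samples.
z_large, p_large = error_rate_difference_z(0.20, 10_000, 0.25, 5_000)

print(f"small sample: z = {z_small:.2f}, p = {p_small:.3g}")
print(f"large sample: z = {z_large:.2f}, p = {p_large:.3g}")
```

With these made-up numbers, the same five-point gap is well within noise for the small data set and clearly significant for the large one, which is why the sample sizes behind the plot matter.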