I am implementing a random forest, and I find that when I build a single tree, the OOB error (mean squared error, since I am doing regression) is close to zero, while as more trees are built the OOB error stabilizes. This is counter-intuitive, since the textbook teaches that the OOB error should decline as more trees are built. I compared my implementation with R: my OOB error is slightly less than R's when ntree=1000, but when ntree=1 my OOB error is close to zero while R's is quite large.
Simply put, my OOB error increases as more trees are built and then stabilizes, whereas R and the textbook show the OOB error decreasing as more trees are built and then stabilizing.
So, is there anything wrong with my implementation? How should I tune my algorithm?
Best Answer
Hastie et al. address this question very briefly in The Elements of Statistical Learning (page 596).
Stated another way, for a fixed hyperparameter configuration, increasing the number of trees cannot overfit the data; however, the other hyperparameters may be a source of overfitting.
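To see the expected behavior (OOB error falling and then leveling off as trees are added), here is a minimal sketch using scikit-learn's `RandomForestRegressor` rather than your own implementation or R's randomForest; the dataset and parameter values are arbitrary assumptions for illustration. The key point is that `oob_prediction_` averages, for each sample, only the trees whose bootstrap draw did not include that sample:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Synthetic regression data (hypothetical sizes, chosen for speed)
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# warm_start=True lets us grow the same forest incrementally and
# re-evaluate the OOB error at several forest sizes
forest = RandomForestRegressor(oob_score=True, warm_start=True,
                               bootstrap=True, random_state=0)

oob_mse = {}
# Start at 25 trees so (almost) every sample is out-of-bag at least once;
# with very few trees some samples have no OOB prediction at all
for n in (25, 100, 400):
    forest.set_params(n_estimators=n)
    forest.fit(X, y)  # reuses the trees already grown
    # OOB MSE: each sample is predicted only by trees that did NOT see it
    oob_mse[n] = mean_squared_error(y, forest.oob_prediction_)

print(oob_mse)  # OOB MSE should drop from 25 to 400 trees, then stabilize
```

If your implementation reports a near-zero error with a single tree, it is worth checking that you are evaluating each sample only against trees for which it was out of bag; averaging in predictions from trees that trained on the sample will make the error look like a (near-zero) training error.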