Solved – Low accuracy in random forest

random forest

I have a data set with 10 predictors (both continuous and categorical), and the dependent variable is a factor with levels 0 or 1. The event rate in my data (% of actual 1s) is 10%. However, when I apply random forest, it classifies only 5% of the observations as 1 and the remaining 95% as 0. Why would this happen? Is it only related to the kind of variables I have and the transformations I have done, or is it something that can be controlled by tuning the parameters of the model?

Best Answer

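This is quite a common issue when dealing with unbalanced datasets: when only 10% of the observations are 1s, a model that is trained and judged on overall accuracy tends to favor the majority class and under-predict the minority class. Try undersampling the majority class or oversampling the minority class, and/or choose a different performance measure for training (e.g. ROC AUC).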
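As a rough illustration of what under- and oversampling mean, here is a minimal sketch in plain Python. The function names `random_undersample` and `random_oversample` and the toy data are made up for this example and do not come from any library; in practice you would more likely rely on the resampling or class-weighting options of your modelling package.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_undersample(rows, labels, seed=0):
    """Hypothetical helper: randomly drop rows (without replacement)
    so every class keeps as many rows as the rarest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    n_min = min(len(group) for group in by_class.values())
    sampled = [(row, label)
               for label, group in by_class.items()
               for row in rng.sample(group, n_min)]
    rng.shuffle(sampled)
    return [r for r, _ in sampled], [l for _, l in sampled]


def random_oversample(rows, labels, seed=0):
    """Hypothetical helper: duplicate randomly chosen rows (with replacement)
    so every class has as many rows as the most frequent class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    n_max = max(len(group) for group in by_class.values())
    sampled = []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(n_max - len(group))]
        sampled.extend((row, label) for row in group + extra)
    rng.shuffle(sampled)
    return [r for r, _ in sampled], [l for _, l in sampled]


# Toy data with a 10% event rate, mirroring the question (values are made up).
labels = [1] * 10 + [0] * 90
rows = [[i] for i in range(100)]

under_rows, under_labels = random_undersample(rows, labels)
over_rows, over_labels = random_oversample(rows, labels)

print(Counter(under_labels))  # both classes now have 10 rows
print(Counter(over_labels))   # both classes now have 90 rows
```

Resample only the training data; the test set should keep the original 10% event rate so that the evaluation reflects what the model will see in practice.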
