Solved – Comparing CV Predictions across Folds for Random Forest

cross-validation, random forest

I have implemented k-fold cross-validation to assess the classification performance of a Random Forest. What I want to know is: are the predicted values across folds directly comparable?

For example, when I generate predictions on holdout fold 1 and get a predicted value of 0.84 for one observation, can I be more confident in that prediction than in a value of 0.80 for an observation in fold 2?

The ultimate question is whether it would be appropriate to stack the predictions from all k folds and then calculate model performance (such as the ROC) from the stacked predictions. This could be useful for highly imbalanced datasets with few positives, since each fold will have even fewer positives and the per-fold ROC will therefore have relatively high variance. A sketch of what I mean follows below.
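To make the proposal concrete, here is a minimal sketch of stacking the out-of-fold predicted probabilities and computing a single ROC AUC from them. The scikit-learn classes, the synthetic imbalanced dataset, and the fold count are illustrative assumptions, not part of my actual setup:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold
    from sklearn.metrics import roc_auc_score

    # Imbalanced toy data: roughly 5% positives (illustrative only)
    X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    oof_pred = np.zeros(len(y))  # out-of-fold predicted probabilities

    for train_idx, test_idx in skf.split(X, y):
        rf = RandomForestClassifier(n_estimators=500, random_state=0)
        rf.fit(X[train_idx], y[train_idx])
        # Store the holdout predictions so each observation is predicted exactly once
        oof_pred[test_idx] = rf.predict_proba(X[test_idx])[:, 1]

    # One ROC AUC computed from the stacked (pooled) predictions
    print("Pooled ROC AUC:", roc_auc_score(y, oof_pred))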

This post on RF was helpful, but does not directly address this question.

Additional info: I'm particularly interested in cases with high class imbalance and small positive sets. This doesn't change the question, but it does highlight the potential issues with comparing results across folds.

Best Answer

For each fold, you are building a classifier that makes predictions for the held-out observations. The classifiers from the different folds have slightly different training sets, and therefore slightly different fitted models, but they are all attempting to estimate the same underlying model. So yes, you can combine the predictions. If you have multiple predictions for one observation, you could take the average prediction across folds, or weight the predictions so that the more accurate models have more influence than the less accurate ones. This applies to any "ensemble learning" system. As long as the predictions for different observations are on the same scale (e.g. from -1 to +1, or from 0 to 1), I can't think of any reason not to combine them.
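As a concrete illustration of the averaging/weighting idea, here is a minimal sketch, assuming scikit-learn and that you keep all k fold models and apply them to new observations. Weighting each model by its holdout AUC is just one possible choice (equal weights also work); the data and parameters are illustrative assumptions:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold
    from sklearn.metrics import roc_auc_score

    X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=1)
    X_new = X[:10]  # stand-in for genuinely new observations

    models, fold_scores = [], []
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
    for train_idx, test_idx in skf.split(X, y):
        rf = RandomForestClassifier(n_estimators=500, random_state=1)
        rf.fit(X[train_idx], y[train_idx])
        models.append(rf)
        # Score each fold model on its own holdout set (used here as a weight)
        fold_scores.append(roc_auc_score(y[test_idx],
                                         rf.predict_proba(X[test_idx])[:, 1]))

    # Normalize the holdout AUCs into weights that sum to 1
    weights = np.array(fold_scores) / np.sum(fold_scores)

    # Weighted average of the fold models' predicted probabilities for new data
    avg_pred = sum(w * m.predict_proba(X_new)[:, 1]
                   for w, m in zip(weights, models))
    print("Ensembled predictions:", np.round(avg_pred, 3))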