I am planning to use 10-times repeated stratified 10-fold cross-validation on about 10,000 cases with a machine learning algorithm. Each repetition will use a different random seed.
This produces 10 probability estimates for each case: one from each of the 10 repetitions of the 10-fold cross-validation.
Can I average the 10 probabilities for each case, create a new averaged ROC curve (representing the results of the repeated 10-fold CV), and then compare it to other ROC curves using paired comparisons?
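The procedure described in the question can be sketched as follows. This assumes scikit-learn, a binary outcome, and a `LogisticRegression` model on synthetic data from `make_classification` (1,000 cases instead of 10,000 to keep it fast); the model and data are placeholders. Each case gets one out-of-fold probability per repetition, the ten values are averaged, and a single ROC curve is built from the averages. Keep in mind that this curve describes an average of ten models' predictions, so its AUC need not equal the mean of the per-repetition AUCs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in for the real cases (assumption: binary outcome).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000)  # placeholder model

n_repeats, n_splits = 10, 10
# One row per repetition, one column per case: out-of-fold P(class 1).
oof_probs = np.empty((n_repeats, len(y)))
for r in range(n_repeats):
    # A different random seed for each repetition, as in the question.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=r)
    # Each case is predicted by the fold model that did not train on it.
    oof_probs[r] = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]

# Average the 10 estimates per case, then build one ROC curve from them.
mean_probs = oof_probs.mean(axis=0)
fpr, tpr, thresholds = roc_curve(y, mean_probs)
auc_avg = roc_auc_score(y, mean_probs)

# For comparison: the AUC of each individual repetition.
auc_per_repeat = [roc_auc_score(y, p) for p in oof_probs]
print(f"AUC of averaged probabilities: {auc_avg:.3f}")
print(f"Per-repetition AUC: {np.mean(auc_per_repeat):.3f} +/- {np.std(auc_per_repeat):.3f}")
```

Because every model being compared produces one averaged probability per case, the resulting curves are all computed on the same cases, which is what a paired comparison of ROC curves requires.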
Best Answer
From your description, it makes perfect sense: not only can you calculate the mean ROC curve over all folds, but also the variance around it, which lets you build confidence intervals. This should give you an idea of how stable your model is.
For example, like this:
Here I plotted the individual ROC curves along with the mean curve and the confidence interval. In some regions the curves agree, so the variance is small; in others they disagree, and the band widens.
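The mean curve and band are usually obtained by vertical averaging: each fold's ROC curve is interpolated onto a common grid of false-positive rates $f_1, \dots, f_m$, and the true-positive rates $T_k(f_j)$ of the $K$ folds are averaged point by point. A sketch of that construction, with the band width $z$ left as a free choice:

$$\bar{T}(f_j) = \frac{1}{K}\sum_{k=1}^{K} T_k(f_j), \qquad s(f_j) = \sqrt{\frac{1}{K}\sum_{k=1}^{K}\bigl(T_k(f_j) - \bar{T}(f_j)\bigr)^2},$$

$$\text{band}(f_j) = \Bigl[\max\bigl(0,\ \bar{T}(f_j) - z\,s(f_j)\bigr),\ \min\bigl(1,\ \bar{T}(f_j) + z\,s(f_j)\bigr)\Bigr].$$

Strictly speaking, $\bar{T} \pm z\,s$ describes the spread of the individual fold curves. An interval for the mean curve itself would be narrower (on the order of $s/\sqrt{K}$), and even that is optimistic, because CV folds share training data and are not independent.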
For repeated CV, you can simply repeat the procedure several times and average over all individual folds from every repetition:
The result looks quite similar to the previous plot, but it gives more stable (i.e. more reliable) estimates of the mean and variance.
Here's the code to get the plot:
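A minimal sketch of the single 10-fold version, adapted from the scikit-learn example linked below and assuming a `LogisticRegression` model on synthetic `make_classification` data (both placeholders). Each fold's ROC curve is interpolated onto a common grid of 101 false-positive rates so the curves can be averaged point by point. The ±2 standard deviation band and the helper names `fold_roc_curves` and `plot_mean_roc` are illustrative choices, not part of any library.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import StratifiedKFold


def fold_roc_curves(model, X, y, cv, mean_fpr):
    """Hypothetical helper: per-fold TPRs interpolated on mean_fpr, plus per-fold AUCs."""
    tprs, aucs = [], []
    for train_idx, test_idx in cv.split(X, y):
        model.fit(X[train_idx], y[train_idx])
        probs = model.predict_proba(X[test_idx])[:, 1]
        fpr, tpr, _ = roc_curve(y[test_idx], probs)
        interp_tpr = np.interp(mean_fpr, fpr, tpr)  # same FPR grid for every fold
        interp_tpr[0] = 0.0  # every ROC curve starts at (0, 0)
        tprs.append(interp_tpr)
        aucs.append(auc(fpr, tpr))
    return np.array(tprs), np.array(aucs)


X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000)  # placeholder model
mean_fpr = np.linspace(0, 1, 101)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
tprs, aucs = fold_roc_curves(model, X, y, cv, mean_fpr)

mean_tpr = tprs.mean(axis=0)
mean_tpr[-1] = 1.0  # every ROC curve ends at (1, 1)
std_tpr = tprs.std(axis=0)
# Band of +/- 2 standard deviations of the fold curves, clipped to [0, 1].
tpr_upper = np.minimum(mean_tpr + 2 * std_tpr, 1.0)
tpr_lower = np.maximum(mean_tpr - 2 * std_tpr, 0.0)
mean_auc = auc(mean_fpr, mean_tpr)


def plot_mean_roc(mean_fpr, tprs, mean_tpr, tpr_lower, tpr_upper, mean_auc):
    """Hypothetical helper: fold curves in grey, mean curve in blue, shaded band."""
    import matplotlib.pyplot as plt  # only needed for plotting

    fig, ax = plt.subplots()
    for t in tprs:
        ax.plot(mean_fpr, t, color="grey", alpha=0.3, lw=1)
    ax.plot(mean_fpr, mean_tpr, color="blue", lw=2, label=f"Mean ROC (AUC = {mean_auc:.3f})")
    ax.fill_between(mean_fpr, tpr_lower, tpr_upper, color="blue", alpha=0.2, label="+/- 2 std. dev.")
    ax.plot([0, 1], [0, 1], "k--", lw=1)  # chance diagonal
    ax.set_xlabel("False positive rate")
    ax.set_ylabel("True positive rate")
    ax.legend(loc="lower right")
    return fig
```

Calling `plot_mean_roc(mean_fpr, tprs, mean_tpr, tpr_lower, tpr_upper, mean_auc)` followed by `matplotlib.pyplot.show()` draws the figure; the numbers themselves do not require matplotlib.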
For repeated CV:
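A sketch of the repeated version under the same assumptions (scikit-learn, a placeholder `LogisticRegression`, synthetic `make_classification` data). It uses scikit-learn's `RepeatedStratifiedKFold`, which reshuffles the data for each repetition; looping over `StratifiedKFold(shuffle=True, random_state=seed)` with ten different seeds would match the question's setup equally well. All 100 fold curves are pooled before averaging.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import RepeatedStratifiedKFold

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000)  # placeholder model
mean_fpr = np.linspace(0, 1, 101)

# 10 repetitions of stratified 10-fold CV, each with a different shuffle.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)

tprs, aucs = [], []
for train_idx, test_idx in cv.split(X, y):
    model.fit(X[train_idx], y[train_idx])
    probs = model.predict_proba(X[test_idx])[:, 1]
    fpr, tpr, _ = roc_curve(y[test_idx], probs)
    interp_tpr = np.interp(mean_fpr, fpr, tpr)  # common FPR grid
    interp_tpr[0] = 0.0
    tprs.append(interp_tpr)
    aucs.append(auc(fpr, tpr))

tprs = np.array(tprs)  # shape (100, 101): one row per fold of every repetition
aucs = np.array(aucs)

# Average over all folds of all repetitions.
mean_tpr = tprs.mean(axis=0)
mean_tpr[-1] = 1.0
std_tpr = tprs.std(axis=0)
tpr_upper = np.minimum(mean_tpr + 2 * std_tpr, 1.0)
tpr_lower = np.maximum(mean_tpr - 2 * std_tpr, 0.0)
mean_auc = auc(mean_fpr, mean_tpr)
print(f"Mean AUC over {len(aucs)} folds: {aucs.mean():.3f} +/- {aucs.std():.3f}")
```

The resulting arrays have the same shapes per grid point as in the single 10-fold case, so the same plotting code applies; only the number of fold curves grows from 10 to 100.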
Source of inspiration: http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc_crossval.html