Solved – Average ROC for repeated 10-fold cross validation with probability estimates


I am planning to use repeated (10 times) stratified 10-fold cross-validation on about 10,000 cases with a machine learning algorithm. Each repetition will be done with a different random seed.

In this process I obtain 10 probability estimates for each case:
one probability estimate from each of the 10 repetitions of the 10-fold cross-validation.

Can I average the 10 probabilities for each case and then create a new averaged ROC curve (representing the results of the repeated 10-fold CV), which can be compared to other ROC curves by paired comparisons?

Best Answer

From your description it makes perfect sense: not only can you calculate the mean ROC curve, but also the variance around it to build confidence intervals. That should give you an idea of how stable your model is.

For example, like this:

[Figure: individual ROC curves, their mean curve, and the confidence band around it]

Here I plot the individual ROC curves as well as the mean curve and the confidence band. There are areas where the curves agree, so the variance there is small, and areas where they disagree, so the variance is larger.

For repeated CV you just run the same procedure multiple times and take the overall average across all individual folds:

[Figure: ROC curves from repeated 10-fold CV, with the overall mean curve and confidence band]

It's quite similar to the previous picture, but gives more stable (i.e. reliable) estimates of the mean and variance.

Here's the code to get the plot:

import matplotlib.pyplot as plt
import numpy as np

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=500, random_state=100, flip_y=0.3)

kf = KFold(n_splits=10)

tprs = []
base_fpr = np.linspace(0, 1, 101)

plt.figure(figsize=(5, 5))
plt.gca().set_aspect('equal', 'datalim')

for train, test in kf.split(X):
    model = LogisticRegression().fit(X[train], y[train])
    y_score = model.predict_proba(X[test])
    fpr, tpr, _ = roc_curve(y[test], y_score[:, 1])

    plt.plot(fpr, tpr, 'b', alpha=0.15)
    # interpolate each fold's curve onto a common FPR grid so the curves can be averaged
    tpr = np.interp(base_fpr, fpr, tpr)
    tpr[0] = 0.0
    tprs.append(tpr)

tprs = np.array(tprs)
mean_tprs = tprs.mean(axis=0)
std = tprs.std(axis=0)

# pointwise one-standard-deviation band around the mean curve, clipped to [0, 1]
tprs_upper = np.minimum(mean_tprs + std, 1)
tprs_lower = np.maximum(mean_tprs - std, 0)

plt.plot(base_fpr, mean_tprs, 'b')
plt.fill_between(base_fpr, tprs_lower, tprs_upper, color='grey', alpha=0.3)

plt.plot([0, 1], [0, 1], 'r--')
plt.xlim([-0.01, 1.01])
plt.ylim([-0.01, 1.01])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()
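If you also want a single summary number to accompany the band, one option is to compute the area under each fold's interpolated curve and report its mean and spread. A minimal sketch, reusing base_fpr and tprs from the code above (since these are the interpolated curves, the values can differ slightly from per-fold roc_auc_score):

from sklearn.metrics import auc

# area under each fold's interpolated ROC curve
aucs = np.array([auc(base_fpr, tpr) for tpr in tprs])
print('AUC: %.3f +/- %.3f' % (aucs.mean(), aucs.std()))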

For repeated CV:

idx = np.arange(0, len(y))
tprs = []  # start a fresh list of interpolated curves for the repeated run

for j in np.random.randint(0, high=10000, size=10):
    # reshuffle the data with a different seed for each repetition
    rng = np.random.RandomState(j)
    rng.shuffle(idx)
    kf = KFold(n_splits=10)

    for train, test in kf.split(idx):
        model = LogisticRegression().fit(X[idx][train], y[idx][train])
        y_score = model.predict_proba(X[idx][test])
        fpr, tpr, _ = roc_curve(y[idx][test], y_score[:, 1])

        plt.plot(fpr, tpr, 'b', alpha=0.05)
        tpr = np.interp(base_fpr, fpr, tpr)
        tpr[0] = 0.0
        tprs.append(tpr)
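Coming back to your original question: yes, averaging the 10 out-of-fold probabilities per case and building a single ROC curve from the averages is also reasonable. A minimal sketch, reusing X, y, and the imports from above (the number of repetitions and the per-repetition seeds are arbitrary choices here):

n_repeats = 10
# out-of-fold probability of the positive class for every case in every repetition
probs = np.zeros((n_repeats, len(y)))

for r in range(n_repeats):
    kf = KFold(n_splits=10, shuffle=True, random_state=r)
    for train, test in kf.split(X):
        model = LogisticRegression().fit(X[train], y[train])
        probs[r, test] = model.predict_proba(X[test])[:, 1]

# average each case's 10 estimates, then build one ROC curve from the averages
fpr_avg, tpr_avg, _ = roc_curve(y, probs.mean(axis=0))

The resulting single curve is what you would then use for paired comparisons against other models.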

Source of inspiration: http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc_crossval.html
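As a side note, since you asked for stratified CV: newer scikit-learn versions provide RepeatedStratifiedKFold, which generates all the repeated, stratified splits in one object and replaces the manual shuffling above. A minimal sketch, assuming a reasonably recent scikit-learn:

from sklearn.model_selection import RepeatedStratifiedKFold

# 10 repetitions of stratified 10-fold CV, each repetition with its own shuffle
rskf = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=42)

for train, test in rskf.split(X, y):
    model = LogisticRegression().fit(X[train], y[train])
    # ... compute fpr/tpr per fold exactly as above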