Solved – Average ROC for repeated 10-fold cross validation with probability estimates


I am planning to use repeated (10 times) stratified 10-fold cross-validation on about 10,000 cases with a machine learning algorithm. Each repetition will be done with a different random seed.

In this process I obtain 10 probability estimates for each case:
one probability estimate from each of the 10 repetitions of the 10-fold cross-validation.

Can I average the 10 probabilities for each case and then create a new averaged ROC curve (representing the results of the repeated 10-fold CV), which can be compared to other ROC curves by paired comparisons?

Best Answer

From your description it makes perfect sense: not only can you calculate the mean ROC curve, but also the variance around it to build confidence intervals. That should give you an idea of how stable your model is.

For example, like this:

[Figure: individual ROC curves, their mean curve, and the confidence band around it]

Here I plot the individual ROC curves as well as the mean curve and the confidence band. There are areas where the curves agree, so the variance there is small, and areas where they disagree, so the variance is larger.

For repeated CV you just run the same procedure multiple times and take the overall average across all individual folds:

[Figure: ROC curves from repeated 10-fold CV, with the overall mean curve and confidence band]

It's quite similar to the previous picture, but gives more stable (i.e. reliable) estimates of the mean and variance.

Here's the code to get the plot:

import matplotlib.pyplot as plt
import numpy as np

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=500, random_state=100, flip_y=0.3)

kf = KFold(n_splits=10)

tprs = []
base_fpr = np.linspace(0, 1, 101)

plt.figure(figsize=(5, 5))
plt.gca().set_aspect('equal', 'datalim')

for train, test in kf.split(X):
    model = LogisticRegression().fit(X[train], y[train])
    y_score = model.predict_proba(X[test])
    fpr, tpr, _ = roc_curve(y[test], y_score[:, 1])

    plt.plot(fpr, tpr, 'b', alpha=0.15)
    # interpolate each fold's curve onto a common FPR grid so the curves can be averaged
    tpr = np.interp(base_fpr, fpr, tpr)
    tpr[0] = 0.0
    tprs.append(tpr)

tprs = np.array(tprs)
mean_tprs = tprs.mean(axis=0)
std = tprs.std(axis=0)

# pointwise one-standard-deviation band around the mean curve, clipped to [0, 1]
tprs_upper = np.minimum(mean_tprs + std, 1)
tprs_lower = np.maximum(mean_tprs - std, 0)

plt.plot(base_fpr, mean_tprs, 'b')
plt.fill_between(base_fpr, tprs_lower, tprs_upper, color='grey', alpha=0.3)

plt.plot([0, 1], [0, 1], 'r--')
plt.xlim([-0.01, 1.01])
plt.ylim([-0.01, 1.01])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()
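If you also want a single summary number to accompany the band, one option is to compute the area under each fold's interpolated curve and report its mean and spread. A minimal sketch, reusing base_fpr and tprs from the code above (since these are the interpolated curves, the values can differ slightly from per-fold roc_auc_score):

from sklearn.metrics import auc

# area under each fold's interpolated ROC curve
aucs = np.array([auc(base_fpr, tpr) for tpr in tprs])
print('AUC: %.3f +/- %.3f' % (aucs.mean(), aucs.std()))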

For repeated CV:

idx = np.arange(0, len(y))
tprs = []  # start a fresh list of interpolated curves for the repeated run

for j in np.random.randint(0, high=10000, size=10):
    # reshuffle the data with a different seed for each repetition
    rng = np.random.RandomState(j)
    rng.shuffle(idx)
    kf = KFold(n_splits=10)

    for train, test in kf.split(idx):
        model = LogisticRegression().fit(X[idx][train], y[idx][train])
        y_score = model.predict_proba(X[idx][test])
        fpr, tpr, _ = roc_curve(y[idx][test], y_score[:, 1])

        plt.plot(fpr, tpr, 'b', alpha=0.05)
        tpr = np.interp(base_fpr, fpr, tpr)
        tpr[0] = 0.0
        tprs.append(tpr)
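Coming back to your original question: yes, averaging the 10 out-of-fold probabilities per case and building a single ROC curve from the averages is also reasonable. A minimal sketch, reusing X, y, and the imports from above (the number of repetitions and the per-repetition seeds are arbitrary choices here):

n_repeats = 10
# out-of-fold probability of the positive class for every case in every repetition
probs = np.zeros((n_repeats, len(y)))

for r in range(n_repeats):
    kf = KFold(n_splits=10, shuffle=True, random_state=r)
    for train, test in kf.split(X):
        model = LogisticRegression().fit(X[train], y[train])
        probs[r, test] = model.predict_proba(X[test])[:, 1]

# average each case's 10 estimates, then build one ROC curve from the averages
fpr_avg, tpr_avg, _ = roc_curve(y, probs.mean(axis=0))

The resulting single curve is what you would then use for paired comparisons against other models.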

Source of inspiration: http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc_crossval.html
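As a side note, since you asked for stratified CV: newer scikit-learn versions provide RepeatedStratifiedKFold, which generates all the repeated, stratified splits in one object and replaces the manual shuffling above. A minimal sketch, assuming a reasonably recent scikit-learn:

from sklearn.model_selection import RepeatedStratifiedKFold

# 10 repetitions of stratified 10-fold CV, each repetition with its own shuffle
rskf = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=42)

for train, test in rskf.split(X, y):
    model = LogisticRegression().fit(X[train], y[train])
    # ... compute fpr/tpr per fold exactly as above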