Solved – Averaging ROC curves over folds in cross-validation

roc, statistical significance

I have data from a 10-fold cross-validation experiment: for each fold I have a predictor and a response variable, so I can generate an ROC curve and compute the area under it (AUC).

I have a series of three such experiments, so in total I can generate 30 ROC curves. Does anybody have an idea how to average the ROC curves over the 10 folds of each experiment, and then test whether the differences between the three averaged ROC curves are statistically significant?

Best Answer

Averaging the AUC values is not the same as averaging the curves themselves. If you want to average the curves, there are several ways to do it. If you are interested in rate-constrained tasks (for example, information retrieval with a probability distribution over a time limit, or classifying customers for a call centre to target in a given time), then I would recommend this method:

Millard, Louise AC, Meelis Kull, and Peter A. Flach. "Rate-Oriented Point-Wise Confidence Bounds for ROC Curves." Machine Learning and Knowledge Discovery in Databases. Springer Berlin Heidelberg, 2014. 404-421.

https://drive.google.com/file/d/0BzEymYqJrJmhNEdGZWlzaV91d1k/view?usp=sharing
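
To make the distinction between averaging AUC values and averaging curves concrete, here is a minimal sketch of one of the simpler curve-averaging schemes, usually called vertical averaging: the TPR of every fold is interpolated on a common FPR grid and averaged point-wise. This is not the rate-oriented method from the paper above. The helper names (`roc_points`, `tpr_at`, `vertical_average`) and the synthetic Gaussian scores are assumptions made purely for illustration; substitute your own per-fold labels and scores.

```python
import random
from statistics import mean

def roc_points(y_true, scores):
    """ROC points (fpr, tpr) for one fold, sweeping the threshold from high to low."""
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Consume tied scores together so that ties produce one diagonal segment.
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr

def auc(fpr, tpr):
    """Area under a piecewise-linear curve (trapezoidal rule)."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))

def tpr_at(fpr, tpr, x):
    """TPR at false positive rate x; on a vertical segment the highest TPR is used."""
    k = max(i for i in range(len(fpr)) if fpr[i] <= x)
    if k == len(fpr) - 1 or fpr[k] == x:
        return tpr[k]
    w = (x - fpr[k]) / (fpr[k + 1] - fpr[k])
    return tpr[k] + w * (tpr[k + 1] - tpr[k])

def vertical_average(curves, grid):
    """Point-wise mean TPR over all folds at each FPR value in grid."""
    return [mean(tpr_at(f, t, x) for f, t in curves) for x in grid]

# Synthetic stand-in for one experiment with 10 folds (assumption: replace
# with the real labels and predictor values of each fold).
random.seed(0)
folds = []
for _ in range(10):
    y = [1] * 20 + [0] * 20
    s = [random.gauss(1.0, 1.0) if label else random.gauss(0.0, 1.0) for label in y]
    folds.append((y, s))

curves = [roc_points(y, s) for y, s in folds]
fold_aucs = [auc(f, t) for f, t in curves]

grid = [j / 100 for j in range(101)]
mean_tpr = vertical_average(curves, grid)

print("mean of per-fold AUCs:       ", mean(fold_aucs))
print("AUC of the vertically averaged curve:", auc(grid, mean_tpr))
```

Because integration is linear, the area under a vertically averaged curve agrees with the mean of the per-fold AUCs up to the grid resolution, so this scheme mainly adds the shape of the curve (and, if you keep the per-fold TPR values at each grid point, a spread around it). Other schemes, such as averaging at fixed thresholds or pooling the scores of all folds into a single curve, can give a different curve and a different area, which is why the choice of averaging method matters.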