Solved – ROC curves and AUC in simulations to compare models


I am using ROC curves to compare different methods, but I am not sure whether I need to re-simulate datasets using different seeds in R in order to reduce the "by-chance" issue of relying on one particular output. Here is a brief outline of my simulation:

  1. The function generate.data is used to simulate data from some distribution; because the data are simulated, I know which observations are true positives. The random number generator is controlled by fixing the seed in R.

  2. The function check.models is used to test a total of 5 methods and returns the quantities needed to draw an ROC curve for each method. For each curve (method), the AUC is also reported.

  3. The function plot.roc is used for plotting.

In step #1, there are some other factors that I change so that the data are generated under different "alternatives". When I run steps #1 and #2 above with seed=123 and pick the method with the highest AUC, I get one set of results. However, when I re-run with a different seed (say seed=456), I get a set of results that is not identical to the first run. Therefore, I think that, rigorously, I should run my simulation across different seeds in R when generating the data in step #1, so that the "by-chance" issue of using a particular dataset is reduced.

Am I correct? If so, should I report the average of the AUCs for each method across (say, 1000) simulations and pick the method with the highest average? Thanks!
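For illustration, here is a minimal sketch of the loop I have in mind. The calls to generate.data() and check.models() stand in for my own functions; in this sketch check.models() is assumed to return a named numeric vector with one AUC per method, which is not necessarily its real signature:

    # Hypothetical sketch: average AUC per method over many simulated data sets
    n.sim   <- 1000
    methods <- paste0("method", 1:5)
    auc.mat <- matrix(NA_real_, nrow = n.sim, ncol = length(methods),
                      dimnames = list(NULL, methods))

    for (i in seq_len(n.sim)) {
      set.seed(i)                        # a different seed for every replicate
      dat          <- generate.data()    # simulate data; true positives are known
      auc.mat[i, ] <- check.models(dat)  # assumed: one AUC per method
    }

    colMeans(auc.mat)                    # average AUC per method across simulations
    apply(auc.mat, 2, sd)                # spread of the AUCs, useful alongside the mean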

Best Answer

Since you are using the ROC, I presume that you are running 5 classifiers. Frank is right about the ROC: that is not how people usually compare models. For linear and generalized linear models you can apply the likelihood ratio test.
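For example, for two nested GLMs fitted in R, the likelihood ratio test can be obtained with anova(); this is a generic sketch on simulated data, not specific to your setup:

    # Likelihood ratio test for two nested logistic regressions (toy data)
    set.seed(1)
    x1 <- rnorm(200); x2 <- rnorm(200)
    y  <- rbinom(200, 1, plogis(0.5 * x1))

    fit0 <- glm(y ~ x1,      family = binomial)
    fit1 <- glm(y ~ x1 + x2, family = binomial)

    anova(fit0, fit1, test = "Chisq")  # likelihood ratio test of the nested models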

However, if you are after the best predictive performance, and particularly if you are not using a parametric model but, say, a random forest classifier, I would do the following (a rough R sketch follows the list):

  • generate data
  • split it randomly into a training and a testing set
  • train all 5 models and test their performance
  • repeat the entire procedure as many times as run time permits and store all 5 ROC curves (I would pick 1,000 or 10,000 as a minimum, depending on the convergence of the mean predictions)
  • report the means of the 5 ROC curves together with a 90% pointwise confidence interval around them
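Here is a sketch of that procedure for a single model, using only base R. It assumes a hypothetical generate.data() that returns a data frame with a 0/1 outcome y and some predictors, and uses a logistic regression as a stand-in for one of your 5 methods; extending it to all 5 just means repeating the fit/predict step and keeping one matrix per method:

    # Empirical TPR at a fixed grid of FPR values, so curves can be averaged pointwise
    tpr.at.fpr <- function(labels, scores, fpr.grid) {
      pos <- scores[labels == 1]
      neg <- scores[labels == 0]
      thr <- quantile(neg, probs = 1 - fpr.grid, names = FALSE, type = 1)
      sapply(thr, function(t) mean(pos > t))
    }

    n.rep    <- 1000                     # more repetitions if run time permits
    fpr.grid <- seq(0, 1, by = 0.01)
    tpr.mat  <- matrix(NA_real_, nrow = n.rep, ncol = length(fpr.grid))

    for (r in seq_len(n.rep)) {
      dat  <- generate.data()                            # simulate a fresh data set
      idx  <- sample(nrow(dat), size = 0.7 * nrow(dat))  # random 70/30 train/test split
      fit  <- glm(y ~ ., data = dat[idx, ], family = binomial)  # stand-in for one method
      pred <- predict(fit, newdata = dat[-idx, ], type = "response")
      tpr.mat[r, ] <- tpr.at.fpr(dat$y[-idx], pred, fpr.grid)
    }

    mean.tpr <- colMeans(tpr.mat)
    ci.lo    <- apply(tpr.mat, 2, quantile, probs = 0.05)   # 90% pointwise band
    ci.hi    <- apply(tpr.mat, 2, quantile, probs = 0.95)

    plot(fpr.grid, mean.tpr, type = "l", xlab = "False positive rate",
         ylab = "True positive rate", main = "Mean ROC with 90% pointwise interval")
    lines(fpr.grid, ci.lo, lty = 2)
    lines(fpr.grid, ci.hi, lty = 2)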

The idea, of course, is that you pick the model that offers the best combination of a high AUC and a low variance of the estimates (narrow intervals around the mean curve).

Best