I am using ROC curves to compare different methods, but I am not sure whether I need to re-simulate datasets using different seeds in R in order to reduce the "by-chance" issue of a particular output. Here is a brief outline of my simulation:
- The function generate.data is used to simulate data from some distribution; by construction, I know which data points are true positives. The random number generator is controlled by fixing the seed in R.
- The function check.models is used to test a total of 5 methods and returns the quantities needed to draw a ROC curve for each method. The AUC is also reported for each curve (method).
- The function plot.roc is used for plotting.
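A minimal sketch of this pipeline in R, using toy stand-ins for generate.data and check.models (the function names come from the outline above, but the bodies below are illustrative assumptions, not the original code), might look like:

```r
# Toy stand-in for generate.data: simulate labels and a noisy score so that
# the true positives are known by construction.
generate.data <- function(n = 200) {
  y <- rbinom(n, 1, 0.5)          # 1 = true positive, 0 = true negative
  x <- rnorm(n, mean = y)         # positives are shifted upward on average
  data.frame(x = x, y = y)
}

# Toy stand-in for check.models: compute the AUC of a single "method"
# (here just the raw score x) via the rank-based Mann-Whitney formula.
check.models <- function(dat) {
  r  <- rank(dat$x)
  n1 <- sum(dat$y == 1)
  n0 <- sum(dat$y == 0)
  (sum(r[dat$y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(123)                     # fix the RNG so the run is reproducible
dat <- generate.data()
check.models(dat)                 # AUC for this single simulated dataset
```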
In step #1, there are some other factors to vary, so that the data are generated under different "alternatives". When I run steps #1 and #2 above using seed=123 and pick the method with the highest AUC, I get one set of results. However, when I re-run with a different seed (say seed=456), I get another set of results that is not identical to the first run. Therefore, I think that, rigorously, I should run my simulation across different seeds in R when generating data in step #1, so that the "by-chance" effect of using any particular dataset is reduced.
Am I correct? If so, should I report the average of the AUCs for each method across (say) 1000 simulations and pick the method with the highest average? Thanks!
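In R, one way to do this is a plain loop over seeds, collecting the AUC of each method in every run and then averaging. The sketch below is self-contained and uses hypothetical toy data and a rank-based AUC helper; the two "methods" are illustrative scores, not the five methods above.

```r
# Rank-based (Mann-Whitney) AUC for a score vector against 0/1 labels.
auc <- function(scores, labels) {
  r  <- rank(scores)
  n1 <- sum(labels == 1)
  n0 <- sum(labels == 0)
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

n.sim <- 100                           # e.g. 1000 in a real study
aucs  <- matrix(NA, nrow = n.sim, ncol = 2,
                dimnames = list(NULL, c("method1", "method2")))

for (s in seq_len(n.sim)) {
  set.seed(s)                          # a different seed for every run
  y <- rbinom(200, 1, 0.5)             # toy data: truth known by construction
  x <- rnorm(200, mean = y)            # informative score ("method1")
  z <- x + rnorm(200, sd = 2)          # noisier score ("method2")
  aucs[s, ] <- c(auc(x, y), auc(z, y))
}

colMeans(aucs)                         # average AUC per method across seeds
```

Averaging over many seeds makes the comparison depend on the data-generating process rather than on any one simulated dataset.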
Best Answer
Since you are using the ROC, I presume that you are running 5 classifiers. Frank is right about the ROC; that is not the way people compare models. For linear and generalized linear models you can apply the likelihood ratio test.
However, in case you are after the best prediction performance, and particularly in case you are not using a parametric model but, say, a random forest classifier, I would do the following:

1. Repeat the simulation many times (say, 1000 runs), changing the seed in each run.
2. In each run, fit all 5 methods and record the AUC of each.
3. For each method, report the mean AUC across runs together with an interval around it (for example, the empirical 2.5% and 97.5% quantiles).

The idea is, of course, that you pick the model that seems like the best combination of high mean AUC and low variance (narrow intervals around the mean) of the estimates.
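Concretely, given a runs-by-methods matrix of AUCs from the repeated simulations, the per-method mean and an empirical interval can be summarised in base R like this (the matrix below is random filler, purely to make the snippet self-contained):

```r
set.seed(1)
# Hypothetical AUC results: 1000 runs (rows) x 5 methods (columns).
aucs <- matrix(runif(1000 * 5, min = 0.6, max = 0.9), ncol = 5,
               dimnames = list(NULL, paste0("method", 1:5)))

summary.tab <- data.frame(
  mean.auc = colMeans(aucs),
  lower    = apply(aucs, 2, quantile, probs = 0.025),
  upper    = apply(aucs, 2, quantile, probs = 0.975)
)

# Rank methods by mean AUC; the interval width shows the variability.
summary.tab[order(-summary.tab$mean.auc), ]
```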
Best