How to determine the best cutoff point and its confidence interval using a ROC curve in R

Tags: confidence interval, data visualization, ggplot2, r, roc

I have data from a test that could be used to distinguish normal cells from tumor cells. Judging by the ROC curve, it looks good for this purpose (the area under the curve is 0.9):

[Figure: ROC curve]

My questions are:

  1. How do I determine a cutoff point for this test, and a confidence interval around it within which readings should be judged as ambiguous?
  2. What is the best way to visualize this (using ggplot2)?

The graph is rendered with the ROCR and ggplot2 packages:

# install.packages(c("ggplot2", "ROCR"))  # if not installed yet
library(ggplot2)
library(ROCR)

# data.csv is ";"-separated, which is read.csv2's default separator
d <- read.csv2("data.csv")

# ROCR: x is the marker value, test is the true class (1 = tumor, 0 = normal)
pred <- with(d, prediction(x, test))
perf <- performance(pred, "tpr", "fpr")
auc  <- performance(pred, measure = "auc")@y.values[[1]]

# ROC curve coordinates as a data frame for ggplot2
rd <- data.frame(x = perf@x.values[[1]], y = perf@y.values[[1]])

p <- ggplot(rd, aes(x = x, y = y)) +
  geom_path(linewidth = 1) +  # use size = 1 on ggplot2 < 3.4.0
  # annotate() draws the diagonal and the label once, not once per data row
  annotate("segment", x = 0, y = 0, xend = 1, yend = 1,
           colour = "black", linetype = 2) +
  annotate("text", x = 1, y = 0, hjust = 1, vjust = 0,
           label = paste0("AUC = ", round(auc, 3)), colour = "black", size = 4) +
  scale_x_continuous(name = "False positive rate") +
  scale_y_continuous(name = "True positive rate") +
  # theme() and element_*() replace the defunct opts() and theme_*() helpers
  theme(
    axis.text.x      = element_text(size = 10),
    axis.text.y      = element_text(size = 10),
    axis.title.x     = element_text(size = 12, face = "italic"),
    axis.title.y     = element_text(size = 12, face = "italic", angle = 90),
    legend.position  = "none",
    legend.title     = element_blank(),
    panel.background = element_blank(),
    panel.grid.minor = element_blank(),
    panel.grid.major = element_line(colour = "grey"),
    plot.background  = element_blank()
  )
p

data.csv contains the following data:

x;group;order;test
56;Tumor;1;1
55;Tumor;1;1
52;Tumor;1;1
60;Tumor;1;1
54;Tumor;1;1
43;Tumor;1;1
52;Tumor;1;1
57;Tumor;1;1
50;Tumor;1;1
34;Tumor;1;1
24;Normal;2;0
34;Normal;2;0
22;Normal;2;0
32;Normal;2;0
25;Normal;2;0
23;Normal;2;0
23;Normal;2;0
19;Normal;2;0
56;Normal;2;0
44;Normal;2;0
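The plotting code expects data.csv on disk. As a minimal base-R sketch, the same table can be embedded as a string and read with `read.csv2(text = ...)`, which also allows a cross-check of the quoted AUC: the AUC equals the probability that a randomly chosen tumor reading exceeds a randomly chosen normal reading, with ties counted as one half (the Mann-Whitney interpretation). The names `csv_text`, `tumor` and `normal` are illustrative only.

```r
# The contents of data.csv, embedded as a string so the sketch runs without the file
csv_text <- "x;group;order;test
56;Tumor;1;1
55;Tumor;1;1
52;Tumor;1;1
60;Tumor;1;1
54;Tumor;1;1
43;Tumor;1;1
52;Tumor;1;1
57;Tumor;1;1
50;Tumor;1;1
34;Tumor;1;1
24;Normal;2;0
34;Normal;2;0
22;Normal;2;0
32;Normal;2;0
25;Normal;2;0
23;Normal;2;0
23;Normal;2;0
19;Normal;2;0
56;Normal;2;0
44;Normal;2;0"
d <- read.csv2(text = csv_text)

tumor  <- d$x[d$test == 1]
normal <- d$x[d$test == 0]

# AUC = P(tumor > normal) + 0.5 * P(tumor == normal), over all tumor/normal pairs
gt  <- sum(outer(tumor, normal, ">"))
eq  <- sum(outer(tumor, normal, "=="))
auc <- (gt + 0.5 * eq) / (length(tumor) * length(normal))
print(auc)  # 0.9, matching the value reported by ROCR
```

Here 89 of the 100 pairs are ordered correctly and 2 are tied (56 vs 56, 34 vs 34), giving (89 + 1) / 100.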

Best Answer

Thanks to all who answered this question. I agree that there is no single correct answer: the criteria depend heavily on the aims behind a particular diagnostic test.

In the end I found the R package OptimalCutpoints, which is dedicated to finding cutoff points in exactly this type of analysis. It offers many methods for determining a cutoff point:

  • "CB" (cost-benefit method);
  • "MCT" (minimizes Misclassification Cost Term);
  • "MinValueSp" (a minimum value set for Specificity);
  • "MinValueSe" (a minimum value set for Sensitivity);
  • "RangeSp" (a range of values set for Specificity);
  • "RangeSe" (a range of values set for Sensitivity);
  • "ValueSp" (a value set for Specificity);
  • "ValueSe" (a value set for Sensitivity);
  • "MinValueSpSe" (a minimum value set for Specificity and Sensitivity);
  • "MaxSp" (maximizes Specificity);
  • "MaxSe" (maximizes Sensitivity);
  • "MaxSpSe" (maximizes Sensitivity and Specificity simultaneously);
  • "Max-SumSpSe" (maximizes the sum of Sensitivity and Specificity);
  • "MaxProdSpSe" (maximizes the product of Sensitivity and Specificity);
  • "ROC01" (minimizes distance between ROC plot and point (0,1));
  • "SpEqualSe" (Sensitivity = Specificity);
  • "Youden" (Youden Index);
  • "MaxEfficiency" (maximizes Efficiency or Accuracy);
  • "Minimax" (minimizes the most frequent error);
  • "AUC" (maximizes concordance which is a function of AUC);
  • "MaxDOR" (maximizes Diagnostic Odds Ratio);
  • "MaxKappa" (maximizes Kappa Index);
  • "MaxAccuracyArea" (maximizes Accuracy Area);
  • "MinErrorRate" (minimizes Error Rate);
  • "MinValueNPV" (a minimum value set for Negative Predictive Value);
  • "MinValuePPV" (a minimum value set for Positive Predictive Value);
  • "MinValueNPVPPV" (a minimum value set for Predictive Values);
  • "PROC01" (minimizes distance between PROC plot and point (0,1));
  • "NPVEqualPPV" (Negative Predictive Value = Positive Predictive Value);
  • "ValueDLR.Negative" (a value set for Negative Diagnostic Likelihood Ratio);
  • "ValueDLR.Positive" (a value set for Positive Diagnostic Likelihood Ratio);
  • "MinPvalue" (minimizes p-value associated with the statistical Chi-squared test which measures the association between the marker and the binary result obtained on using the cutpoint);
  • "ObservedPrev" (the closest value to observed prevalence);
  • "MeanPrev" (the closest value to the mean of the diagnostic test values);
  • "PrevalenceMatching" (the value for which predicted prevalence is practically equal to observed prevalence).

So the task now narrows to choosing the method that best matches each situation.
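To see why the choice of method matters, here is a hand-rolled base-R sketch of three of the listed criteria ("Youden", "ROC01" and "SpEqualSe") applied to this data. It is not the OptimalCutpoints implementation; it assumes that a reading greater than or equal to the cutoff is classified as tumor and that only observed values are candidate cutoffs, and the package's own conventions may differ. The helper `best()` is hypothetical and returns every tied optimum rather than silently picking one.

```r
# Marker values from data.csv entered inline; test = 1 for tumor, 0 for normal
x    <- c(56, 55, 52, 60, 54, 43, 52, 57, 50, 34,
          24, 34, 22, 32, 25, 23, 23, 19, 56, 44)
test <- rep(c(1, 0), each = 10)

# Candidate cutoffs: every observed value; a reading >= cutoff is called tumor
cands <- sort(unique(x))
se <- sapply(cands, function(k) mean(x[test == 1] >= k))  # sensitivity
sp <- sapply(cands, function(k) mean(x[test == 0] <  k))  # specificity

# Hypothetical helper: all cutoffs attaining the optimum (tolerance absorbs rounding)
best <- function(score, maximize = TRUE) {
  if (!maximize) score <- -score
  cands[score >= max(score) - 1e-9]
}

youden  <- best(se + sp - 1)                           # maximize Youden index
roc01   <- best(sqrt((1 - se)^2 + (1 - sp)^2), FALSE)  # closest to (0, 1)
spequse <- best(abs(se - sp), FALSE)                   # Se closest to Sp

print(list(Youden = youden, ROC01 = roc01, SpEqualSe = spequse))
```

On this data the Youden index is tied at 34, 43 and 50, ROC01 is tied at 43 and 50, and SpEqualSe picks 44, so three reasonable criteria already disagree.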

The package documentation describes many other configuration options, including several methods for computing confidence intervals, along with a detailed description of each method.
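For a confidence interval on the cutoff itself, one generic option is a percentile bootstrap: resample the data, recompute the optimal cutoff each time, and take quantiles of the resulting cutoffs. The sketch below is only an illustration under stated assumptions and is not necessarily one of the methods OptimalCutpoints provides: it uses the Youden criterion, resamples within each group (stratified bootstrap) so every replicate keeps 10 tumor and 10 normal readings, uses 2000 replicates with an arbitrary seed, and breaks ties by taking the smallest optimal cutoff, which is itself an arbitrary choice. Treating the resulting interval as the band of "ambiguous" readings is one possible reading of the question, not a standard rule.

```r
# Marker values from data.csv entered inline; test = 1 for tumor, 0 for normal
x    <- c(56, 55, 52, 60, 54, 43, 52, 57, 50, 34,
          24, 34, 22, 32, 25, 23, 23, 19, 56, 44)
test <- rep(c(1, 0), each = 10)

# Hypothetical helper: Youden-optimal cutoff (reading >= cutoff is called tumor);
# ties are broken by taking the smallest optimal cutoff
youden_cutoff <- function(x, test) {
  cands <- sort(unique(x))
  se <- sapply(cands, function(k) mean(x[test == 1] >= k))
  sp <- sapply(cands, function(k) mean(x[test == 0] <  k))
  j  <- se + sp - 1
  min(cands[j >= max(j) - 1e-9])
}

set.seed(1)  # arbitrary seed; the interval varies slightly with it
B <- 2000
boot_cut <- replicate(B, {
  # Stratified resampling: draw tumor and normal readings separately
  idx <- c(sample(which(test == 1), replace = TRUE),
           sample(which(test == 0), replace = TRUE))
  youden_cutoff(x[idx], test[idx])
})

est <- youden_cutoff(x, test)
ci  <- quantile(boot_cut, c(0.025, 0.975))  # 95% percentile interval
print(c(estimate = est, ci))
```

With only 20 observations the interval will be wide, which is itself informative about how precisely a cutoff can be located from this sample.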