ROC curve for discrete classifiers like SVM: Why do we still call it a "curve"? Isn't it just a "point"?

auc, cross-validation, roc

In the discussion "how to generate a ROC curve for binary classification", I think the confusion arose because what Yang meant by a "binary classifier" (a term that covers any classifier separating two classes) was what is called a "discrete classifier" (one that produces discrete 0/1 outputs, like an SVM) rather than one that produces continuous outputs, like an ANN or a Bayes classifier. So the discussion was about how the ROC is plotted for "binary continuous classifiers", and the answer is that, since the outputs are continuous, the test cases are sorted by their scores and each threshold produces one point on the ROC curve.

My question is about "binary discrete classifiers" such as an SVM, whose output values are 0 or 1. The ROC then consists of just one point and not a curve. I'm confused as to why we still call it a curve. Can we still talk about thresholds? How can one use thresholds in an SVM in particular? How can one compute the AUC? Does cross-validation play any role here?

Best Answer

Yes, there are situations where the usual receiver operating characteristic (ROC) curve cannot be obtained and only one point exists.
SVMs can be set up so that they output class membership probabilities. These are the usual values to which a varying threshold is applied to produce a ROC curve.
Is that what you are looking for?
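To make the difference between continuous scores and hard 0/1 labels concrete, here is a minimal base R sketch of a threshold sweep. The helper roc_points () and the toy scores are made up for illustration and are not part of the code further down or of any package.

```r
# Hypothetical helper: one (FPR, TPR) point per threshold.
# A case is predicted positive when its score is >= the threshold.
roc_points <- function (scores, is_pos) {
  # Inf first (nothing predicted positive), then every distinct score,
  # from highest to lowest
  thresholds <- c (Inf, sort (unique (scores), decreasing = TRUE))
  tpr <- sapply (thresholds, function (th) mean (scores [is_pos] >= th))
  fpr <- sapply (thresholds, function (th) mean (scores [!is_pos] >= th))
  data.frame (threshold = thresholds, fpr = fpr, tpr = tpr)
}

# Made-up toy data: 4 positive and 4 negative test cases
is_pos <- c (TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
scores <- c (0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1)

# Continuous scores: every distinct score yields its own ROC point
roc_cont <- roc_points (scores, is_pos)

# Hard labels (scores cut at 0.5 beforehand): only the two trivial
# corners and a single operating point remain
roc_hard <- roc_points (as.numeric (scores >= 0.5), is_pos)

print (roc_cont)
print (roc_hard)
```

With the hard labels, the only point besides (0, 0) and (1, 1) is the single operating point of the classifier, which is the "just one point" situation described in the question.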
Steps in the ROC usually result from a small number of test cases rather than from any discrete variation in the covariate. In particular, you end up with the same points if you choose your discrete thresholds so that for each new point only one sample changes its assignment.
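The step structure, and one way to get the AUC from it, can be sketched in base R as follows. This is only an illustration on made-up toy data and it assumes there are no tied scores; the variable names are hypothetical.

```r
# Made-up toy data: 4 positive and 4 negative test cases, no tied scores
is_pos <- c (TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
scores <- c (0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1)

# Lowering the threshold past one case at a time flips exactly one
# assignment, so the curve moves up by 1 / n_pos or right by 1 / n_neg.
ord <- order (scores, decreasing = TRUE)
tpr <- c (0, cumsum ( is_pos [ord]) / sum ( is_pos))
fpr <- c (0, cumsum (!is_pos [ord]) / sum (!is_pos))
steps <- data.frame (d_fpr = diff (fpr), d_tpr = diff (tpr))

# AUC by the trapezoidal rule over the step curve
auc_trap <- sum (diff (fpr) * (head (tpr, -1) + tail (tpr, -1)) / 2)

# Cross-check: fraction of (positive, negative) pairs in which the
# positive case has the higher score
auc_pairs <- mean (outer (scores [is_pos], scores [!is_pos], ">"))

print (steps)
print (c (trapezoid = auc_trap, pairs = auc_pairs))
```

With only 8 test cases the curve has 8 steps, however finely the threshold is varied. With tied scores the cases sharing a score would have to change assignment together, which this sketch does not handle.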
Continuously varying other (hyper)parameters of the model of course produces sets of specificity/sensitivity pairs that give other curves in the FPR;TPR coordinate system.
The interpretation of a curve of course depends on which variation generated it.

Here's a usual ROC (i.e. requesting probabilities as output) for the "versicolor" class of the iris data set:

FPR;TPR (γ = 1, C = 1, varying probability threshold):

The same type of coordinate system, but with TPR and FPR as functions of the tuning parameters γ and C:

FPR;TPR (varying γ, C = 1, probability threshold = 0.5):
FPR;TPR (γ = 1, varying C, probability threshold = 0.5):

These plots do have a meaning, but the meaning is decidedly different from that of the usual ROC!

Here's the R code I used:

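library (e1071)   # svm ()
library (ROCR)    # prediction (), performance ()

# ROC table (FPR, TPR, probability threshold) for the "versicolor" class
# of an SVM fitted with the given cost and gamma
svmperf <- function (cost = 1, gamma = 1) {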
    model <- svm (Species ~ ., data = iris, probability=TRUE, 
                  cost = cost, gamma = gamma)
    pred <- predict (model, iris, probability=TRUE, decision.values=TRUE)
    prob.versicolor <- attr (pred, "probabilities")[, "versicolor"]

    roc.pred <- prediction (prob.versicolor, iris$Species == "versicolor")
    perf <- performance (roc.pred, "tpr", "fpr")

    data.frame (fpr = perf@x.values [[1]], tpr = perf@y.values [[1]], 
                threshold = perf@alpha.values [[1]], 
                cost = cost, gamma = gamma)
}

# Vary the cost C over 2^-10 ... 2^10 (gamma = 1)
df <- data.frame ()
for (cost in -10:10)
  df <- rbind (df, svmperf (cost = 2^cost))
head (df)
plot (df$fpr, df$tpr)

cost.df <- split (df, df$cost)

# For each cost, pick the ROC row at the probability threshold 0.5
cost.df <- sapply (cost.df, function (x) {
    i <- approx (x$threshold, seq (nrow (x)), 0.5, method="constant")$y 
    x [i,]
})

cost.df <- as.data.frame (t (cost.df))
plot (cost.df$fpr, cost.df$tpr, type = "l", xlim = 0:1, ylim = 0:1)
points (cost.df$fpr, cost.df$tpr, pch = 20, 
        col = rev(rainbow(nrow (cost.df),start=0, end=4/6)))

# Vary gamma over 2^-10 ... 2^10 (C = 1)
df <- data.frame ()
for (gamma in -10:10)
  df <- rbind (df, svmperf (gamma = 2^gamma))
head (df)
plot (df$fpr, df$tpr)

gamma.df <- split (df, df$gamma)

gamma.df <- sapply (gamma.df, function (x) {
     i <- approx (x$threshold, seq (nrow (x)), 0.5, method="constant")$y
     x [i,]
})

gamma.df <- as.data.frame (t (gamma.df))
plot (gamma.df$fpr, gamma.df$tpr, type = "l", xlim = 0:1, ylim = 0:1, lty = 2)
points (gamma.df$fpr, gamma.df$tpr, pch = 20, 
        col = rev(rainbow(nrow (gamma.df),start=0, end=4/6)))

# The usual ROC: gamma = 1, C = 1, varying probability threshold
roc.df <- subset (df, cost == 1 & gamma == 1)
plot (roc.df$fpr, roc.df$tpr, type = "l", xlim = 0:1, ylim = 0:1)
points (roc.df$fpr, roc.df$tpr, pch = 20, 
        col = rev(rainbow(nrow (roc.df),start=0, end=4/6)))
