Say we have an SVM classifier. How do we generate the ROC curve (theoretically speaking), given that we compute a TPR and FPR for each threshold? And how do we determine the optimal threshold for this SVM classifier?
Machine Learning – How to Determine the Optimal Threshold for a Classifier and Generate ROC Curve
machine-learning, roc, svm
Related Solutions
As I see it, the possibility of refusing classification as "too uncertain" is the whole point of choosing a threshold (as opposed to always assigning the class with the highest predicted probability).
Of course, you should have some justification for putting the threshold at 0.5: you could just as well set it to 0.9 or any other value that is reasonable for your application.
You describe a setup with mutually exclusive classes (a closed-world problem). "No class reaches the threshold" can always happen as soon as the threshold is higher than $1/n_{classes}$, i.e. the same problem occurs in a 2-class problem with a threshold of, say, 0.9. For a threshold of exactly $1/n_{classes}$ it could happen in theory, but in practice it is highly unlikely.
So your problem is not specific to the 3-class set-up; it is just more pronounced there.
To your second question: you can compute ROC curves for any kind of continuous output score; the scores do not even need to claim to be probabilities. Personally, I don't calibrate, because I don't want to spend another test set on that (I work with very restricted sample sizes). The shape of the ROC won't change anyway.
Answer to your comment: The ROC conceptually belongs to a set-up that in my field is called single-class classification: does a patient have a particular disease or not? From that point of view, you can assign a 10% probability that the patient does have the disease. But this does not imply that with 90% probability he has something else that is defined: the complementary 90% actually belongs to a "dummy" class, namely not having that disease. For some diseases and tests, finding everyone may be so important that you set your working point at a threshold of 0.1. A textbook example where you choose such an extreme working point is the HIV test for blood donations.
So for constructing the ROC for class A (you'd say: the patient is A positive), you look at the class A posterior probabilities only. For binary classification with probability(not A) = 1 - probability(A), you don't need to plot the second ROC, as it does not contain any information that is not readily accessible from the first one.
In your 3-class set-up you can plot an ROC for each class. Depending on how you choose your thresholds, this can result in no class, exactly one class, or more than one class being assigned. What is sensible depends on your problem. E.g. if the classes are "Hepatitis", "HIV", and "broken arm", then this policy is appropriate, as a patient may have none or all of these.
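For illustration, here is a minimal sketch of how per-class (one-vs-rest) ROC curves could be computed from posterior probabilities. The synthetic data, the choice of scikit-learn, and the classifier are my own assumptions, not part of the original answer:

```python
# Sketch: one ROC curve per class (one-vs-rest) from posterior probabilities.
# Data set, class names, and classifier are placeholders.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_classes=3, n_informative=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)              # posterior probability per class

for k, name in enumerate(["Hepatitis", "HIV", "broken arm"]):
    y_bin = (y_te == k).astype(int)          # class k vs. rest
    fpr, tpr, thresholds = roc_curve(y_bin, proba[:, k])
    print(name, "AUC =", roc_auc_score(y_bin, proba[:, k]))
```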
The Accuracy (or the area under the ROC curve) depends on the sample used to construct the ROC curve (see, e.g., How to interpret 95% confidence interval for Area Under Curve of ROC?). So if you use this ROC to optimise the threshold there is a risk of overfitting to the sample.
You can indeed "calibrate" a threshold using a hold-out data set. You might also use k-fold cross-validation. The link with the ROC curve is the fact that there exists a relationship between the area under the ROC curve and the accuracy ratio ($AR = 2\,AUC - 1$).
The area under the ROC curve (AUROC or AUC) is linked to the accuracy ratio. It can be interpreted as follows: of all possible pairs with one subject of class 1 and one subject of class 2, the fraction of pairs in which the class-1 subject has the better score is equal to the AUC.
This interpretation is the same when you use a random forest.
You can find more on all this in, e.g., this paper
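As a small illustration of that pairwise interpretation (the scores below are made up, not taken from the paper), counting pairs reproduces the AUC reported by scikit-learn, and the accuracy ratio then follows from $AR = 2\,AUC - 1$:

```python
# Sketch: AUC as the fraction of (class-1, class-0) pairs in which the class-1
# subject gets the higher score; ties count as half. Scores are made up.
import numpy as np
from sklearn.metrics import roc_auc_score

y = np.array([0, 0, 0, 1, 1, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.9, 0.3])

pos, neg = scores[y == 1], scores[y == 0]
pairs = [(p > n) + 0.5 * (p == n) for p in pos for n in neg]
auc_pairwise = np.mean(pairs)

auc_sklearn = roc_auc_score(y, scores)
print(auc_pairwise, auc_sklearn, "AR =", 2 * auc_sklearn - 1)
```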
Best Answer
Use the SVM classifier to classify a set of annotated examples; each such run over the examples identifies "one point" in ROC space. Suppose the number of examples is 200. First count the number of examples in each of the four cases.
\begin{array} {|r|r|r|}
\hline
 & \text{labeled true} & \text{labeled false} \\ \hline
\text{predicted true} & 71 & 28 \\ \hline
\text{predicted false} & 57 & 44 \\ \hline
\end{array}
Then compute the TPR (True Positive Rate) and FPR (False Positive Rate): $TPR = 71/(71+57) = 0.5547$ and $FPR = 28/(28+44) = 0.3889$. In ROC space the x-axis is FPR and the y-axis is TPR, so the point $(0.3889, 0.5547)$ is obtained.
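The same arithmetic as a minimal sketch, using the counts from the table above:

```python
# Counts from the confusion matrix above.
TP, FN = 71, 57   # labeled true:  predicted true / predicted false
FP, TN = 28, 44   # labeled false: predicted true / predicted false

TPR = TP / (TP + FN)   # 71 / 128 ≈ 0.5547
FPR = FP / (FP + TN)   # 28 / 72  ≈ 0.3889
print((FPR, TPR))      # one point in ROC space
```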
To draw an ROC curve, just repeat this with different thresholds: each threshold gives one (FPR, TPR) pair, and connecting these points yields the curve.
For example, if a concentration of a certain protein above α% signifies a disease, different values of α yield different final TPR and FPR values. The threshold values can be determined in a way similar to a grid search: label the training examples with different threshold values, train a classifier for each set of labelled examples, run each classifier on the test data, compute the FPR values, and select threshold values that cover FPR values from low (close to 0) to high (close to 1), i.e., close to 0, 0.05, 0.1, ..., 0.95, 1.
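As a hedged sketch of that sweep in code (scikit-learn's SVC, roc_curve, and the synthetic data are my choice of tools here, not prescribed above), roc_curve performs the threshold sweep over the classifier's continuous decision scores for you:

```python
# Sketch: generate an ROC curve for an SVM by sweeping thresholds over its
# continuous decision scores. Data set and parameters are placeholders.
from sklearn.datasets import make_classification
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

svm = SVC(kernel="rbf").fit(X_tr, y_tr)
scores = svm.decision_function(X_te)             # continuous scores, no probabilities needed

fpr, tpr, thresholds = roc_curve(y_te, scores)   # one (FPR, TPR) pair per threshold
print("AUC =", auc(fpr, tpr))
# Plotting fpr against tpr (e.g. with matplotlib) draws the curve itself.
```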
Some details can be checked in http://en.wikipedia.org/wiki/Receiver_operating_characteristic.
Besides, these two links are useful for how to determine an optimal threshold. A simple method is to take the threshold with the maximal sum of the true positive and true negative rates (equivalently, the maximal $TPR - FPR$, Youden's J statistic); a short code sketch of this criterion follows the links below. Finer criteria may bring in other quantities that vary with the threshold, such as financial costs.
http://www.medicalbiostatistics.com/roccurve.pdf
http://www.kovcomp.co.uk/support/XL-Tut/life-ROC-curves-receiver-operating-characteristic.html
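A minimal sketch of that simple criterion, picking the threshold that maximizes Youden's J; the data and SVM here are again placeholder assumptions continuing the earlier example:

```python
# Sketch: choose the threshold maximizing Youden's J = TPR + TNR - 1 = TPR - FPR.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
scores = SVC().fit(X_tr, y_tr).decision_function(X_te)

fpr, tpr, thresholds = roc_curve(y_te, scores)
best = np.argmax(tpr - fpr)                      # index of maximal Youden's J
print("optimal threshold:", thresholds[best],
      "TPR:", tpr[best], "FPR:", fpr[best])
```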