In other words, instead of a two-class problem I am dealing with four classes and would still like to assess performance using AUC.

# Solved – How to plot ROC curves in multiclass classification


#### Related Solutions

Are you sure you need a distinct ROC curve per patient? What exactly are you going to do with the AUC measures of which you have one per patient?

If you want to assess your classifier's performance, you could also randomly group your data points into 10 folds. The patient number would be an additional column in your dataset which your classifier can use. When you repeat that cross-validation 3 times, you have 30 samples of the classifier's performance.

If the goal is to classify data points from unknown future patients (starting from their first data point), you should disregard patient numbers altogether.
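The repeated cross-validation described above can be sketched with scikit-learn; the synthetic data and logistic-regression model here are only placeholders for the real patient dataset and classifier:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic stand-in for the real dataset (the patient number
# would simply be one more feature column here)
X, y = make_classification(n_samples=300, random_state=0)

# 10 folds repeated 3 times -> 30 performance samples
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         scoring="roc_auc", cv=cv)
print(len(scores))
```

The 30 scores give you not just a point estimate of performance but also a sense of its variance across folds and repeats.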

Edit: One option is Cohen's Kappa. It takes care of the no-information rate, and it is defined even when there are no TP. It is, however, undefined when the classification is perfect, which may create another problem with small test sets.
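Cohen's Kappa is available in scikit-learn; a small illustration with made-up labels:

```python
from sklearn.metrics import cohen_kappa_score

# Made-up ground truth and predictions for illustration
y_true = [1, 0, 0, 1, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 1]

# Kappa corrects the observed agreement for the agreement
# expected by chance (the no-information rate)
kappa = cohen_kappa_score(y_true, y_pred)
print(kappa)  # 0.5: 75% observed agreement vs. 50% expected by chance
```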

If you know the concrete misclassification costs of FP versus FN (and perhaps the profits of TP and TN), then you should use those as your performance metric.

You can always use macro averages of your ROC measurements or of F-measures. Usually, you would take each of the two classes once as positive and once as negative and average the two performance scores. You could also give more weight to the rare class and compute a weighted macro average. This is quite unusual, but so is the approach of having this many performance measures which are all based on one patient (Forman 2010).
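The macro average, and a weighted variant that favors the rare class, can be computed from per-class F-scores; the labels and weights below are arbitrary illustration values:

```python
import numpy as np
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 0, 0, 1, 1]   # class 1 is the rare class
y_pred = [0, 0, 0, 0, 1, 0, 1, 0]

# One F1 score per class, i.e. each class treated once as "positive"
per_class = f1_score(y_true, y_pred, average=None)

macro_f1 = per_class.mean()            # plain macro average
weights = np.array([0.25, 0.75])       # arbitrary: more weight on the rare class
weighted_macro_f1 = per_class @ weights
```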

I know the question is two years old and the technical answer was given in the comments, but a more elaborate answer might help others still struggling with the concepts.

The OP's ROC curve is wrong because they used the predicted class labels of their models instead of the probabilities.

**What does this mean?**

When a model is trained it learns the relationships between the input variables and the output variable. For each observation the model is shown, the model learns how probable it is that a given observation belongs to a certain class. When the model is presented with the test data it will guess for each unseen observation how probable it is to belong to a given class.

**How does the model know if an observation belongs to a class?**
During testing the model receives an observation for which it estimates a probability of 51% of belonging to class X. How does it decide whether to label it as belonging to class X or not? The researcher sets a threshold telling the model that all observations with a probability under 50% must be classified as Y and all those above must be classified as X. Sometimes the researcher wants to set a stricter rule because they're more interested in correctly predicting a particular class, such as X, than in predicting all classes equally well.

So your trained model has estimated a probability for each of your observations, but the threshold will ultimately decide in which class each observation is categorized.
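As a sketch with made-up probabilities: the same model outputs yield different labels depending on the threshold the researcher chooses:

```python
import numpy as np

proba_X = np.array([0.51, 0.30, 0.75, 0.49])   # P(class X) for four observations

# Default rule: at or above 50% -> X, below -> Y
default_labels = np.where(proba_X >= 0.5, "X", "Y")
# ['X', 'Y', 'X', 'Y']

# Stricter rule: only call X when the model is at least 80% sure
strict_labels = np.where(proba_X >= 0.8, "X", "Y")
# ['Y', 'Y', 'Y', 'Y']
```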

**Why does this matter?**

The ROC curve plots one point, the (false positive rate, true positive rate) pair of your model, for each threshold level. This helps the researcher see the trade-off between the FPR and TPR across all threshold levels.

So when you pass the predicted labels instead of the predicted probabilities to your ROC function, you will only get one point, because those labels were calculated using one specific threshold: that point is the TPR and FPR of your model at that one threshold level.

What you need to do is use the probabilities instead and let the threshold vary.
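scikit-learn's `roc_curve` does exactly this: given probabilities, it sweeps the threshold and returns one (FPR, TPR) point per threshold; the labels and probabilities below are made up for illustration:

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Made-up labels and predicted probabilities
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

# One (fpr, tpr) pair per threshold; these points form the ROC curve
fpr, tpr, thresholds = roc_curve(y_true, y_prob)

# The AUC summarizes the whole curve in a single number
auc = roc_auc_score(y_true, y_prob)
```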

Run your model as such:

```
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier()
knn_model = knn.fit(X_train, y_train)
# Use the predicted labels for your confusion matrix
knn_y_model = knn_model.predict(X=X_test)
# Use the probabilities for your ROC and Precision-Recall curves
knn_y_proba = knn_model.predict_proba(X=X_test)
```

When creating your confusion matrix you will use the predicted labels of your model:

```
import matplotlib.pyplot as plt
from mlxtend.plotting import plot_confusion_matrix
from sklearn.metrics import confusion_matrix

fig, ax = plot_confusion_matrix(conf_mat=confusion_matrix(y_test, knn_y_model),
                                show_absolute=True, show_normed=True,
                                colorbar=True)
plt.title("Confusion matrix - KNN")
plt.ylabel('True label')
plt.xlabel('Predicted label')
```

When creating your ROC curve you will use the probabilities:

```
import matplotlib.pyplot as plt
import scikitplot as skplt

plot = skplt.metrics.plot_roc(y_test, knn_y_proba)
plt.title("ROC Curves - K-Nearest Neighbors")
```
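If you prefer to stay within scikit-learn rather than scikitplot, the multiclass curves can be drawn one-vs-rest by hand; the iris data and default KNN settings below are only illustrative:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import label_binarize

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
proba = KNeighborsClassifier().fit(X_train, y_train).predict_proba(X_test)

# One-vs-rest: binarize the labels and draw one ROC curve per class
y_bin = label_binarize(y_test, classes=[0, 1, 2])
aucs = []
for i in range(3):
    fpr, tpr, _ = roc_curve(y_bin[:, i], proba[:, i])
    aucs.append(auc(fpr, tpr))
    plt.plot(fpr, tpr, label=f"class {i} (AUC = {aucs[-1]:.2f})")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.title("One-vs-rest ROC curves - KNN on iris")
plt.legend()
```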

## Best Answer

It seems you are looking for multi-class ROC analysis, which is a kind of multi-objective optimization covered in a tutorial at ICML'04. As in several multi-class problems, the idea is generally to carry out pairwise comparison (one class vs. all other classes, one class vs. another class; see (1) or the Elements of Statistical Learning), and there is a paper by Landgrebe and Duin on that topic: Approximating the multiclass ROC by pairwise analysis, Pattern Recognition Letters 2007, 28: 1747-1758. For visualization purposes, I've seen some papers some time ago, most of them turning around the volume under the ROC surface (VUS) or Cobweb diagrams. I don't know, however, whether there exists an R implementation of these methods, although I think the `stars()` function might be used for a cobweb plot. I also just ran across a Matlab toolbox that seems to offer multi-class ROC analysis, PRSD Studio.

Other papers may also be useful as a first start for visualization/computation:

References:

1. Allwein, E.L., Schapire, R.E. and Singer, Y. (2000). Reducing multiclass to binary: A unifying approach for margin classifiers. Journal of Machine Learning Research, 1:113–141.
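For a single multiclass AUC number, scikit-learn now implements both a one-vs-one (pairwise) and a one-vs-rest averaging scheme via the `multi_class` argument of `roc_auc_score`; the iris data and logistic-regression model below are just placeholders:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
proba = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)

# 'ovo' averages the AUC over all pairs of classes (pairwise analysis);
# 'ovr' averages one-vs-rest AUCs instead
auc_ovo = roc_auc_score(y_test, proba, multi_class="ovo")
auc_ovr = roc_auc_score(y_test, proba, multi_class="ovr")
```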