Solved – R caret ROC optimal cut-off in original values

caret, r, roc

I am new to R. I started using it because the caret package makes cross-validation easy. However, I'm stuck at translating the predicted probability values back into the original, practical values.

Description:
We have pedal power data from 45 unilateral vascular patients and 25 healthy participants, and we want to find the cut-off point above which people are likely to have vascular problems. I've used 10-times repeated 10-fold cross-validation with a generalized linear model (glm) and an ROC analysis to get the optimal cut-off point (Youden method).

The dataset is called Dia_data; as an example I'm using the variable DIFF_POWER_MEAN_95_REL.

library(caret)
library(pROC)

#train model
data_ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 10,
                          savePredictions = TRUE, classProbs = TRUE, p = 0.8,
                          summaryFunction = twoClassSummary, returnResamp = "all")
model <- train(patient ~ DIFF_POWER_MEAN_95_REL, data = Dia_data, method = "glm",
               trControl = data_ctrl, metric = "ROC", na.action = na.pass)

#generate ROC-curve
ROC <- plot.roc(model$pred$obs, model$pred$N)

#optimal cut-off point using youden method
coords(ROC, "b", ret="t", best.method="youden")

Problem:
The ROC curve looks fine, and the result I get is a cut-off point of 0.273, but this value doesn't make sense as a pedal power value (see plot). This cut-off point is on the scale of the predicted probabilities from the resampling.
If I make an ROC curve in SPSS, for example (without cross-validation), the cut-off point is around 5 watts. This means that a pedal power difference of 5 watts between the two legs is a predictor of vascular problems.
What I would like to have is the cross-validated cut-off value in original pedal power values.

Question:
How do I obtain the cut-off point in original pedal power values, that is, the pedal power difference at which people have vascular problems?

Many thanks in advance!

[Figure: ROC curve showing the predicted value of the pedal power difference between both legs, with the optimal cut-off point]

Best Answer

I think you are confusing the cut-off value for the prediction (here: the predicted probability for patient) with the cut-off value for your x-variable (here: pedal power).

Take a look at the final model:

    summary(model$finalModel)

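In the output you will see the coefficients for the intercept and the pedal power variable. Because glm with a two-class outcome fits a logistic regression, these coefficients act on the log-odds (logit) of the predicted probability, not on the probability itself. Suppose you want a predicted probability above 0.8 before you call someone a patient, and imagine the coefficients for the intercept and DIFF_POWER_MEAN_95_REL are -1.1015 and 0.3900:

         logit(prediction) = -1.1015 + DIFF_POWER_MEAN_95_REL * 0.3900
      log(0.8 / (1 - 0.8)) = -1.1015 + DIFF_POWER_MEAN_95_REL * 0.3900
                    1.3863 = -1.1015 + DIFF_POWER_MEAN_95_REL * 0.3900
    DIFF_POWER_MEAN_95_REL = (1.3863 + 1.1015) / 0.3900
    DIFF_POWER_MEAN_95_REL = 6.3790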
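The rearrangement above can be sketched in base R. Keep in mind that caret's method = "glm" fits a logistic regression for a two-class outcome, so the coefficients live on the log-odds scale; the sketch therefore passes the probability cut-off through qlogis() before solving for x. Everything here is an assumption for illustration: the data are simulated stand-ins for Dia_data, the class labels "N" and "P" and the helper prob_to_x() are hypothetical names, and the model is fitted once on all data rather than taken from the cross-validation.

```r
# Simulated stand-in for Dia_data (hypothetical values, not the study data)
set.seed(1)
power_diff <- c(rnorm(25, mean = 2, sd = 2),   # 25 healthy participants
                rnorm(45, mean = 8, sd = 3))   # 45 patients
patient <- factor(rep(c("N", "P"), times = c(25, 45)), levels = c("N", "P"))

# A binomial glm models the probability of the SECOND factor level (here "P")
fit <- glm(patient ~ power_diff, family = binomial)
b0 <- coef(fit)[["(Intercept)"]]
b1 <- coef(fit)[["power_diff"]]

# Convert a cut-off on P(second level) into a value of the predictor.
# For a cut-off on P(first level), pass 1 - p_cut instead.
prob_to_x <- function(p_cut, b0, b1) (qlogis(p_cut) - b0) / b1

x_cut <- prob_to_x(0.8, b0, b1)
x_cut

# Sanity check: the predicted probability at x_cut should be 0.8 again
predict(fit, newdata = data.frame(power_diff = x_cut), type = "response")
```

With a caret model, the coefficients would come from coef(model$finalModel) instead. If the ROC cut-off was computed on the probability of the first class (as with model$pred$N in the question, assuming "N" is the first factor level), the value to pass is 1 minus that cut-off. The cross-validated probabilities come from many resampled fits, so mapping them through the final model's coefficients is an approximation.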

Your next challenge will be to decide on the cut-off for the prediction itself. You can do this, for example, by comparing the numbers of false negatives and false positives in the confusion matrix at each candidate cut-off.
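One way to look at false negatives and false positives is to tabulate the confusion matrix at several probability cut-offs. The following is a minimal base-R sketch under stated assumptions: the data are simulated, the class labels "N" and "P" and the helper confusion_at() are hypothetical, and the probabilities are in-sample fitted values. With the caret model from the question, the cross-validated probabilities and observed classes stored in model$pred could be used in their place.

```r
# Simulated stand-in data (hypothetical values, not the study data)
set.seed(1)
power_diff <- c(rnorm(25, mean = 2, sd = 2), rnorm(45, mean = 8, sd = 3))
obs <- factor(rep(c("N", "P"), times = c(25, 45)), levels = c("N", "P"))
fit <- glm(obs ~ power_diff, family = binomial)
p_hat <- fitted(fit)  # in-sample predicted probability of class "P"

# Confusion matrix at a single probability cut-off
confusion_at <- function(cutoff, p_hat, obs) {
  pred <- factor(ifelse(p_hat >= cutoff, "P", "N"), levels = levels(obs))
  table(predicted = pred, observed = obs)
}

cm <- confusion_at(0.8, p_hat, obs)
false_pos <- cm["P", "N"]  # healthy participants classified as patients
false_neg <- cm["N", "P"]  # patients classified as healthy

# Sensitivity and specificity over a grid of cut-offs
cutoffs <- seq(0.05, 0.95, by = 0.05)
stats <- as.data.frame(t(sapply(cutoffs, function(k) {
  m <- confusion_at(k, p_hat, obs)
  c(cutoff = k,
    sensitivity = m["P", "P"] / sum(m[, "P"]),
    specificity = m["N", "N"] / sum(m[, "N"]))
})))

# Youden's J is one possible criterion; weighting the two error types differently is another
stats$youden <- stats$sensitivity + stats$specificity - 1
stats[which.max(stats$youden), ]
```

Which criterion to use depends on the relative cost of missing a patient versus flagging a healthy participant; Youden's J treats both errors as equally important.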
