Solved – R caret ROC optimal cut-off in original values


I am new to R. I started using it because the caret package makes cross-validation easy. However, I'm stuck translating the predicted probabilities back into the original, practical values.

We have pedal power data from 45 unilateral vascular patients and 25 healthy participants, and we want to find the cut-off point at which people have vascular problems. I've used 10-times repeated 10-fold cross-validation with a generalized linear model (method = "glm", i.e. logistic regression for this two-class outcome) and an ROC curve to get the optimal cut-off point (Youden method).

The dataset is called Dia_data. As an example, I'm using the variable DIFF_POWER_MEAN_95_REL.


library(caret)   # train(), trainControl(), twoClassSummary()
library(pROC)    # plot.roc(), coords()

#train model
data_ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 10,
                          savePredictions = TRUE, classProbs = TRUE,
                          summaryFunction = twoClassSummary, returnResamp = "all")
model <- train(patient ~ DIFF_POWER_MEAN_95_REL, data = Dia_data, method = "glm",
               trControl = data_ctrl, metric = "ROC", na.action = na.pass)

#generate ROC-curve
ROC <- plot.roc(model$pred$obs, model$pred$N)

#optimal cut-off point using youden method
coords(ROC, "best", ret = "threshold", best.method = "youden")
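For reference, the Youden criterion used by best.method = "youden" picks the threshold that maximises J = sensitivity + specificity - 1. A minimal base-R sketch of that search is below; youden_cutoff() is a hypothetical helper and the toy data are made up, not the study data. Because it works on any numeric score, passing the raw pedal power difference instead of a predicted probability would return a cut-off in watts (without cross-validation). It tries each observed value as a threshold, whereas pROC places its thresholds midway between consecutive observed values, so the two can differ slightly.

```r
# Youden's J = sensitivity + specificity - 1 at each candidate threshold.
# Assumption: higher scores indicate the positive class (case == TRUE),
# and a score at or above the threshold is classified as positive.
youden_cutoff <- function(score, case) {
  thresholds <- sort(unique(score))
  j <- vapply(thresholds, function(thr) {
    pred <- score >= thr
    sens <- sum(pred & case) / sum(case)
    spec <- sum(!pred & !case) / sum(!case)
    sens + spec - 1
  }, numeric(1))
  thresholds[which.max(j)]   # first threshold with the largest J
}

# Toy values, made up for illustration only
power_diff <- c(1, 2, 3, 4, 6, 7, 8, 9)
is_patient <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE)
youden_cutoff(power_diff, is_patient)   # returns 4
```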

The ROC curve looks fine, but the optimal cut-off I get is 0.273, which doesn't make sense as a pedal power value (see plot). This cut-off is a threshold on the predicted probabilities from the resampled models.
If I make an ROC curve in SPSS, for example (without cross-validation), the cut-off point is around 5 watts. This means that a pedal power difference of 5 watts between the two legs is a predictor of vascular problems.
What I would like is the cross-validated cut-off expressed in the original pedal power values.

How do I get the cut-off point in original pedal power values, i.e. the pedal power difference at which people have vascular problems?

Many thanks in advance!

[Figure: ROC curve of the predicted values for the pedal power difference between both legs, with the optimal cut-off point marked]

Best Answer

I think you are confusing the cut-off on the prediction (here: the predicted probability of being a patient) with a cut-off on your x-variable (here: pedal power).

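If you take a look at the final model, for example with summary(model$finalModel), the output shows the coefficients for the intercept and for the pedal power variable. Suppose you want a predicted probability above 0.8 before you call someone a patient, and imagine the coefficients for the intercept and DIFF_POWER_MEAN_95_REL are -1.1015 and 0.3900. A glm with a binomial family is linear on the log-odds (logit) scale, not on the probability scale, so you solve for the predictor on that scale:

               logit(prediction) = -1.1015 + DIFF_POWER_MEAN_95_REL * 0.3900
                      logit(0.8) = -1.1015 + DIFF_POWER_MEAN_95_REL * 0.3900
          DIFF_POWER_MEAN_95_REL = (log(0.8 / (1 - 0.8)) + 1.1015) / 0.3900
          DIFF_POWER_MEAN_95_REL = (1.3863 + 1.1015) / 0.3900 ≈ 6.379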
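The inversion from a probability cut-off to a predictor value can be wrapped in a small base-R function. This is a sketch assuming a binomial glm with a single predictor; prob_to_predictor() is a hypothetical helper name, and the coefficients are the illustrative ones from the example, not fitted values. Since the model is linear on the logit scale, the inversion goes through qlogis(), base R's logit log(p / (1 - p)), and plogis() maps back to a probability.

```r
# Invert a single-predictor binomial glm:
#   logit(p) = b0 + b1 * x   =>   x = (logit(p) - b0) / b1
prob_to_predictor <- function(p, b0, b1) {
  (qlogis(p) - b0) / b1
}

# Illustrative coefficients from the example above (not real fitted values)
x_cut <- prob_to_predictor(0.8, b0 = -1.1015, b1 = 0.3900)
x_cut                              # about 6.379
plogis(-1.1015 + 0.3900 * x_cut)   # back to 0.8
```

With the real model you could take b <- coef(model$finalModel) and pass b[1] and b[2]. Note that glm() with a binomial family models the probability of the second factor level of the response, so a threshold on the first class's probability (such as 0.273 on model$pred$N, if N is the first level of patient) would have to be converted with 1 - p first; check levels(Dia_data$patient).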

Your next challenge will be deciding on the cut-off for your prediction. You can do this, for example, by weighing false negatives against false positives in the confusion matrix.
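As a sketch of that comparison, the base-R helper below cross-tabulates predicted against observed class for a given cut-off on the predictor and reads off the false negatives and false positives. confusion_at() is a hypothetical name, the toy data are made up, and it assumes values at or above the cut-off are classified as patients.

```r
# Predicted vs observed class for a cut-off on the predictor.
# Assumption: score >= cutoff is classified as "patient".
confusion_at <- function(score, case, cutoff) {
  predicted <- factor(score >= cutoff, levels = c(FALSE, TRUE),
                      labels = c("healthy", "patient"))
  observed  <- factor(case, levels = c(FALSE, TRUE),
                      labels = c("healthy", "patient"))
  table(predicted, observed)
}

# Toy values, made up for illustration only
power_diff <- c(1, 2, 3, 4, 6, 7, 8, 9)
is_patient <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE)
cm <- confusion_at(power_diff, is_patient, cutoff = 4)
fn <- cm["healthy", "patient"]   # patients classified as healthy
fp <- cm["patient", "healthy"]   # healthy people classified as patients
cm
```

caret's confusionMatrix() builds the same table from two factors and also reports sensitivity and specificity.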

