`getModelInfo` shows you the code for the built-in models. `grnn` is not wrapped by this package, so you won't find code there.
There are a lot of avoidable problems. First, you have your data mixed up:

```r
x <- rep(1:100); y <- x^2 + x*rnorm(100, 0, 1); tr <- data.frame(y = y, x = x)
```

`tr[, -1]` is `x`, so `y = tr[, -1]` is wrong.
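To make the indexing concrete, here is a minimal sketch using the same toy data:

```r
set.seed(1)
x <- 1:100
y <- x^2 + x * rnorm(100, 0, 1)
tr <- data.frame(y = y, x = x)

# tr[, -1] drops the first column, so it is the predictor x ...
identical(tr[, -1], tr$x)  # TRUE
# ... while the outcome y is the first column:
identical(tr[, 1], tr$y)   # TRUE
```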
For your code, there are a few things:

- the `grid` module should be a function instead of a data frame. That is where the "attempt to apply non-function" error comes from. However:
- the arguments to the `pred` and `fit` modules do not include most of the required arguments listed on the help page.
For this particular package, you might have to do something like this:

```r
grnnFit <- function(x, y, wts, param, lev, last, weights, classProbs, ...) {
  dat <- x
  dat$.outcome <- y
  smooth(learn(dat, variable.column = ncol(dat)),
         sigma = param$sigma)
}
```
Also, for this package, you might have to use `guess` inside of `apply`.
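For instance, one way to loop `guess` over rows is with `apply`. This is only a sketch: it assumes the grnn package is installed, and that `learn()` takes the outcome in its first column by default:

```r
library(grnn)

x  <- 1:50
tr <- data.frame(y = x^2, x = x)   # outcome in column 1
fit <- smooth(learn(tr), sigma = 2)

newdata <- data.frame(x = c(10, 25, 40))
# guess() handles one observation at a time, so iterate over rows:
preds <- apply(as.matrix(newdata), 1,
               function(row) guess(fit, matrix(row, nrow = 1)))
```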
My impression is that you should slow down and read the documentation (it really looks like you did not). There are some weird things about `grnn` (to me) and it has almost no documentation. That should be the hard part, so read the caret web page and get the easy parts down.
Max
**Update**

As Max alluded to, `grnn`'s `guess()` method can only compute a prediction for a single vector, so it had to be wrapped in a loop.
The new working code:

```r
# Using caret to determine the optimum value for the grnn smooth parameter

grnnFit <- function(x, y, wts, param, lev, last, weights, classProbs, ...) {
  # use argument names EXACTLY as here in all functions
  library(grnn)
  dat <- data.frame(y, x)
  s <- smooth(learn(dat), sigma = param$sigma)
  return(s)
}

grnnPred <- function(modelFit, newdata, preProc = NULL, submodels = NULL) {
  library(grnn)
  library(foreach)
  xlst <- split(newdata, 1:nrow(newdata))
  foreach(i = xlst, .combine = rbind) %do% {
    # grnn can only compute a prediction for one sample at a time
    guess(modelFit, as.matrix(i))  # provide x values as a matrix
  }
}

grnnSort <- function(x) {
  # return the grid sorted by sigma
  # (the original version ordered x but then printed the unsorted copy)
  x[order(x$sigma), ]
}

grnnGrid <- function(x, y, len = NULL) {
  # only one tuning parameter, sigma
  data.frame(sigma = seq(1, 4, .05))  # search range
}

grnnLev <- function(x) {
  # regression model: no class levels
  NULL
}

# list of parameters/functions defining the custom model
lpgrnn <- list(
  library    = "grnn",
  type       = "Regression",
  parameters = data.frame(parameter = "sigma", class = "numeric", label = "Sigma"),
  grid       = grnnGrid,
  fit        = grnnFit,
  predict    = grnnPred,
  prob       = NULL,
  levels     = grnnLev,
  sort       = grnnSort)

library(caret)

set.seed(123)
x1 <- rep(1:100) + rnorm(100, 0, 1)
x2 <- rep(1:100) + rnorm(100, 0, 1)
tr <- data.frame(y = x1 * x2, x1, x2)

set.seed(998)
fitControl <- trainControl(method = "repeatedcv", repeats = 5)

set.seed(825)
res <- train(y ~ ., data = tr, method = lpgrnn, metric = "RMSE",
             trControl = fitControl)
print(res)
print(res$finalModel$sigma)
plot(res)
```
![sigma versus RMSE](https://i.stack.imgur.com/CroWi.jpg)
Best Answer

You can emulate what the package's `twoClassSummary` function does. See the help page for custom performance metrics.

Max