Solved – LGOCV caret package R

caret, r, train

I am learning data mining from a book. In the classification chapter on neural networks, the authors give the code below. I have the following questions:

## pre2008 <- 1:nrow(training) ## training is a dataset that has training data
ctrl <- trainControl(method = "LGOCV",
                     summaryFunction = twoClassSummary,
                     classProbs = TRUE,
                     index = list(TrainSet = pre2008),
                     savePredictions = TRUE)
nnetGrid <- expand.grid(.size = 1:10, .decay = c(0, .1, 1, 2))
maxSize <- max(nnetGrid$.size)
set.seed(476)
nnetFit <- train(x = training[,reducedSet], 
                 y = training$Class,
                 method = "nnet",
                 metric = "ROC",
                 preProc = c("center", "scale"),
                 tuneGrid = nnetGrid,
                 trace = FALSE,
                 maxit = 2000,
                 MaxNWts = 1*(maxSize * (length(reducedSet) + 1) + maxSize + 1),
                 trControl = ctrl)

LGOCV – when do we use it? I read the post, but it is still not clear to me. The post says that it is a variant of LOOCV for hierarchical data, but my Y variable is not hierarchical 🙁

twoClassSummary – can it be used only when we have two classes? Can I use it for, say, the iris data?


LGOCV is also known as Monte-Carlo Cross Validation. More details are available here.

Best Answer

From the book: "Repeated training/test splits is also known as 'leave-group-out cross-validation' or 'Monte Carlo cross-validation.'" It is illustrated in Figure 4.7 on page 72.

> LGOCV - when do we use it?

It depends. LGOCV has good variance properties if you run a large number of resamples, and its bias depends mostly on what percentage of the training data is left out. If you have a lot of computing power, this might be the preferred method.
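
As a rough illustration of what "repeated training/test splits" means, here is a minimal base-R sketch. It is not the book's code: the values of `p` and `n_resamples` are arbitrary, the `iris` data is only a stand-in, and the sketch uses simple random sampling, whereas caret stratifies its splits by the outcome. In caret, the corresponding settings are the `p` and `number` arguments of `trainControl`.

```r
# Minimal sketch of Monte Carlo (leave-group-out) cross-validation splits.
# p and n_resamples are illustrative values, not recommendations.
set.seed(476)
n <- nrow(iris)      # stand-in data set with 150 rows
p <- 0.75            # fraction of rows used for training in each resample
n_resamples <- 25    # number of repeated random splits

# Each resample draws a fresh random training set; the remaining rows are held out.
train_index <- replicate(
  n_resamples,
  sort(sample.int(n, size = floor(p * n))),
  simplify = FALSE
)
held_out <- lapply(train_index, function(i) setdiff(seq_len(n), i))

# Unlike k-fold CV, a row can be held out in several resamples, or in none.
times_held_out <- table(factor(unlist(held_out), levels = seq_len(n)))
summary(as.integer(times_held_out))
```

More resamples average out the randomness of the individual splits, while a smaller `p` leaves less data for fitting each model.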

> my Y variable is not hierarchical

Not sure what you mean.

Note that we call this LGOCV, but here we are only holding out a single set of samples rather than repeating the split (see the discussion in Section 12.1). We needed to call it something in code.
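
To see why `index = list(TrainSet = pre2008)` amounts to one split, consider this small base-R sketch. The ten-row data set and the choice of rows 1 to 7 are hypothetical stand-ins for the book's data. The sketch relies on caret's documented default that, when `indexOut` is not supplied, the held-out rows are those not listed in `index`.

```r
# Hypothetical toy setup: 10 rows, the first 7 standing in for the "pre-2008" rows.
n <- 10
pre2008 <- 1:7

# A one-element index list defines exactly one resample.
index <- list(TrainSet = pre2008)

# The held-out rows are the complement of the training rows.
held_out <- lapply(index, function(i) setdiff(seq_len(n), i))
held_out$TrainSet
# [1]  8  9 10
```

If `pre2008` covered every row of `training`, as in the commented-out line in the question, the complement would be empty and nothing would be left to evaluate on.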

> twoClassSummary - can it be used only when we have two classes?

Yes.
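
For the iris data specifically, the outcome has three classes, so `twoClassSummary` does not fit as is. The sketch below shows two possible workarounds, neither taken from the book: reducing the problem to two species, or switching to caret's `multiClassSummary`. The caret call is guarded so the snippet still runs when caret is not installed, and the object names `iris2` and `ctrl_multi` are made up for illustration.

```r
# iris has three species, so a two-class summary does not apply directly.
nlevels(iris$Species)   # 3

# Option 1: keep only two species and drop the unused factor level.
iris2 <- droplevels(subset(iris, Species != "setosa"))
nlevels(iris2$Species)  # 2

# Option 2: keep all three classes and use a multi-class summary function.
if (requireNamespace("caret", quietly = TRUE)) {
  ctrl_multi <- caret::trainControl(
    method = "LGOCV",
    classProbs = TRUE,
    summaryFunction = caret::multiClassSummary
  )
}
```

With Option 2, the `metric` passed to `train` would need to be one that `multiClassSummary` reports, such as "Accuracy", rather than "ROC".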

Max