I am learning data mining from a book. In the classification chapter on neural networks, the authors give the code below. I have the following questions:
    ## pre2008 <- 1:nrow(training) ## training is a dataset that has training data
    ctrl <- trainControl(method = "LGOCV",
                         summaryFunction = twoClassSummary,
                         classProbs = TRUE,
                         index = list(TrainSet = pre2008),
                         savePredictions = TRUE)
    nnetGrid <- expand.grid(.size = 1:10, .decay = c(0, .1, 1, 2))
    maxSize <- max(nnetGrid$.size)
    set.seed(476)
    nnetFit <- train(x = training[,reducedSet],
                     y = training$Class,
                     method = "nnet",
                     metric = "ROC",
                     preProc = c("center", "scale"),
                     tuneGrid = nnetGrid,
                     trace = FALSE,
                     maxit = 2000,
                     MaxNWts = 1*(maxSize * (length(reducedSet) + 1) + maxSize + 1),
                     trControl = ctrl)
LGOCV - when do we use it? I read the post, but it is still not clear to me. The post says that it is a variant of LOOCV for hierarchical data, but my Y variable is not hierarchical 🙁
twoClassSummary - can it be used only when we have two classes? Can I use it for, say, the Iris data?
LGOCV is also known as Monte-Carlo Cross Validation. More details are available here.
Best Answer
From the book: "Repeated training/test splits is also known as 'leave-group-out cross-validation' or 'Monte Carlo cross-validation.'" It is illustrated in Figure 4.7 on page 72.
> LGOCV - when do we use it?
It depends. It has good variance properties if you use a large number of resamples, and the bias depends on what percentage of the training data is left out of each split. If you have a lot of computing power, this might be the preferred method.
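As a rough illustration of repeated training/test splits, here is a minimal sketch using caret's `createDataPartition` and `trainControl`. The built-in `iris` data, the 25 resamples, and the 80% training fraction are illustrative assumptions, not values from the book.

```r
library(caret)

set.seed(476)

# Draw 25 random training/test splits, each keeping about 80% of the rows
# for training (both numbers are arbitrary choices for this sketch).
splits <- createDataPartition(iris$Species, times = 25, p = 0.80, list = TRUE)

length(splits)      # number of resamples
head(splits[[1]])   # first few training-row indices of the first split

# The same kind of scheme requested through trainControl:
# 'number' is the number of resamples, 'p' the training fraction.
ctrl_mc <- trainControl(method = "LGOCV", number = 25, p = 0.80)
```

More resamples should lower the variance of the performance estimate, while `p` controls how much data each model is trained on and therefore the bias.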
> my Y variable is not hierarchical
Not sure what you mean.
Note that we call this LGOCV but we are only holding out a single sample (see the discussion in section 12.1). We needed to call it something in code.
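To see how supplying `index` turns this into a single held-out set, here is a small sketch on simulated data. The data frame `dat`, the `train_rows` vector, and the use of a logistic regression (`method = "glm"`) are hypothetical stand-ins for the book's `training`, `pre2008`, and neural network.

```r
library(caret)

set.seed(476)

# Simulated two-class data (hypothetical; stands in for the book's data set)
x <- rnorm(100)
dat <- data.frame(x = x,
                  Class = factor(ifelse(x + rnorm(100) > 0, "yes", "no")))

# Rows used for fitting; the remaining rows become the single held-out set
train_rows <- 1:70

ctrl_single <- trainControl(method = "LGOCV",
                            summaryFunction = twoClassSummary,
                            classProbs = TRUE,
                            index = list(TrainSet = train_rows),
                            savePredictions = TRUE)

fit <- train(Class ~ x, data = dat,
             method = "glm",
             metric = "ROC",
             trControl = ctrl_single)

# One row only: performance on the rows not listed in 'index'
fit$resample
```

Because `index` contains one element, there is exactly one resample, so the reported ROC comes from a single training/test split rather than an average over many.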
> twoClassSummary - can it be used only when we have two classes?
Yes.
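For a three-class outcome such as `iris$Species`, one option is to use a different summary function. The sketch below assumes caret's `multiClassSummary` with hard class predictions only; the linear discriminant model (`method = "lda"`) and the resampling settings are arbitrary choices for illustration.

```r
library(caret)

set.seed(476)

# multiClassSummary reports accuracy, kappa, and class-averaged statistics;
# class probabilities are left off here to keep the example simple.
ctrl_multi <- trainControl(method = "LGOCV",
                           number = 10,
                           p = 0.80,
                           summaryFunction = multiClassSummary)

fit_iris <- train(Species ~ ., data = iris,
                  method = "lda",
                  metric = "Accuracy",
                  trControl = ctrl_multi)

fit_iris$results
```

Leaving `summaryFunction` at its default also works for more than two classes and reports accuracy and kappa.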
Max