Solved – R caret package question

library(car)    # provides the Prestige data set
library(caret)

# split the data 70/30 into training and test sets (the split is random)
trainIndex <- createDataPartition(Prestige$income, p = 0.7, list = FALSE)
prestige.train <- Prestige[trainIndex, ]
prestige.test <- Prestige[-trainIndex, ]

# tuning grid: every combination of decay and size will be evaluated
my.grid <- expand.grid(.decay = c(0.5, 0.1), .size = c(5, 6, 7))

# fit an nnet model for each grid combination; linout = 1 requests a
# linear output unit, as needed for regression
prestige.fit <- train(income ~ prestige + education, data = prestige.train,
    method = "nnet", maxit = 1000, tuneGrid = my.grid, trace = FALSE, linout = 1)

# predict on the held-out test set and compute the test RMSE
prestige.predict <- predict(prestige.fit, newdata = prestige.test)
prestige.rmse <- sqrt(mean((prestige.predict - prestige.test$income)^2))

The code above was discussed in this question:
How to train and validate a neural network model in R?

  1. Does the caret package fit the model once for every combination of decay and size? If so, what is the default number of iterations?
  2. What is the final choice of decay and size? When I run summary(prestige.fit), the decay is 0.5 and the size is 5. Is that the combination that caret chose as the best option?

Best Answer

1) First off, yes: a neural net model is fit for every unique combination of .decay and .size that you supplied in my.grid, because, well, you created it. If you had instead specified my.grid <- data.frame(.decay = 0.5, .size = 5), you would only have one model.
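
As a quick illustration (my sketch, not code from the original post), you can see how many candidate models the grid implies just by counting its rows:

# expand.grid takes the Cartesian product: 2 decay values x 3 sizes = 6 rows,
# so train() fits and resamples 6 candidate models
my.grid <- expand.grid(.decay = c(0.5, 0.1), .size = c(5, 6, 7))
nrow(my.grid)  # 6

# a one-row grid means a single candidate, i.e., no tuning comparison at all
one.combo <- data.frame(.decay = 0.5, .size = 5)
nrow(one.combo)  # 1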

As far as iterations are concerned, you specified the backend modeling function to be nnet from the nnet package, which estimates the weights with an iterative optimizer (the BFGS method of optim) that stops once it converges, or fails to. It could stop after a single iteration if the initial weights happen to land in a local minimum of the error surface, or it could run the full 1000 iterations and stop without converging, since you specified maxit = 1000. This refers to the fitting process for one model.
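To watch this iterative fitting happen (a minimal sketch, not part of the original answer), call nnet directly with trace = TRUE and it will print the fitting criterion as the optimizer proceeds, ending with either "converged" or a stop at maxit:

library(nnet)
library(car)  # for the Prestige data

# one network, fit directly; trace = TRUE prints the error value at each
# reported iteration until convergence or until maxit is reached
fit <- nnet(income ~ prestige + education, data = Prestige,
    size = 5, decay = 0.5, linout = TRUE, maxit = 1000, trace = TRUE)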

Also built into this innocuous train function is a validation approach: the best set of parameters from my.grid is picked according to some objective measure and a resampling process. The objective measure is either specified by hand or defaults according to the type of your outcome: metric = ifelse(is.factor(y), "Accuracy", "RMSE"). Since income is continuous, you pick the grid option that minimizes root mean squared error (a sensible bias/variance trade-off and a good starting place). The default resampling method is the bootstrap, with a default of just 25 resamples; see ?trainControl. Considering that train can be parallelized, that default is surprisingly small. In my thesis work, we weren't entirely happy even with 10,000 resamples.
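As an illustration (my sketch, not code from the original post), both the metric and the resampling scheme can be set explicitly through trainControl and passed to train:

# more bootstrap resamples, or switch to 10-fold cross-validation instead
ctrl <- trainControl(method = "boot", number = 200)
# ctrl <- trainControl(method = "cv", number = 10)  # alternative

prestige.fit <- train(income ~ prestige + education, data = prestige.train,
    method = "nnet", maxit = 1000, tuneGrid = my.grid, trace = FALSE,
    linout = 1, trControl = ctrl, metric = "RMSE")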

2) Your specific model reports the combination in my.grid with the lowest RMSE pooled across the bootstrap resamples. So yes: the decay of 0.5 and size of 5 you see is the combination caret chose as the best option, and it is the one used to refit the final model on the full training set.
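
To confirm the choice (a minimal sketch using standard caret accessors), inspect the fitted train object directly; summary(prestige.fit) describes the final nnet itself, while these show the tuning results:

prestige.fit$bestTune  # the winning decay/size combination
prestige.fit$results   # resampled RMSE for every grid combination
print(prestige.fit)    # summary of the whole tuning process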
