Solved – R caret package question

library(car)    # provides the Prestige data set
library(caret)

# split the data 70/30 into training and test sets (the split is random)
trainIndex <- createDataPartition(Prestige$income, p = 0.7, list = FALSE)
prestige.train <- Prestige[trainIndex, ]
prestige.test <- Prestige[-trainIndex, ]

# tuning grid: every combination of decay and size will be evaluated
my.grid <- expand.grid(.decay = c(0.5, 0.1), .size = c(5, 6, 7))

# fit an nnet model for each grid combination; linout = 1 requests a
# linear output unit, as needed for regression
prestige.fit <- train(income ~ prestige + education, data = prestige.train,
    method = "nnet", maxit = 1000, tuneGrid = my.grid, trace = FALSE, linout = 1)

# predict on the held-out test set and compute the test RMSE
prestige.predict <- predict(prestige.fit, newdata = prestige.test)
prestige.rmse <- sqrt(mean((prestige.predict - prestige.test$income)^2))

The code above was discussed in this question:
How to train and validate a neural network model in R?

  1. Does the caret package fit the model once for every combination of decay and size? If so, what is the default number of iterations?
  2. What is the final choice of decay and size? When I run summary(prestige.fit), the decay is 0.5 and the size is 5. Is that the combination that caret chose as the best option?

Best Answer

1) First off, yes: a neural net model is fit for every unique combination of .decay and .size that you supplied in my.grid, because, well, you created it. If you had instead specified my.grid <- data.frame(.decay = 0.5, .size = 5), you would only have one model.
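
As a quick illustration (my sketch, not code from the original post), you can see how many candidate models the grid implies just by counting its rows:

# expand.grid takes the Cartesian product: 2 decay values x 3 sizes = 6 rows,
# so train() fits and resamples 6 candidate models
my.grid <- expand.grid(.decay = c(0.5, 0.1), .size = c(5, 6, 7))
nrow(my.grid)  # 6

# a one-row grid means a single candidate, i.e., no tuning comparison at all
one.combo <- data.frame(.decay = 0.5, .size = 5)
nrow(one.combo)  # 1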

As far as iterations are concerned, you specified the backend modeling function to be nnet from the nnet package, which estimates the weights with an iterative optimizer (the BFGS method of optim) that stops once it converges, or fails to. It could stop after a single iteration if the initial weights happen to land in a local minimum of the error surface, or it could run the full 1000 iterations and stop without converging, since you specified maxit = 1000. This refers to the fitting process for one model.
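To watch this iterative fitting happen (a minimal sketch, not part of the original answer), call nnet directly with trace = TRUE and it will print the fitting criterion as the optimizer proceeds, ending with either "converged" or a stop at maxit:

library(nnet)
library(car)  # for the Prestige data

# one network, fit directly; trace = TRUE prints the error value at each
# reported iteration until convergence or until maxit is reached
fit <- nnet(income ~ prestige + education, data = Prestige,
    size = 5, decay = 0.5, linout = TRUE, maxit = 1000, trace = TRUE)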

Also built into this innocuous train function is a validation approach: the best set of parameters from my.grid is picked according to some objective measure and a resampling process. The objective measure is either specified by hand or defaults according to the type of your outcome: metric = ifelse(is.factor(y), "Accuracy", "RMSE"). Since income is continuous, you pick the grid option that minimizes root mean squared error (a sensible bias/variance trade-off and a good starting place). The default resampling method is the bootstrap, with a default of just 25 resamples; see ?trainControl. Considering that train can be parallelized, that default is surprisingly small. In my thesis work, we weren't entirely happy even with 10,000 resamples.
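As an illustration (my sketch, not code from the original post), both the metric and the resampling scheme can be set explicitly through trainControl and passed to train:

# more bootstrap resamples, or switch to 10-fold cross-validation instead
ctrl <- trainControl(method = "boot", number = 200)
# ctrl <- trainControl(method = "cv", number = 10)  # alternative

prestige.fit <- train(income ~ prestige + education, data = prestige.train,
    method = "nnet", maxit = 1000, tuneGrid = my.grid, trace = FALSE,
    linout = 1, trControl = ctrl, metric = "RMSE")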

2) Your specific model reports the combination in my.grid with the lowest RMSE pooled across the bootstrap resamples. So yes: the decay of 0.5 and size of 5 you see is the combination caret chose as the best option, and it is the one used to refit the final model on the full training set.
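
To confirm the choice (a minimal sketch using standard caret accessors), inspect the fitted train object directly; summary(prestige.fit) describes the final nnet itself, while these show the tuning results:

prestige.fit$bestTune  # the winning decay/size combination
prestige.fit$results   # resampled RMSE for every grid combination
print(prestige.fit)    # summary of the whole tuning process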
