In this example, pre-processing is used when fitting a neural network:
nnetTune <- train(x = solTrainXtrans, y = solTrainY,
                  method = "avNNet",
                  tuneGrid = nnetGrid,
                  trControl = ctrl,
                  preProc = c("center", "scale"),
                  linout = TRUE,
                  trace = FALSE,
                  MaxNWts = 13 * (ncol(solTrainXtrans) + 1) + 13 + 1,
                  maxit = 1000,
                  allowParallel = FALSE)
If I make predictions with new data, do I have to pre-process this new data, or can I pass the new data directly to the model?
Would that be different if I use the model below, where the data X is pre-processed (centered and scaled) before it is passed to nnet?
fit <- nnet(Y ~ ., X, size = 12, maxit = 500, linout = TRUE, decay = 0.01)
Thank you!
Best Answer
Yes, the new data have to be pre-processed as well.
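For the nnet() case, here is a minimal sketch of what this means in practice. It is only an illustration: the mtcars data, the choice of predictors, the train/new split and the network size are assumptions made for the example, and base R's scale() stands in for whatever centering and scaling you applied to X. The key point is that the new data are transformed with the means and standard deviations computed on the training data, not with their own.

```r
library(nnet)

set.seed(1)

# Illustrative data only (assumption): predict mpg from three mtcars columns.
predictors <- c("hp", "wt", "disp")
train_idx  <- 1:24
train_x <- mtcars[train_idx, predictors]
train_y <- mtcars$mpg[train_idx]
new_x   <- mtcars[-train_idx, predictors]

# Center and scale the training predictors. scale() stores the statistics
# it used in the "scaled:center" and "scaled:scale" attributes.
train_scaled <- scale(train_x)
centers <- attr(train_scaled, "scaled:center")
scales  <- attr(train_scaled, "scaled:scale")

# Fit on the pre-processed training data.
train_df <- data.frame(Y = train_y, train_scaled)
fit <- nnet(Y ~ ., data = train_df, size = 3, linout = TRUE,
            decay = 0.01, maxit = 500, trace = FALSE)

# Apply the TRAINING means and standard deviations to the new data,
# then predict on the transformed new data.
new_df <- as.data.frame(scale(new_x, center = centers, scale = scales))
pred   <- predict(fit, newdata = new_df)
```

Passing the raw new_x to predict() here would not raise an error, but the network would receive inputs on a different scale from the one it was trained on, so the predictions would not be meaningful.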
EDIT (based on your last comment):
For your first code block, I am not sure whether the new data are automatically pre-processed just because you used the preProc argument. For your second code block, yes: nnet() does not provide any functionality to pre-process the data.
I would recommend using the preProcess() function of caret. In fact, when you use preProc as an input argument, the preProcess() function is called. You define the kind of pre-processing you need in preProcess(), and then apply it to the data in question with the predict() function. The advantage of using preProcess() is that the same fitted object can be reused on new data: pass the new data as the newdata argument of predict(). Refer to the documentation for more details.
Of course, you can pre-process just a single observation. In your example, you center and scale the training set. This means that you compute the mean and standard deviation of each predictor in the training set, then subtract the mean and divide by the standard deviation, so that the transformed training set has mean 0 and standard deviation 1. To pre-process a single observation, subtract that same training-set mean from it and divide the result by that same training-set standard deviation.
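As a rough sketch of the steps just described (not the example from the caret documentation), the following uses preProcess() and predict() on a training set and on new data, and then repeats the transformation by hand for a single observation. The mtcars columns and the split into training and new rows are assumptions chosen only for illustration; the mean and std components are the training-set statistics stored in the preProcess object.

```r
library(caret)

# Illustrative data only (assumption): three mtcars columns as predictors.
predictors <- c("hp", "wt", "disp")
train_x <- mtcars[1:24, predictors]
new_x   <- mtcars[25:32, predictors]

# Estimate the centering and scaling parameters from the training set only.
pp <- preProcess(train_x, method = c("center", "scale"))

# Apply the same transformation to the training set and to the new data.
train_trans <- predict(pp, train_x)
new_trans   <- predict(pp, newdata = new_x)

# The same transformation done by hand for a single new observation:
# subtract the training mean and divide by the training standard deviation.
one_obs <- unlist(new_x[1, ])
manual  <- (one_obs - pp$mean[predictors]) / pp$std[predictors]

# Should be TRUE: the manual result matches what predict() returned.
all.equal(unname(manual), unname(unlist(new_trans[1, ])))
```

The transformed new data will generally not have mean 0 and standard deviation 1, because the parameters come from the training set; that is expected.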
The documentation of the preProcess() function includes a simple example of how to use it.
One last thing; you mention
If I do preprocessing by myself with the testdata and use that preprocessed testdata as input for fitting the nnet...
- Just to make this clear, you should fit the model on the training data, and then use the predict() function to generate predictions for new data.
Hope it helps!