Solved – preprocess the new data for a prediction, if I have used preprocessing for building the model

caret, nnet, r

In this example preprocessing is used to construct a NN:

nnetTune <- train(x = solTrainXtrans, y = solTrainY,
              method = "avNNet",
              tuneGrid = nnetGrid,
              trControl = ctrl,
              preProc = c("center", "scale"),
              linout = TRUE,
              trace = FALSE,
              MaxNWts = 13 * (ncol(solTrainXtrans) + 1) + 13 + 1,
              maxit = 1000,
              allowParallel = FALSE)

If I make predictions with new data, do I have to pre-process this new data or can I directly insert the new data in the model?

Would that be different if I use the model below where data X is preprocessed (centered and scaled) before it is inserted in the nnet?

fit <- nnet(Y ~ ., data = X, size = 12, maxit = 500, linout = TRUE, decay = 0.01)

Thank you!

Best Answer

Yes, the new data have to be pre-processed as well.

EDIT (based on your last comment):

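For your first code block, the caret documentation for train() says that the steps given in preProc are applied during resampling and also to predictions made with predict() on the fitted object. So new data passed to predict(nnetTune, newdata) should be centered and scaled automatically, using the parameters estimated on the training set.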

For your second code block, yes, it is different: nnet() does not provide any functionality to pre-process the data, so you have to transform the new data yourself before calling predict().

I would recommend using caret's preProcess() function. In fact, when you pass preProc to train(), preProcess() is what gets called. You specify the kind of pre-processing you need in the call to preProcess(), which estimates the required parameters (for example, the means and standard deviations) from the data you give it. The transformation itself is applied when you call predict() on the resulting object. The advantage is that the same object can be applied to any data set through the newdata argument of predict(), so new data are transformed with exactly the parameters estimated on the training set. Refer to the documentation for more details.

Of course, you can pre-process just a single observation. In your example, you center and scale the training set. This means that you compute the mean and standard deviation of each predictor in the training set, then subtract the mean and divide by the standard deviation, so that each transformed training predictor has mean 0 and standard deviation 1. To pre-process a single new observation, subtract those same training-set means and divide by those same training-set standard deviations. Do not recompute them from the new data.
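The arithmetic described above can be sketched in base R. The data values and variable names here are made up for illustration and do not come from the question; the only point is that the new observation reuses the training-set mean and standard deviation.

```r
# Toy training set: two numeric predictors (made-up values, for illustration only).
train_x <- data.frame(a = c(1, 2, 3, 4, 5),
                      b = c(10, 20, 30, 40, 50))

# Estimate the centering and scaling parameters from the training set only.
train_mean <- colMeans(train_x)
train_sd   <- sapply(train_x, sd)

# Transform the training set; each column now has mean 0 and sd 1.
train_scaled <- scale(train_x, center = train_mean, scale = train_sd)

# A single new observation is transformed with the *training* parameters.
new_obs    <- data.frame(a = 6, b = 25)
new_scaled <- (unlist(new_obs) - train_mean) / train_sd
```

Note that new_scaled is generally not 0 or 1 in any column; it only expresses the new observation in the units the model was trained on.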

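As a simple example of how to use the preProcess() function (taken from the documentation):

library(caret)
data(BloodBrain)

# Estimate the pre-processing parameters from the first 100 rows only
# (column 3 is dropped, as in the documentation example)
preProc  <- preProcess(bbbDescr[1:100, -3])
# Apply those same parameters to the training rows and to the held-out rows
training <- predict(preProc, bbbDescr[1:100, -3])
test     <- predict(preProc, bbbDescr[101:208, -3])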

One last thing; you mention:

"If I do preprocessing by myself with the testdata and use that preprocessed testdata as input for fitting the nnet..."

Just to make this clear: you should fit the model on the training data, and then use the predict() function to generate predictions for new data.
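A minimal sketch of that order of operations with plain nnet(), assuming the built-in mtcars data and an arbitrary train/new split chosen only for illustration (the predictors, the split and the network size are not from the question). It relies on scale() storing its parameters in the "scaled:center" and "scaled:scale" attributes.

```r
library(nnet)  # a recommended package, shipped with most R installations

set.seed(1)

# Hypothetical split of mtcars: predict mpg from three predictors.
predictors <- c("disp", "hp", "wt")
train_idx  <- 1:24
train_x <- mtcars[train_idx, predictors]
train_y <- mtcars$mpg[train_idx]
new_x   <- mtcars[-train_idx, predictors]

# Center and scale the training predictors; scale() keeps the parameters as attributes.
train_scaled <- scale(train_x)
center <- attr(train_scaled, "scaled:center")
spread <- attr(train_scaled, "scaled:scale")

# Fit on the pre-processed training data only.
fit <- nnet(x = train_scaled, y = train_y, size = 3, linout = TRUE,
            decay = 0.01, maxit = 500, trace = FALSE)

# Apply the training parameters to the new data, then predict.
new_scaled <- scale(new_x, center = center, scale = spread)
pred <- predict(fit, new_scaled)
```

The new data never influence center or spread, which is the same guarantee that a preProcess() object gives when it is built from the training rows only.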

Hope it helps!

Related Solutions

Solved – Which data transformation can improve the performance of MLP neural networks for classification

I don't think left- or right-skewness is your concern, but rather the large differences in variance between features. Also, the transformed features don't need to be on exactly the same scale; similar scales also work fine. I suggest standardizing your feature matrix to zero mean and unit variance. But why does each of your features have two different axes? What are the corresponding labels of the x- and y-axes?

Solved – Do we have to scale new unseen feature data for prediction

1) You should scale the new data as well. You can scale all the data, training and new data together, if possible. Or you can store the scaling function and apply it later to the new data. If you have data d with, say, mean m and standard deviation s, you scale the data by (d - m) / s. Just apply this function to the new data as well, using the same mean and sd.

2) You can't assign the result of load() directly.

#loading model
supmod<-load("model.Rdata")

The resulting variable contains only the string "model", because load() returns the names of the objects it restored, not the objects themselves.

Try this:

load("model.Rdata")

This loads the model; the restored variable is named "model".
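The difference can be checked with a small sketch. The lm fit on iris is only a stand-in for the real model, and the temporary file names are hypothetical. The sketch also shows saveRDS()/readRDS(), which is a different mechanism from the one in the question but does allow assigning the loaded object to a name of your choice.

```r
# Stand-in for a fitted model (an lm on the built-in iris data, for illustration).
model <- lm(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length, data = iris)

# save()/load(): load() restores the object under its saved name and
# returns only a character vector with the names it restored.
rdata_file <- tempfile(fileext = ".RData")
save(model, file = rdata_file)
rm(model)
restored_names <- load(rdata_file)   # "model"; the object `model` exists again

# saveRDS()/readRDS(): stores a single object, and the result can be
# assigned to any variable name.
rds_file <- tempfile(fileext = ".rds")
saveRDS(model, rds_file)
supmod <- readRDS(rds_file)
```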

3) Further, you have to pass a data.frame (with the same column names as the predictors in the training dataset) to predict:

new <- data.frame(Sepal.Length=4.2, Sepal.Width=3.2, Petal.Length=1.7, Petal.Width=0.23)

pre<-predict(model,new)
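As a rough check before predicting, the column names of the new data can be compared with the training predictors. The sketch below assumes an nnet classifier on iris as a stand-in, since the original model is not shown; predictor_names and missing_cols are names introduced here for illustration, and scaling is left out to keep the focus on the column names.

```r
library(nnet)
set.seed(1)

# Stand-in classifier on the built-in iris data (the original model is not shown).
model <- nnet(Species ~ ., data = iris, size = 2, trace = FALSE)

# Predictor columns the model was trained on.
predictor_names <- setdiff(names(iris), "Species")

new <- data.frame(Sepal.Length = 4.2, Sepal.Width = 3.2,
                  Petal.Length = 1.7, Petal.Width = 0.23)

# Any training predictor missing from the new data would make predict() fail.
missing_cols <- setdiff(predictor_names, names(new))
stopifnot(length(missing_cols) == 0)

pre <- predict(model, new, type = "class")
```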
