Solved – preProcess from caret doesn’t work with a smaller dataset

caret, data preprocessing

I am trying to use caret to train some prediction models. As part of this, I would like to use preProcess. However, it appears that preProcess requires a certain number of columns in the data set. Can this be true? If I take the first 179 columns of this dataset, preProcess(dataset, method = c("center", "scale", "YeoJohnson")) works without problems. If, however, I take only the first 178 columns, I get the following warning:

Warning in pre_process_options(method, column_types) :
The following pre-processing methods were eliminated: 'center', 'scale', 'YeoJohnson'

Could you please explain this? And is it somehow possible to preprocess my data even if I have fewer columns?

I am using RStudio version 0.99.491 with R version 3.2.3 on a PC running Windows 7.

Thanks in advance for any help!

Best Answer

I had a similar problem before and discovered that it was caused by inadvertently passing a matrix without column names. preProcess refers to the column names inside the function, so if they are NULL it will fail. If your subsetting method drops this information, that could be the cause of the warning you are seeing.
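
To check whether this applies to your case, something along these lines may help. It is only a minimal sketch: the matrix m, the placeholder names V1 to V5 and the choice of three columns are made up for illustration and are not taken from the question's data. The idea is to inspect colnames() before calling preProcess, give the columns names if they are missing, and subset with drop = FALSE so the result stays a data frame.

```r
# Hypothetical data: 10 rows, 5 numeric columns, created without column names.
set.seed(1)
m <- matrix(rnorm(50, mean = 5), nrow = 10, ncol = 5)

# A matrix built this way has NULL column names.
has_names_before <- !is.null(colnames(m))

# Assign placeholder names (arbitrary, for illustration) and convert to a data frame.
colnames(m) <- paste0("V", seq_len(ncol(m)))
df <- as.data.frame(m)
has_names_after <- !is.null(colnames(df))

# drop = FALSE keeps the result a data frame with its column names,
# even if only a single column is selected.
sub <- df[, 1:3, drop = FALSE]

# Run the caret step only if the package is installed.
if (requireNamespace("caret", quietly = TRUE)) {
  pp <- caret::preProcess(sub, method = c("center", "scale", "YeoJohnson"))
  transformed <- predict(pp, sub)
  print(head(transformed))
}
```

If colnames() on your 178-column subset returns NULL, or str() shows that the object is no longer a data frame of numeric columns, that would point to the subsetting step rather than to the number of columns.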