Solved – Feature selection for test data

classification, feature selection

We apply feature selection to the training data.
Suppose this leaves us with 1000 selected features.
The test data still contains more than 1000 features.
As a result, scikit-learn raises a prediction error referring to "the number of features at training time".
How can we reduce the number of features in the test data?
Should we also apply feature selection to the test data?

Best Answer

You should have the same features at test time as at training time. Normally, you perform feature selection or feature extraction on the training data and then apply the same transformation, unchanged, to the test data. For example, if feature selection shows that a subset of the features is sufficient, you should use exactly that subset at test time. Likewise, if feature extraction defines a new feature by combining existing features, you should use the same function to compute that feature at test time.
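As a minimal sketch of this in scikit-learn: the selector is fitted on the training data only, and the same fitted object is then used to reduce the test data. The synthetic data, the choice of `SelectKBest` with `f_classif`, `k=10`, and the logistic regression classifier are all assumptions made for illustration; the question does not say which selector or model was used.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (assumption): 50 raw features in both sets.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 50))
y_train = rng.integers(0, 2, size=200)
X_train[:, 0] += 2.0 * y_train  # make feature 0 informative
X_test = rng.normal(size=(40, 50))

# Fit the selector on the training data only.
selector = SelectKBest(score_func=f_classif, k=10)
X_train_sel = selector.fit_transform(X_train, y_train)

# Reuse the fitted selector on the test data: transform, never fit.
X_test_sel = selector.transform(X_test)

# The classifier now sees 10 columns at both training and test time.
clf = LogisticRegression(max_iter=1000).fit(X_train_sel, y_train)
pred = clf.predict(X_test_sel)
```

Calling `selector.transform(X_test)` keeps the same column indices that were chosen on the training data, so the feature count matches what the classifier was trained on.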

I emphasize that you should NOT run a new feature selection/extraction on the test data. Use the same features that were selected (or extracted) at training time.
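One way to make it hard to refit the selection on test data by accident is to wrap the selector and the classifier in a single scikit-learn pipeline. This is only a sketch under assumed data and an assumed choice of selector, `k`, and classifier; any selector and estimator that follow the usual `fit`/`transform` interface could be substituted.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Synthetic stand-in data (assumption): 30 raw features in both sets.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(150, 30))
y_train = rng.integers(0, 2, size=150)
X_test = rng.normal(size=(20, 30))

# fit() learns the selection and the classifier from the training data.
model = make_pipeline(
    SelectKBest(score_func=f_classif, k=5),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

# predict() takes the raw test features and applies the stored selection.
test_pred = model.predict(X_test)

# Column indices chosen at training time, reused for every later prediction.
kept = model.named_steps["selectkbest"].get_support(indices=True)
```

With this setup the test data is passed in with all of its original columns, and the pipeline reduces it to the training-time subset before the classifier sees it.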
