Solved – Why Feature Selection with sklearn.feature_selection.SequentialFeatureSelector is a preprocessing task

explanatory-models, feature-selection, scikit-learn

I am facing a feature selection problem.
Because I am building an explanatory regression model, I decided to follow a forward sequential feature selection approach.

Moreover, I wanted to use sklearn.feature_selection.SequentialFeatureSelector for the feature selection.

After reading the sklearn documentation about this transformer, some doubts arose.

Quoting the documentation:

Feature selection is usually used as a pre-processing step before
doing the actual learning. The recommended way to do this in
scikit-learn is to use a Pipeline[…]

Why is SequentialFeatureSelector used in pre-processing, specifically inside a Pipeline?

I imagined this as an iterative process: evaluating the performance metric and computing a statistical significance test in a for-loop, training my model each time with a different number of variables and then evaluating how it performs.
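The iterative process described above can be sketched as a manual greedy loop: at each step, try adding every remaining feature and keep the one with the best cross-validated score. This is a minimal sketch, assuming a linear model and the diabetes toy dataset; the choice of three features and of 5-fold CV is arbitrary, and the statistical significance test is omitted for brevity.

```python
# Hedged sketch of a manual forward-selection loop, assuming
# LinearRegression and the diabetes dataset purely for illustration.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)
selected, remaining = [], list(range(X.shape[1]))

for _ in range(3):  # pick 3 features, an arbitrary stopping point
    best_feat, best_score = None, -np.inf
    for f in remaining:
        # Cross-validated R^2 of the model with the candidate feature added.
        score = cross_val_score(
            LinearRegression(), X[:, selected + [f]], y, cv=5).mean()
        if score > best_score:
            best_feat, best_score = f, score
    selected.append(best_feat)
    remaining.remove(best_feat)

print(selected)
```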

The only reason I can find for doing forward feature selection before training is computational savings.

Is this assumption correct, and are there other reasons?
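For concreteness, this is a minimal sketch of the Pipeline usage the documentation recommends, with the selector as a pre-processing step feeding a final estimator. The dataset, estimator, and number of features to select are illustrative assumptions, not part of the original question.

```python
# Sketch of SequentialFeatureSelector as a pre-processing step
# inside a Pipeline; dataset and parameters are arbitrary examples.
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = load_diabetes(return_X_y=True)

pipe = Pipeline([
    # Forward selection: greedily add the feature that most improves
    # the cross-validated score, stopping at n_features_to_select.
    ("select", SequentialFeatureSelector(
        LinearRegression(), n_features_to_select=5, direction="forward")),
    ("model", LinearRegression()),
])

# Because selection happens inside the pipeline, each CV fold
# re-runs the selection on its own training split, avoiding leakage.
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```

Putting the selector inside the Pipeline also means any model selection (e.g. grid search) re-fits the selection per fold, which is a correctness argument beyond computational savings.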

Best Answer

I am not sure how relevant the explanation below is to this question, but let me try.

Feature selection is the process of keeping only the significant features in the model. There are many options, but generally one of the methods below is used to reduce the number of features in the model.

  1. Forward selection method
  2. Backward elimination method
  3. Stepwise method
  4. Recursive feature elimination (RFE)
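Two of the methods above can be sketched with scikit-learn's built-in selectors. This is a toy illustration on synthetic data; the estimator and the target of three features are assumptions chosen to match the three informative features generated.

```python
# Illustrative sketch: backward elimination via SequentialFeatureSelector
# and recursive feature elimination via RFE, on synthetic data.
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, random_state=0)

# RFE: fit the model, drop the feature with the weakest coefficient, repeat.
rfe = RFE(LinearRegression(), n_features_to_select=3).fit(X, y)

# Backward elimination: start from all features and greedily remove
# the one whose removal hurts the cross-validated score the least.
sfs = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=3,
    direction="backward").fit(X, y)

print(rfe.get_support())
print(sfs.get_support())
```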

Internally, all of the above methods build models and eliminate insignificant features based on rules. The selected significant features are then often used in the model as a next step. For more detail you can refer to the link below:

http://rasbt.github.io/mlxtend/user_guide/feature_selection/SequentialFeatureSelector/
