I am facing a feature selection problem.
Because I am building an explanatory regression model, I decided to follow forward sequential feature selection, and I wanted to use sklearn.feature_selection.SequentialFeatureSelector for the job.
After reading the sklearn documentation for this transformer, some doubts arose:
Feature selection is usually used as a pre-processing step before
doing the actual learning. The recommended way to do this in
scikit-learn is to use a Pipeline[…]
Why is SequentialFeatureSelector used in pre-processing, specifically inside a Pipeline?
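For concreteness, this is what the documentation's recommendation seems to look like in practice, a minimal sketch on assumed synthetic data: the selector sits as a pre-processing step in a Pipeline, so cross-validation re-runs the selection on each training fold and the final estimator never sees held-out data.

```python
# Sketch: SequentialFeatureSelector as a pre-processing step in a Pipeline.
# Data are synthetic; model choices here are illustrative assumptions.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_regression(n_samples=200, n_features=10, n_informative=4,
                       random_state=0)

pipe = Pipeline([
    ("select", SequentialFeatureSelector(
        LinearRegression(), n_features_to_select=4, direction="forward")),
    ("model", LinearRegression()),
])

# Cross-validating the whole pipeline repeats the selection inside each fold,
# which is the leakage-avoidance argument behind the documentation's advice.
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```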
I had imagined this as an iterative process: evaluating the performance metric and computing a statistical significance test in a for-loop, training my model each time with a different number of variables and then evaluating how it performs.
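That mental model can be sketched roughly as follows. This is a hypothetical manual forward-selection loop on synthetic data (scored by cross-validated R² rather than a significance test), not what the transformer necessarily does internally:

```python
# Hypothetical sketch of the iterative process described above: at each step,
# try adding each remaining feature and keep the one with the best CV score.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                       random_state=0)

selected, remaining = [], list(range(X.shape[1]))
for _ in range(3):  # grow the subset one feature at a time
    scores = {j: cross_val_score(LinearRegression(),
                                 X[:, selected + [j]], y, cv=5).mean()
              for j in remaining}
    best = max(scores, key=scores.get)
    selected.append(best)
    remaining.remove(best)

print(selected)  # indices chosen by this manual forward selection
```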
The only reason I can find for doing forward feature selection before training is computational saving.
Is this assumption correct, and are there other reasons?
Best Answer
I am not sure how relevant the explanation below is to this question, but let me try.
Feature selection is the process of keeping only the significant features in the model. There are many options, but generally sequential methods such as forward selection and backward elimination are used to reduce the number of features in the model.
Internally, all of these methods build a model and eliminate insignificant features according to some rule. Sometimes the selected significant features are then used to fit the model as a next step. For more detail you can refer to the link below:
http://rasbt.github.io/mlxtend/user_guide/feature_selection/SequentialFeatureSelector/
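As a rough illustration of that last point (assumed synthetic data; the chosen column indices will vary with the data), one can fit the selector first and then train the final model only on the surviving features:

```python
# Sketch: fit a forward selector, then reuse the selected features downstream.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=10, n_informative=4,
                       random_state=0)

sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=4,
                                direction="forward").fit(X, y)
X_selected = sfs.transform(X)          # keep only the selected features

final_model = LinearRegression().fit(X_selected, y)  # next-step model
print(sfs.get_support(indices=True))   # indices of the retained columns
```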