Machine Learning – Should Feature Selection be Done Before or After Encoding?

Tags: categorical-encoding, feature selection, feature-scaling, machine learning, scikit learn

Should I apply feature scaling and feature selection before or after one-hot encoding / label encoding?

Here is the order I plan to follow:

  1. Deal with outliers
  2. Impute missing values
  3. Label-encode / one-hot encode categorical values
  4. Apply dimensionality reduction
  5. Apply feature selection

Please correct me if I'm wrong.

Best Answer

The steps you listed are correct.

Feature scaling (min/max or mean/standard deviation) applies to numerical features, so it does not matter whether you do it before or after label encoding; but keep in mind that you SHOULD NOT scale the encoded categorical features.
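A small sketch of that point with scikit-learn, on hypothetical toy data: scaling only the numeric columns gives the same matrix whether you scale before encoding or after it, as long as the one-hot columns are excluded from scaling. `StandardScaler` (mean/standard deviation) is used here; `MinMaxScaler` would behave the same way with respect to ordering.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: two numeric columns and one categorical column.
X_num = np.array([[1.0, 200.0], [2.0, 210.0], [3.0, 220.0], [4.0, 230.0]])
X_cat = np.array([["red"], ["blue"], ["red"], ["green"]], dtype=object)
X_cat_encoded = OneHotEncoder().fit_transform(X_cat).toarray()

# Order A: scale the numeric columns first, then append the encoded columns.
scaled_first = np.hstack([StandardScaler().fit_transform(X_num), X_cat_encoded])

# Order B: encode first, then scale only columns 0 and 1 (the numeric ones);
# the one-hot columns are passed through unchanged.
encoded = np.hstack([X_num, X_cat_encoded])
scaler = ColumnTransformer(
    [("scale_numeric", StandardScaler(), [0, 1])],
    remainder="passthrough",
)
encoded_first = scaler.fit_transform(encoded)

# Both orders produce the same matrix, and the one-hot columns are still 0/1.
print(np.allclose(scaled_first, encoded_first))
print(np.unique(encoded_first[:, 2:]))
```

`ColumnTransformer` puts the transformed columns first and the passthrough columns after them; because the numeric columns are already columns 0 and 1, the column order is unchanged here.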

Dimensionality reduction and feature selection both require numerical inputs, so you should do them after label encoding (or one-hot encoding) the categorical features.
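A sketch illustrating why the encoding has to come first, on hypothetical toy data: PCA rejects the raw mixed matrix because it cannot convert the strings to floats, while both PCA and `SelectKBest` work once the categorical column is one-hot encoded. The choice of `f_classif` as the scoring function and `k=2` are assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one numeric column, one categorical column, binary target.
X_num = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
X_cat = np.array([["a"], ["b"], ["a"], ["b"], ["c"], ["c"]], dtype=object)
y = np.array([0, 0, 0, 1, 1, 1])

# On the raw mixed data, PCA cannot convert the string column to floats.
raw = np.hstack([X_num, X_cat])
raw_failed = False
try:
    PCA(n_components=1).fit(raw)
except ValueError:
    raw_failed = True
print("PCA on unencoded data failed:", raw_failed)

# After one-hot encoding, every column is numeric.
X = np.hstack([X_num, OneHotEncoder().fit_transform(X_cat).toarray()])

# Dimensionality reduction: project onto 2 principal components.
X_pca = PCA(n_components=2).fit_transform(X)

# Feature selection: keep the 2 columns with the highest ANOVA F-score.
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
print(X_pca.shape)
print(selector.get_support())
```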
