I am working on a binary classification problem with an imbalanced dataset (77:23 class proportion).
Class 1 is the minority class.
Currently, I am exploring different feature selection techniques, and that's when I tried mlxtend's sequential forward selection (SFS) and sequential backward selection (SBS).
Let's say my objective is to select the best 5 features from the dataset, i.e. the ones that give the best value for the chosen metric.
I got results like the ones below:
SFS best 5 features = f1,f2,f3,f4 and f5
SBS best 5 features = f1,f2,f3,f7 and f8.
I wasn't expecting to see a difference in the best features returned by the two methods.
Since both are wrapper-based methods, I am using the same estimator (RF) with the same hyperparameters for both SFS and SBS, but the output is still different.
Is there any way to know/understand why the output returned by SFS is different from that of SBS?
Best Answer
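They are different algorithms, and both work in a greedy fashion, so each may end up in its own suboptimal solution. SFS starts from an empty set and adds one feature at a time, while SBS starts from the full set and removes one feature at a time. Neither revisits its earlier choices, so nothing forces the two search paths to end at the same subset. It's not surprising that their outputs differ.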
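To make the greedy behaviour concrete, here is a minimal toy sketch in plain Python (not mlxtend's implementation). The feature names A, B, C and the score table are made up for illustration: A is the best single feature, but B and C work best as a pair. The helper names `score`, `forward_select` and `backward_select` are hypothetical.

```python
# Hypothetical score table: the "metric" for every subset of three toy features.
# A is the strongest feature alone, but B and C together form the best pair.
SCORES = {
    frozenset(): 0.0,
    frozenset("A"): 0.70,
    frozenset("B"): 0.60,
    frozenset("C"): 0.55,
    frozenset("AB"): 0.75,
    frozenset("AC"): 0.72,
    frozenset("BC"): 0.80,
    frozenset("ABC"): 0.82,
}


def score(subset):
    """Look up the made-up metric value for a feature subset."""
    return SCORES[frozenset(subset)]


def forward_select(features, k):
    """Greedy forward selection: start empty, add the best feature each step."""
    selected = set()
    while len(selected) < k:
        candidates = sorted(set(features) - selected)
        best = max(candidates, key=lambda f: score(selected | {f}))
        selected.add(best)
    return selected


def backward_select(features, k):
    """Greedy backward selection: start full, drop the least useful feature each step."""
    selected = set(features)
    while len(selected) > k:
        # Remove the feature whose removal leaves the highest-scoring subset.
        drop = max(sorted(selected), key=lambda f: score(selected - {f}))
        selected.remove(drop)
    return selected


print(sorted(forward_select("ABC", 2)))   # ['A', 'B']
print(sorted(backward_select("ABC", 2)))  # ['B', 'C']
```

Forward selection commits to A in its first step and can never reach the pair B, C, while backward selection drops A first because B, C is the best remaining pair. With a real estimator, interactions between features can plausibly produce the same kind of split, such as f4, f5 versus f7, f8.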