Feature Selection – Why Doesn't Sequential Forward Selection Pick the Same Features as Sequential Backward Selection?

classification, feature selection, machine learning, neural networks, regression

I am working on a binary classification problem with an imbalanced dataset (77:23 class proportion).

Class 1 is the minority class.

I am currently exploring different feature selection techniques, and that's when I tried mlxtend's sequential forward selection (SFS) and sequential backward selection (SBS).

Let's say my objective is to select the 5 features from the dataset that give the best value for the chosen metric.

I got the following results:

SFS best 5 features = f1,f2,f3,f4 and f5

SBS best 5 features = f1,f2,f3,f7 and f8.

I wasn't expecting the two methods to return different sets of best features.

Since both are wrapper-based methods, I am using the same estimator (RF) with the same hyperparameters for SFS and SBS, but the output is still different.

Is there any way to understand why the output returned by SFS differs from that of SBS?

Best Answer

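They are different algorithms, and both work in a greedy fashion. SFS starts from an empty set and, at each step, adds the feature that improves the score the most; SBS starts from the full set and, at each step, removes the feature whose removal hurts the score the least. Neither one revisits its earlier choices, so each can end up in its own suboptimal solution. For example, a feature that is only useful in combination with another may never be picked by SFS, while SBS keeps it because it is always evaluated together with its partner. It's not surprising that the two results differ.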
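To make the greedy behaviour concrete, here is a minimal sketch in plain Python. It does not use mlxtend: the three features `a`, `b`, `c` and the `SCORES` table are made-up illustration values (not results from any real model), and `forward_select` / `backward_select` are hypothetical helper names. The scores are chosen so that `a` is the best single feature, while `b` and `c` are only strong together.

```python
# Hypothetical scores for every subset of three toy features.
# These numbers are invented for illustration, not measured on real data.
SCORES = {
    frozenset(): 0.00,
    frozenset("a"): 0.70,
    frozenset("b"): 0.50,
    frozenset("c"): 0.40,
    frozenset("ab"): 0.75,
    frozenset("ac"): 0.72,
    frozenset("bc"): 0.90,   # b and c are only strong as a pair
    frozenset("abc"): 0.88,
}


def score(subset):
    """Look up the toy score of a feature subset."""
    return SCORES[frozenset(subset)]


def forward_select(features, k):
    """Greedy forward selection: start empty, add the best feature each step."""
    selected = set()
    while len(selected) < k:
        candidates = sorted(set(features) - selected)
        best = max(candidates, key=lambda f: score(selected | {f}))
        selected.add(best)
    return selected


def backward_select(features, k):
    """Greedy backward selection: start full, drop the least useful feature each step."""
    selected = set(features)
    while len(selected) > k:
        # Remove the feature whose removal leaves the highest-scoring subset.
        to_drop = max(sorted(selected), key=lambda f: score(selected - {f}))
        selected.remove(to_drop)
    return selected


features = ["a", "b", "c"]
sfs_result = forward_select(features, 2)
sbs_result = backward_select(features, 2)

print(sorted(sfs_result), score(sfs_result))  # ['a', 'b'] 0.75
print(sorted(sbs_result), score(sbs_result))  # ['b', 'c'] 0.9
```

Forward selection commits to `a` in its first step and can never undo that choice, so it ends at `{a, b}`. Backward selection evaluates `b` and `c` together from the start and drops `a` instead. With a real estimator the scores come from cross-validation rather than a lookup table, but the same path dependence applies.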