Solved – How does feature selection work for non-linear models

classification · feature selection · lasso · regression

A model like a neural network or an SVM is called for only if the relationship between the features and the target is non-linear; otherwise we're better off using linear or logistic regression.

But all of the feature selection methods I've come across use linear criteria for determining feature importance. For example, if two features are highly correlated, we can discard one and keep only the other. The same goes for LASSO or a variance threshold. These only make sense to me if we assume a linear relationship with the output.

So how does one then perform feature selection for non-linear supervised models?

Best Answer

Assuming non-linear feature interactions, one could use something like mutual information, which can capture both linear and non-linear dependencies. You could estimate the mutual information between each feature and the target variable and select the relevant features based on this criterion.
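A minimal sketch of this idea, using scikit-learn's `mutual_info_classif` on a small synthetic dataset (the data and the "keep top-2" rule are just illustrative assumptions):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
# Target depends non-linearly on feature 0 and linearly on feature 1;
# features 2 and 3 are pure noise.
y = ((np.sin(3 * X[:, 0]) + X[:, 1]) > 0).astype(int)

# Estimate mutual information between each feature and the target.
mi = mutual_info_classif(X, y, random_state=0)

# Select the two features with the highest estimated mutual information.
top2 = np.argsort(mi)[::-1][:2]
```

Note that a plain correlation check would already find feature 1 here, but mutual information also flags feature 0, whose effect through `sin(3 * x)` is non-monotonic.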

There is a family of simple and well-known models that captures non-linear interactions; the simplest example is the decision tree, which uses an information-gain criterion to choose its splits. Once you fit a decision tree (or a similar model), you obtain a feature-importance vector that tells you the importance of each feature. There is also a paper on Feature Selection via Regularized Trees.
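As a sketch, here is how you might read the importance vector off a fitted tree, again with assumed synthetic data where the target depends on one feature through a non-monotonic (quadratic) rule:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(1000, 3))
# y depends only on feature 0, and non-monotonically (|x0| > 1);
# features 1 and 2 are irrelevant.
y = (X[:, 0] ** 2 > 1).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Impurity-based importances; they sum to 1 across features.
importances = tree.feature_importances_
```

A linear filter would score feature 0 near zero here (its correlation with `y` is roughly zero by symmetry), while the tree's importance concentrates on it.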

In the end, regardless of the type of feature interaction or the nature of the model (linear vs. non-linear), you can always apply some form of wrapper search over feature subsets, as described here.
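One concrete wrapper approach is greedy forward selection, wrapped around a non-linear estimator. A sketch using scikit-learn's `SequentialFeatureSelector` with a k-NN classifier (the dataset and parameter choices are illustrative assumptions):

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
# Target mixes a quadratic effect of feature 0 with a linear effect
# of feature 1; features 2-4 are noise.
y = ((X[:, 0] ** 2 + X[:, 1]) > 1).astype(int)

# Greedily add features, scoring each candidate subset by
# cross-validated accuracy of the non-linear estimator.
selector = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=15),
    n_features_to_select=2,
    direction="forward",
    cv=5,
)
selector.fit(X, y)
selected = np.flatnonzero(selector.get_support())
```

Because the search scores whole subsets with the actual model, it inherits whatever non-linearity the estimator can express; the cost is many model fits, so it scales poorly with the number of features.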