Solved – How to correctly interpret f-regression values during feature selection

feature selection, scikit-learn

I am new to machine learning. I would like to know how to correctly interpret the values returned by scikit-learn's f_regression so that I can perform good feature selection (I'm using f_regression as the score function for SelectKBest()).

More precisely: Is a feature with a high value always better than one with a lower value? How should I interpret f_regression's values when I am using categorical variables?

Best Answer

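Pick the variables with the highest F-statistic. f_regression scores each feature on its own by the strength of its linear relationship with the target, so a higher value means a stronger univariate linear association. If you want, you can use a more convenient selector, such as feature_selection.SelectPercentile.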

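from sklearn import feature_selection

# Keep the top 30% of features, ranked by their univariate F-statistic
selector = feature_selection.SelectPercentile(feature_selection.f_regression, percentile=30)
selector.fit(X_train, y_train)
selector.get_support(indices=True)  # integer indices of the selected features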

Calling get_support with indices=True returns the integer indices of the features that fall within the chosen percentile. Without that argument, it returns a boolean mask over all features instead.
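To see what the scores look like in practice, here is a minimal sketch that calls f_regression directly and compares its output with what SelectKBest picks. The synthetic dataset from make_regression and the choice of k=3 are assumptions made purely for illustration; they are not part of the original question.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic data (an assumption for illustration): 10 features, 3 of them informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=10.0, random_state=0)

# f_regression tests each feature on its own and returns one
# F-statistic and one p-value per column of X
f_scores, p_values = f_regression(X, y)

# SelectKBest keeps the k columns with the largest scores
selector = SelectKBest(score_func=f_regression, k=3).fit(X, y)
selected = selector.get_support(indices=True)

# Ranking the raw F-statistics by hand gives the same columns
top_k = np.sort(np.argsort(f_scores)[-3:])

print("F-statistics:", np.round(f_scores, 2))
print("p-values:    ", np.round(p_values, 4))
print("selected:    ", selected, "manual top-k:", top_k)
```

Because each feature is scored in isolation, a high F-statistic only says that the feature has a strong linear relationship with the target by itself. Features that matter only through a nonlinear effect or in combination with others can receive a low score.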
