Is the F-test used for feature selection only for features with a numerical, continuous domain?

anova, f-test, feature selection, scikit learn

The F-statistic (F-test) can be used for feature selection, e.g. with scikit-learn's f_classif, described at http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.f_classif.html#sklearn.feature_selection.f_classif as:

ANOVA F-value between label/feature for classification tasks.
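For context, here is a minimal sketch of how f_classif is typically plugged into scikit-learn's SelectKBest to keep the highest-scoring features. The synthetic data (one informative feature, two noise features) and the choice k=1 are assumptions for illustration only.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)  # binary class label (synthetic)

# Feature 0 depends on the class; features 1 and 2 are pure noise (illustrative data only)
X = np.column_stack([
    y + rng.normal(scale=0.5, size=n),
    rng.normal(size=n),
    rng.normal(size=n),
])

# Score every feature with the ANOVA F-value and keep the single best one
selector = SelectKBest(score_func=f_classif, k=1)
X_new = selector.fit_transform(X, y)

print(selector.scores_)        # one F-value per feature
print(selector.pvalues_)       # matching p-values
print(selector.get_support())  # boolean mask of the kept feature(s)
```

With this data the informative feature gets by far the largest F-value, so it is the one retained.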

Can the F-test only be used for features with a numerical, continuous domain, or is it also valid for selecting discrete or categorical features? I ask because the F-statistic is based on the mean and variance of a feature.
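To make the premise concrete: as far as I can tell, f_classif runs a one-way ANOVA per feature, grouping that feature's values by class label and comparing the between-class spread of the means with the within-class variance. The sketch below checks this against scipy.stats.f_oneway on a small hypothetical dataset whose second column is an integer-coded discrete feature; the values are made up for illustration.

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.feature_selection import f_classif

# Hypothetical data: column 0 is continuous, column 1 is an integer-coded discrete feature
X = np.array([
    [1.0, 0],
    [2.0, 0],
    [1.5, 1],
    [3.0, 2],
    [4.0, 1],
    [5.0, 2],
    [4.5, 2],
    [6.0, 2],
])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

F_sk, p_sk = f_classif(X, y)

# Recompute by hand: for each feature, split its values by class and run a one-way ANOVA
manual_F = []
for j in range(X.shape[1]):
    groups = [X[y == c, j] for c in np.unique(y)]
    F_j, p_j = f_oneway(*groups)
    manual_F.append(F_j)
    print(j, F_sk[j], F_j)
```

Both computations agree, and the discrete column is simply treated as numbers: its F-value comes from the class-wise means and variances of the codes 0, 1, 2.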

Best Answer

Assuming you are in the context of stepwise regression, the scale of the feature does not matter. The F-test is computed from the difference in residual sum of squares (RSS) between the smaller and the larger model, where the RSS is calculated on the outcome variable; the test also takes into account the difference in the number of parameters between the two models.
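A minimal sketch of this nested-model F-test, assuming ordinary least squares with an intercept: F = ((RSS_small - RSS_large) / (p_large - p_small)) / (RSS_large / (n - p_large)), where p counts the parameters of each model (including the intercept) and n is the number of observations. The helper names rss and nested_f_test are hypothetical, and the data are synthetic. A categorical feature enters the larger model as dummy columns, so it simply adds more parameters; rescaling a continuous feature leaves the fitted values, and therefore the RSS and F, unchanged.

```python
import numpy as np
from scipy.stats import f as f_dist


def rss(X, y):
    """Hypothetical helper: RSS of an OLS fit of y on X (X already includes an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def nested_f_test(X_small, X_large, y):
    """Hypothetical helper: partial F-test for two nested OLS models; returns (F, p-value)."""
    n = len(y)
    p_small, p_large = X_small.shape[1], X_large.shape[1]
    rss_small, rss_large = rss(X_small, y), rss(X_large, y)
    F = ((rss_small - rss_large) / (p_large - p_small)) / (rss_large / (n - p_large))
    p_value = f_dist.sf(F, p_large - p_small, n - p_large)
    return F, p_value


rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)                 # continuous feature
g = rng.integers(0, 3, size=n)         # categorical feature with levels 0, 1, 2
y = 1.0 + 2.0 * x + 0.5 * (g == 2) + rng.normal(size=n)  # outcome variable

intercept = np.ones((n, 1))
dummies = np.column_stack([g == 1, g == 2]).astype(float)  # level 0 is the reference level

# Adding the categorical feature: it enters as 2 dummy columns, so the numerator df is 2
X_small = np.column_stack([intercept, x])
X_large = np.column_stack([intercept, x, dummies])
F_cat, p_cat = nested_f_test(X_small, X_large, y)

# Adding the continuous feature, on its original scale and rescaled by 1000
base = np.column_stack([intercept, dummies])
F_x, _ = nested_f_test(base, np.column_stack([base, x]), y)
F_x_scaled, _ = nested_f_test(base, np.column_stack([base, 1000 * x]), y)

print(F_cat, p_cat)
print(F_x, F_x_scaled)  # equal up to floating-point error
```

Whatever type the candidate feature has, the statistic only ever looks at RSS values of the outcome and at parameter counts.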

For more information see: http://en.wikipedia.org/wiki/F-test#Regression_problems