Solved – ANOVA F-value for feature selection

classification, feature-selection, machine-learning, python

I have a classification problem with numeric features and a binary class label. Is the ANOVA F-value in Python (see here) a good technique for feature selection?

Best Answer

Late answer but at least one!

In general, yes; in the particulars, it depends! The F-value is a very good criterion for detecting the best individual variables (I'll explain shortly why I don't call them features) for classification.

Why Individual?

The F-value is suited to variable ranking: it is applied to each variable separately and tells you which one discriminates best between the classes. And it does that very well!
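Concretely, scikit-learn exposes the ANOVA F-value as `f_classif` (likely what the question links to). A minimal sketch of per-variable ranking on synthetic data, where the exact dataset parameters are just illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic binary-classification data: 5 features, only 2 informative
X, y = make_classification(n_samples=500, n_features=5, n_informative=2,
                           n_redundant=0, shuffle=False, random_state=0)

# f_classif computes one ANOVA F-value (and p-value) per variable,
# each taken individually -- this is the "variable ranking" above
F, p = f_classif(X, y)
print("F-value per variable:", np.round(F, 1))

# SelectKBest keeps the k variables with the highest F-values
X_top2 = SelectKBest(f_classif, k=2).fit_transform(X, y)
print(X_top2.shape)
```

Note that the ranking says nothing about how variables work *together*; each F-value is computed in isolation.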

Variable vs Feature

This is not standard terminology, but I use it to make a point. Look at the figure below. The horizontal variable separates the classes better than the vertical one, so the F-value ranks it higher. But more than that: the horizontal variable alone is good enough for the classification task, without any manipulation!

[Figure: two classes separable along the horizontal variable but not the vertical one]

But look at the left panel of the second figure. Neither variable alone is good enough for classification: the discriminating feature is not among your original variables, so the F-value does not tell you much. Here you need to construct a new feature that discriminates between the classes.
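This failure mode is easy to reproduce. The sketch below (synthetic data, parameters chosen just for illustration) builds two classes spread along the diagonal x1 = x2 and separated only perpendicular to it, so each original variable gets a weak F-value while the derived feature x1 − x2 gets a huge one:

```python
import numpy as np
from sklearn.feature_selection import f_classif

rng = np.random.default_rng(0)
n = 500

# Both classes stretch along the diagonal x1 = x2 ...
t0 = rng.normal(0, 5, n)
t1 = rng.normal(0, 5, n)
class0 = np.c_[t0, t0] + rng.normal(0, 0.3, (n, 2))
# ... class 1 is shifted only perpendicular to that diagonal
class1 = np.c_[t1 + 1.0, t1 - 1.0] + rng.normal(0, 0.3, (n, 2))

X = np.vstack([class0, class1])
y = np.r_[np.zeros(n), np.ones(n)]

# Each original variable taken alone barely discriminates ...
F_raw, _ = f_classif(X, y)

# ... but the constructed feature x1 - x2 separates the classes cleanly
F_diff, _ = f_classif((X[:, 0] - X[:, 1]).reshape(-1, 1), y)
print("raw variables:", np.round(F_raw, 1), "derived feature:", np.round(F_diff, 1))
```

So a low F-value on every raw variable does not mean the data is hopeless; it may just mean the discriminating direction is not axis-aligned.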

LDA

What if we write the F-value as a function of a projection of our data, where a higher value of the function means a higher F-value? Then we can use optimization techniques to solve this maximization problem and find an axis that is not among our original variables, but defines a new feature along which the F-value is maximized. That is what LDA does; it is shown on the right.

[Figure: left — neither original variable separates the classes; right — the LDA projection axis does]
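In scikit-learn this is `LinearDiscriminantAnalysis`; with `n_components=1` it returns exactly one such optimized feature for a binary problem. A sketch on the same kind of diagonal data as above (synthetic, illustrative parameters):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import f_classif

rng = np.random.default_rng(1)
n = 500

# Diagonal data again: classes overlap on each axis individually
t0 = rng.normal(0, 5, n)
t1 = rng.normal(0, 5, n)
X = np.vstack([
    np.c_[t0, t0],
    np.c_[t1 + 1.0, t1 - 1.0],
]) + rng.normal(0, 0.3, (2 * n, 2))
y = np.r_[np.zeros(n), np.ones(n)]

# LDA finds the projection axis maximizing between-class separation
# relative to within-class scatter, i.e. the F-value-maximizing axis
lda = LinearDiscriminantAnalysis(n_components=1)
z = lda.fit_transform(X, y)  # one new feature per sample

# The new feature's F-value dwarfs those of the raw variables
F_raw, _ = f_classif(X, y)
F_new, _ = f_classif(z, y)
print("raw:", np.round(F_raw, 1), "LDA feature:", np.round(F_new, 1))
```

So LDA automates the feature construction that ranking individual variables by F-value cannot do on its own.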

Hope it helped! Good luck!
