Symmetrical uncertainty and correlation-based feature selection

correlation, feature selection, machine learning, MATLAB, weka

I'm trying to study correlation-based feature selection (CFS) from http://www.cs.waikato.ac.nz/~mhall/thesis.pdf, but I'm not sure about the relation between CFS and symmetrical uncertainty (SU). If I calculate the correlation values, do I also need to calculate SU?

I also don't understand how to choose the number of features to select.

Best Answer

The feature selection method presented in the paper uses a correlation measure to compute the feature-class and feature-feature correlation. The paper experiments with three correlation measures (see chapter 4.2):

  1. Symmetrical Uncertainty
  2. Relief
  3. Minimum Description Length

So Symmetrical Uncertainty (SU) is just one possible correlation measure. CFS does not depend on it, and you can use any correlation measure you like.
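For reference, SU between two discrete variables is usually defined as twice the information gain divided by the sum of the two entropies, which keeps it in the range [0, 1]. Below is a minimal Python sketch of that definition, assuming the features have already been discretized. The function names `entropy` and `symmetrical_uncertainty` are only illustrative (they are not taken from the thesis or from Weka), and returning 0 when both variables are constant is a convention I chose here.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * (H(X) + H(Y) - H(X, Y)) / (H(X) + H(Y))."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        # Both variables are constant; treating SU as 0 is an assumption.
        return 0.0
    h_xy = entropy(list(zip(x, y)))  # joint entropy of the paired values
    return 2.0 * (h_x + h_y - h_xy) / (h_x + h_y)


# Toy data: a feature identical to the class, and one independent of it.
su_identical = symmetrical_uncertainty([0, 0, 1, 1], [0, 0, 1, 1])
su_independent = symmetrical_uncertainty([0, 0, 1, 1], [0, 1, 0, 1])
```

On this toy data the identical pair scores 1 and the independent pair scores 0, which are the two ends of the SU scale.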

You use this correlation measure to compute the "merit" of a feature subset:

$M_S=\frac{k\bar{r_{cf}}}{\sqrt{k+k(k-1)\bar{r_{ff}}}}$

where

  • $k$ is the number of features in the subset $S$
  • $\bar{r_{cf}}$ is the mean feature-class correlation over the features in $S$
  • $\bar{r_{ff}}$ is the mean feature-feature correlation over all pairs of features in $S$
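To get a feel for how the formula behaves, here is a small sketch that evaluates it directly. The name `cfs_merit` and the correlation values are made up for illustration. It assumes the mean correlations are non-negative, as they are for SU.

```python
from math import sqrt


def cfs_merit(k, mean_r_cf, mean_r_ff):
    """Merit M_S of a subset of k features, given its mean correlations."""
    return (k * mean_r_cf) / sqrt(k + k * (k - 1) * mean_r_ff)


# Four features, each moderately correlated with the class (0.5):
merit_uncorrelated = cfs_merit(4, 0.5, 0.0)  # features unrelated to each other
merit_redundant = cfs_merit(4, 0.5, 1.0)     # features fully redundant
merit_single = cfs_merit(1, 0.5, 0.0)        # one such feature on its own
```

With these numbers the uncorrelated subset gets a merit of 1.0, while the fully redundant subset gets 0.5, the same as a single feature. That is the point of the denominator: adding features that only repeat each other does not raise the merit.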

$M_S$ only scores a given subset, so you still need a strategy for searching over subsets. The paper talks about forward search (start with an empty set and add features) or backward search (start with a set containing all the features and remove features).

So you decide how many features you want, then add or remove features one at a time, each time making the change that gives the highest merit, until you are left with the desired number of features.
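A greedy forward search with a fixed target size can be sketched as follows. This is only one simple way to do it, not necessarily the exact search used in the paper. It assumes the correlations have been precomputed with whatever measure you picked: `r_cf[i]` is the correlation of feature `i` with the class and `r_ff[i][j]` is the correlation between features `i` and `j`. The names `merit` and `forward_select` and the numbers are hypothetical.

```python
from math import sqrt


def merit(subset, r_cf, r_ff):
    """CFS merit of a list of feature indices, from precomputed correlations."""
    k = len(subset)
    mean_cf = sum(r_cf[i] for i in subset) / k
    # All unordered pairs of distinct features in the subset.
    pairs = [(i, j) for a, i in enumerate(subset) for j in subset[a + 1:]]
    mean_ff = sum(r_ff[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0
    return k * mean_cf / sqrt(k + k * (k - 1) * mean_ff)


def forward_select(r_cf, r_ff, n_features):
    """Start empty and repeatedly add the feature that maximizes the merit."""
    selected = []
    remaining = list(range(len(r_cf)))
    while remaining and len(selected) < n_features:
        best = max(remaining, key=lambda f: merit(selected + [f], r_cf, r_ff))
        selected.append(best)
        remaining.remove(best)
    return selected


# Made-up correlations: features 0 and 1 are both strong but nearly duplicates.
r_cf = [0.8, 0.75, 0.6]
r_ff = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
chosen = forward_select(r_cf, r_ff, 2)
```

With these values the search picks feature 0 first and then feature 2, skipping feature 1 even though it has the second-highest class correlation, because it is almost redundant with feature 0. If you do not want to fix the number of features in advance, a common variation is to stop as soon as adding a feature no longer increases the merit.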