I know that for each feature-class pair, the value of the chi-square statistic is computed and compared against a threshold.
I am a little confused though. If there are $m$ features and $k$ classes, how does one build the contingency table? How does one decide which features to keep and which ones to remove?
Any clarification will be much appreciated. Thanks in advance
Best Answer
The chi-square test is a statistical test of independence between two variables. It is similar in spirit to the coefficient of determination, R²; however, the chi-square test applies only to categorical (nominal) data, while R² applies only to numeric data.
From this definition we can easily deduce how the chi-square technique is applied in feature selection. Suppose you have a target variable (i.e., the class label) and a set of feature variables that describe each sample of the data. We compute the chi-square statistic between every feature variable and the target variable and check whether a relationship exists between them. If the target variable is independent of a feature variable, we can discard that feature variable. If they are dependent, the feature variable is important.
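To make the contingency-table question concrete, here is a minimal pure-Python sketch (data and variable names are invented for illustration): the rows of the table are the distinct values of one feature, the columns are the classes, and each cell holds the observed co-occurrence count, which is compared against the count expected under independence.

```python
from collections import Counter

def chi_square_stat(feature_vals, class_labels):
    """Chi-square statistic of independence between one categorical
    feature and the class label.

    The contingency table has one row per feature value and one column
    per class; each cell is the observed co-occurrence count."""
    n = len(feature_vals)
    observed = Counter(zip(feature_vals, class_labels))
    feat_totals = Counter(feature_vals)
    class_totals = Counter(class_labels)
    stat = 0.0
    for f in feat_totals:
        for c in class_totals:
            # Expected count under independence: row total * column total / n
            expected = feat_totals[f] * class_totals[c] / n
            o = observed.get((f, c), 0)
            stat += (o - expected) ** 2 / expected
    return stat

# Toy data: one feature perfectly aligned with the class, one unrelated.
y         = ["spam", "spam", "ham", "ham"]
related   = ["yes",  "yes",  "no",  "no"]
unrelated = ["a",    "b",    "a",   "b"]
print(chi_square_stat(related, y))    # high statistic -> dependent, keep
print(chi_square_stat(unrelated, y))  # zero -> independent, discard
```

With $m$ features you repeat this once per feature (building $m$ separate tables), so the number of classes $k$ only affects the width of each table, not the number of tests.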
Mathematical details are described here: http://nlp.stanford.edu/IR-book/html/htmledition/feature-selectionchi2-feature-selection-1.html
For continuous variables, chi-square can be applied after "Binning" the variables.
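As a sketch of that binning step (equal-width binning is just one simple choice; the data below is made up), a continuous feature can be discretized like this before the test is applied:

```python
def equal_width_bins(values, n_bins=3):
    """Discretize a continuous feature into equal-width bins so the
    chi-square test can be applied to it."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against all-equal values
    # Clamp the maximum value into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

ages = [22, 25, 31, 38, 45, 52, 58, 64]
print(equal_width_bins(ages))  # [0, 0, 0, 1, 1, 2, 2, 2]
```

The resulting bin indices are categorical, so they can be cross-tabulated against the class label exactly like any nominal feature.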
An example in R, shamelessly copied from FSelector
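Independently of the FSelector example, the final decision step can be sketched in a few lines (the feature names and statistic values below are invented for illustration): keep the features whose statistic exceeds the critical value for the table's degrees of freedom, or equivalently rank them and keep the top few.

```python
# For a 2x2 table there is (2-1)*(2-1) = 1 degree of freedom; the 95th
# percentile of the chi-square distribution with 1 df is about 3.841.
# In general, df = (r - 1) * (k - 1) for r feature values and k classes.
CHI2_CRIT_DF1 = 3.841

# Hypothetical per-feature chi-square statistics (invented numbers).
stats = {"word_free": 12.7, "word_the": 0.4, "has_link": 7.9}

selected = [name for name, s in stats.items() if s > CHI2_CRIT_DF1]
print(sorted(selected))  # ['has_link', 'word_free']
```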
Not directly about feature selection, but the video below discusses the chi-square test in detail: https://www.youtube.com/watch?time_continue=5&v=IrZOKSGShC8