Solved – Best feature selection method for naive Bayes classification

classification, feature selection, machine learning, naive bayes, r

I want to perform classification with naive Bayes.
I have about 100 features, numerical as well as categorical.
Since I want only the most relevant ones included in the classification task, I would like to find them with some kind of feature elimination.
My question is the following: which method should I use for this (paper/reference?), and is that method implemented in some sort of software package? Since I use R, I would especially prefer an R package.

Thanks in advance for your help!

Best Answer

There are two different routes you can take. The key word is 'relevance', and which route to pick depends on how you interpret it.

1) You can use a chi-squared test or mutual information for feature relevance extraction, as explained in detail in this link.

In a nutshell, mutual information measures how much information the presence or absence of a particular feature contributes to making the correct classification decision.
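For reference, for a discrete feature $X$ and the class variable $C$, the mutual information is

$$ I(X;C) = \sum_{x}\sum_{c} P(x,c)\,\log \frac{P(x,c)}{P(x)\,P(c)}, $$

which is zero exactly when the feature and the class are independent.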

On the other hand, you can use the chi-squared test to check whether the occurrence of a specific feature and the occurrence of a specific class are independent.
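Concretely, for a contingency table of observed counts $O_{ij}$ (feature value $i$, class $j$) with row totals $R_i$, column totals $C_j$, and grand total $N$, the test statistic is

$$ \chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{R_i\, C_j}{N}. $$

Larger values give more evidence against independence, so features can be ranked by their statistic.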

Implementing these in R is straightforward.
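Here is a rough sketch in base R. The data frame `df` and its `class` column below are made-up placeholders, and numerical features would need to be discretized first (e.g. with `cut()`) before either score applies:

```r
# Rank features by empirical mutual information and by the
# chi-squared statistic against the class label (base R only).

mutual_information <- function(x, y) {
  # Empirical mutual information (in nats) from the joint distribution
  joint <- table(x, y) / length(x)
  expected <- outer(rowSums(joint), colSums(joint))
  nz <- joint > 0
  sum(joint[nz] * log(joint[nz] / expected[nz]))
}

chi_sq_stat <- function(x, y) {
  # Chi-squared statistic for independence of feature and class;
  # warnings about small expected counts don't matter for ranking
  unname(suppressWarnings(chisq.test(table(x, y))$statistic))
}

# Toy data frame standing in for your real data
set.seed(1)
df <- data.frame(
  f1    = sample(c("a", "b"), 200, replace = TRUE),
  f2    = sample(c("x", "y", "z"), 200, replace = TRUE),
  class = sample(c("pos", "neg"), 200, replace = TRUE)
)

features <- setdiff(names(df), "class")
mi  <- sapply(df[features], mutual_information, y = df$class)
chi <- sapply(df[features], chi_sq_stat,        y = df$class)

sort(mi,  decreasing = TRUE)   # most informative features first
sort(chi, decreasing = TRUE)
```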

2) Alternatively, you can adopt a wrapper feature selection strategy, where the primary goal is to construct and select subsets of features that are useful for building an accurate classifier. This contrasts with (1), where the goal is to find or rank all potentially relevant variables.
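As a rough illustration of the wrapper idea, here is a greedy forward-selection sketch built around `e1071::naiveBayes`, scoring each candidate subset by hold-out accuracy. The data frame `df` with a `class` column is again an assumed placeholder; in practice a cross-validated score would be less noisy, and the `caret` package's `rfe()` offers a packaged wrapper alternative:

```r
library(e1071)  # provides naiveBayes()

# Accuracy of a naive Bayes model trained on `vars` only,
# measured on a random 70/30 hold-out split
holdout_accuracy <- function(vars, df, class_col = "class") {
  idx   <- sample(nrow(df), size = floor(0.7 * nrow(df)))
  train <- df[idx, , drop = FALSE]
  test  <- df[-idx, , drop = FALSE]
  model <- naiveBayes(train[vars], factor(train[[class_col]]))
  mean(predict(model, test[vars]) == test[[class_col]])
}

# Greedy forward selection: keep adding the single feature that
# improves hold-out accuracy the most; stop when nothing helps
forward_select <- function(df, class_col = "class") {
  remaining <- setdiff(names(df), class_col)
  selected  <- character(0)
  best_acc  <- 0
  repeat {
    if (length(remaining) == 0) break
    scores <- sapply(remaining, function(v)
      holdout_accuracy(c(selected, v), df, class_col))
    if (max(scores) <= best_acc) break
    best      <- names(which.max(scores))
    selected  <- c(selected, best)
    remaining <- setdiff(remaining, best)
    best_acc  <- max(scores)
  }
  list(features = selected, accuracy = best_acc)
}

# Usage: forward_select(df) on a data frame with a `class` column
```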

Note that selecting the most relevant variables is usually suboptimal for boosting the accuracy of your classifier, particularly if the variables are redundant. Conversely, a subset of useful variables may exclude many redundant, but relevant, variables. (This distinction is discussed in detail in Guyon & Elisseeff, "An Introduction to Variable and Feature Selection", JMLR 2003.)
