Solved – How to do feature selection on categorical features

categorical data, feature selection

One naive approach I can think of is to recursively eliminate features and exhaustively check which subsets of features give better metrics ($R^2$ score, etc.). I think the key problem here is how to encode the categorical features and how to map the selected encodings back to the original categorical features after feature selection. Right now, the only encoding I know is one-hot encoding, which creates many dummy variables, and I don't know an appropriate way to convert selected dummy variables back to the original categorical features (if this is the way to do it).
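For concreteness, here is a rough sketch of the kind of back-mapping I have in mind (using pandas; the data and column names are made up), though I am not sure this is the right approach:

```python
import pandas as pd

# Toy data: two categorical features and one numeric feature.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "size":  ["S", "M", "L", "M"],
    "price": [10.0, 12.5, 9.0, 11.0],
})

# get_dummies prefixes each dummy column with the original feature name,
# e.g. "color_red", "size_M"; "price" is left untouched.
encoded = pd.get_dummies(df, columns=["color", "size"])

# Suppose a feature-selection step kept only these columns.
selected = ["color_red", "size_M", "price"]

# Treat an original feature as selected if any of its dummies survived
# (this assumes original column names contain no underscores).
original_features = {col.split("_")[0] for col in selected}
print(original_features)  # {'color', 'size', 'price'}
```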

Best Answer

This is common in NLP, where you manipulate very high-dimensional feature vectors. In a simple text-classification setup, each dimension corresponds to a word (or a bigram).

Feature selection, taking again the text classification task, can be done very naturally with $l_1$-regularized Logistic Regression. You control the strength of the regularization parameter, and the algorithm automatically prunes away features that don't contribute to the classification task by setting their coefficients to 0.
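As a rough sketch of what this looks like in practice (assuming scikit-learn and pandas; the toy dataset and the value of `C` are invented for illustration):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "red"],
    "size":  ["S", "M", "L", "M", "S", "L"],
    "label": [1, 0, 1, 0, 0, 1],
})

# One-hot encode the categorical features.
X = pd.get_dummies(df[["color", "size"]])
y = df["label"]

# C is the inverse regularization strength: smaller C prunes more aggressively.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
clf.fit(X, y)

# Dummy variables whose coefficients were driven to exactly zero are pruned;
# an original categorical feature is kept if any of its dummies survives.
kept_dummies = [name for name, coef in zip(X.columns, clf.coef_[0]) if coef != 0.0]
kept_features = {name.split("_")[0] for name in kept_dummies}
print(kept_dummies, kept_features)
```

This also answers the back-mapping concern from the question: because each dummy column carries its original feature name as a prefix, selected dummies can be grouped back into the categorical features they came from.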

In a Bayesian setting, I believe feature selection is done using sparsity-inducing priors such as the Laplace prior. Unfortunately I am not very proficient in this field.
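For intuition on why this connects to the above: the MAP estimate under an independent Laplace prior $p(w_j) \propto \exp(-\lambda |w_j|)$ on each coefficient is exactly $l_1$-penalized maximum likelihood,

$$\hat{w}_{\text{MAP}} = \arg\max_w \Big[ \log p(y \mid X, w) - \lambda \sum_j |w_j| \Big],$$

so the Bayesian view and $l_1$-regularized Logistic Regression lead to the same kind of automatic pruning.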

Having some custom rules to prune away features is nice and instructive, but for a real problem, let a well-tested learning algorithm do this for you.