I have a binary supervised classification problem with about 62 features; by eye, about 30 of them look like they could have reasonable discriminating power. I am using sklearn, and its MLP does not have a dedicated feature selection tool like decision trees do.
My question is: what is the recommended way to perform feature selection here?
I have read in the sklearn documentation that LDA should not be performed in a binary classification problem, and PCA is listed under the unsupervised methods on the sklearn website.
Does anyone have any experience with this that could suggest a method?
(P.S. Apologies if this question isn't up to standard; this is the first question I have ever asked.)
Best Answer
Two suggestions:
One important reason to use a neural network is that the model can do "feature selection and feature engineering" automatically for us. Unless we have a huge problem (say, millions of features), it is not necessary to use feature selection for a neural network.
Using PCA for feature selection in supervised learning is bad practice, since it does not consider the "correlation" between the features and the label, and directly favors features with large variance. In other words, we can have a completely useless feature with large variance in the data, and PCA will select it. See my answer to "How to decide between PCA and logistic regression?" for details.
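To make the PCA point concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption for illustration, not taken from the question: the two made-up features, their scales, and the use of sklearn's `f_classif` as one possible label-aware score. One feature carries the class signal but has small variance; the other is pure noise with large variance.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import f_classif

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, size=n)  # binary labels

# Feature 0: informative but low variance (class means differ by 1, noise std 0.5)
informative = y + rng.normal(scale=0.5, size=n)
# Feature 1: unrelated to the label, but with very large variance
noise = rng.normal(scale=20.0, size=n)

X = np.column_stack([informative, noise])

# PCA never sees y: its first component follows the direction of largest variance
pca = PCA(n_components=1).fit(X)
loadings = np.abs(pca.components_[0])

# A supervised score (ANOVA F-statistic) does use the label
f_scores, _ = f_classif(X, y)

pca_pick = int(np.argmax(loadings))         # feature dominating the first component
supervised_pick = int(np.argmax(f_scores))  # feature most associated with y

print("PCA loadings:", loadings, "-> dominated by feature", pca_pick)
print("F-scores:", f_scores, "-> best feature", supervised_pick)
```

In this setup the first principal component is dominated by the noise feature, while the label-aware score ranks the informative feature first. Standardizing the features before PCA would remove the scale effect here, but PCA would still ignore the label.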