Solved – Alternatives for PCA for dimensionality reduction before classification

classification, dimensionality reduction, discriminant analysis, pca

I have a classification problem with only 2 classes, about 100 features and thousands of observations, each belonging to either of the two classes.

Currently I'm running PCA before the machine learning algorithms, which works very well, so I was wondering whether another dimensionality reduction method might work even better than PCA.

What other methods could I use for dimensionality reduction instead of PCA, and how would I apply them?

I already thought about LDA, but it yields at most (number of classes − 1) dimensions, so with 2 classes only 1 dimension would remain, which would be far too few for correct class prediction on test observations.

Any ideas?

Best Answer

You have many options. You could check the correlations between features and remove the ones that are highly correlated with others. You could fit a random forest to the data, look at the resulting feature importances, and remove the features with low importance. You could do something similar with logistic regression to select a subset of the features. There is a nice discussion of that in the question "Importance of variables in logistic regression".
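To make the three suggestions concrete, here is a minimal sketch in Python using NumPy and scikit-learn. Everything specific in it is an assumption for illustration and not from the question: the synthetic data from `make_classification`, the helper names (`drop_correlated`, `top_forest_features`, `l1_logistic_features`), the 0.9 correlation threshold, keeping 20 features, and the use of an L1 penalty as one possible way to let logistic regression select a subset. Treat the thresholds as things to tune by cross-validation on your own data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def drop_correlated(X, threshold=0.9):
    """Indices of columns kept after greedily dropping highly correlated ones."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        # Keep column j only if it is not strongly correlated
        # with any column that has already been kept.
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return np.array(keep)


def top_forest_features(X, y, n_keep=20, seed=0):
    """Indices of the n_keep features with the highest forest importance."""
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    forest.fit(X, y)
    order = np.argsort(forest.feature_importances_)[::-1]
    return np.sort(order[:n_keep])


def l1_logistic_features(X, y, C=0.05):
    """Indices of features with a nonzero coefficient under an L1 penalty."""
    # Standardize first so the penalty treats all features on the same scale.
    X_std = StandardScaler().fit_transform(X)
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    model.fit(X_std, y)
    return np.flatnonzero(model.coef_.ravel() != 0)


# Synthetic stand-in for the data in the question (assumption):
# 2 classes, 100 features, a few thousand observations.
X, y = make_classification(
    n_samples=2000, n_features=100, n_informative=10,
    n_redundant=20, random_state=0,
)

kept_corr = drop_correlated(X)
kept_rf = top_forest_features(X, y)
kept_l1 = l1_logistic_features(X, y)
print(len(kept_corr), len(kept_rf), len(kept_l1))
```

Each helper returns column indices, so you can train your classifier on `X[:, kept_rf]` (or one of the other index sets) and compare the cross-validated score against your PCA pipeline. To avoid leaking information, do the selection inside the cross-validation folds, on the training part only.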