Dimensionality Reduction or Regularization – What to Do When Facing Too Many Features?

dimensionality reduction, machine learning, regularization

I have just started learning machine learning and was asked this concept-based question:

"Suppose you are working on a stock market prediction model and the data you collected have millions of features, what should you do?"

I found two possible methods: regularization and dimensionality reduction. But I was told that regularization is the wrong answer because it does not affect the input data, only the model's output, whereas dimensionality reduction removes unnecessary/useless features that only generate noise.

My main question is: if an excessive number of features in a dataset can cause overfitting, and regularization helps reduce the complexity of the model, why is regularization not a valid solution?

I would sincerely appreciate it if anyone could provide some usage examples of both methods.

Best Answer

Interesting question. I assume that you want to build a classifier that predicts, for example, whether the price of a stock goes up or down during the coming day.

My advice is to take a different path. With literally millions of feature variables available, I would start off with a machine learning method that uses feature selection as an integral part of its model-building process. I would go for C4.5 or another decision tree classifier, which effectively performs a sequential forward feature search during learning.
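For illustration, here is a minimal sketch of that idea in scikit-learn. It uses DecisionTreeClassifier (CART) as a stand-in for C4.5, and a synthetic make_classification dataset in place of real market data; the parameter values are placeholders, not a recommendation.

```python
# Sketch: let a decision tree do the feature selection for us.
# DecisionTreeClassifier (CART) stands in for C4.5; X and y are toy data
# with many features but only a handful of informative ones.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=2000,
                           n_informative=10, random_state=0)

tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, y)

# Only features actually used in a split get a nonzero importance.
selected = np.flatnonzero(tree.feature_importances_)
print(f"tree kept {selected.size} of {X.shape[1]} features")
```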

You will end up with a classifier built on a small subset of well-predicting features. Use that subset of features to train other classifiers for comparison: neural networks, discriminant analysis, logistic regression, or a support vector machine.
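Continuing the sketch above, the tree-selected columns can be fed into those other classifiers and compared by cross-validation:

```python
# Sketch (continuing the previous snippet): refit simpler classifiers on the
# small subset of features the tree kept and compare them by cross-validation.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X_small = X[:, selected]  # only the tree-selected features

for name, clf in [("logistic regression", LogisticRegression(max_iter=1000)),
                  ("discriminant analysis", LinearDiscriminantAnalysis()),
                  ("support vector machine", SVC())]:
    score = cross_val_score(clf, X_small, y, cv=5).mean()
    print(f"{name}: mean CV accuracy {score:.3f}")
```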

Regularization is mainly applied to reduce model complexity, which is a different purpose than feature selection. Dimensionality reduction is applicable, but in practice it most often comes down to principal component analysis (PCA), a linear technique that implicitly assumes (approximately) normally distributed data.
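Since you asked for usage examples of both methods, here is a hedged sketch of each on the same toy data as above: an L1-penalized logistic regression as the regularization example and a PCA pipeline as the dimensionality-reduction example. The specific penalty, solver, and component count are illustrative choices only.

```python
# Sketch of the two methods from the question, on the toy X, y above.
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Regularization: the penalty term controls model complexity; with an L1
# penalty many coefficients are shrunk exactly to zero.
regularized = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
).fit(X, y)
n_used = (regularized[-1].coef_ != 0).sum()
print(f"L1-regularized model has {n_used} nonzero coefficients")

# Dimensionality reduction: PCA builds new features (components) that are
# linear combinations of all original features, rather than picking a subset.
pca_model = make_pipeline(
    StandardScaler(),
    PCA(n_components=50),
    LogisticRegression(max_iter=1000),
).fit(X, y)
print("PCA pipeline training accuracy:", pca_model.score(X, y))
```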
