Solved – Is feature transformation (power, log, Box-Cox) necessary in deep learning

data-transformation, deep-learning, neural-networks

I've read that it's beneficial to apply certain common feature transformations to a dataset before it reaches a machine learning model. These are chosen based on the distributions of the dataset's features; e.g., applying a log transform to heavily skewed (roughly log-normal) features. Some examples here.
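For concreteness, here is a minimal sketch of the kind of manual transformation the question is about, using NumPy and scikit-learn's `PowerTransformer`; the simulated feature and the parameter choices are illustrative only, not taken from the question.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)

# A heavily right-skewed feature (log-normal), the classic candidate for these transforms.
x = rng.lognormal(mean=0.0, sigma=1.0, size=(1000, 1))

# Option 1: a simple log transform (log1p stays defined at zero).
x_log = np.log1p(x)

# Option 2: Box-Cox, which estimates the power parameter lambda from the data.
# Box-Cox requires strictly positive inputs; use method="yeo-johnson" otherwise.
pt = PowerTransformer(method="box-cox", standardize=True)
x_boxcox = pt.fit_transform(x)

skew = lambda a: float(((a - a.mean()) ** 3).mean() / a.std() ** 3)
print("skewness before:", skew(x))
print("skewness after :", skew(x_boxcox))
```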

Now, as I understand it, a main boon of deep learning is "automatic feature engineering" (a.k.a. "feature learning"). I know that includes feature combinations, but my hunch is that it also includes learned feature transformations like the ones above. So when using deep networks with well-tuned hyperparameters, can feature transformations safely be removed from the human's responsibilities – that is, can we throw all this log/square/Box-Cox stuff away?

[Edit] Extra: does this also handle "feature selection" (deciding which inputs not to include) for you?

Best Answer

The rule of thumb is: the more data you have available, the less you need to care about feature engineering, which is essentially a way of injecting prior knowledge into the model based on domain expertise.

Theoretically (with a large enough number of samples) you could solve ImageNet without any convolutions, using only a deep feedforward network. But by exploiting the knowledge that pixels are spatially correlated (which is what makes convolutions a much better way to tackle this problem), you can design an algorithm that is far more data-efficient.
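A rough sketch of that point, assuming PyTorch: a convolutional layer hard-codes the prior that nearby pixels are related and that the same filter is useful at every position, so it needs orders of magnitude fewer parameters than a dense layer over the same image. The input shape and hidden size below are just for illustration.

```python
import torch.nn as nn

h, w, channels = 224, 224, 3   # ImageNet-sized input
hidden = 64

# Fully connected: every pixel connects to every hidden unit.
dense = nn.Linear(h * w * channels, hidden)

# Convolution: a small 3x3 filter shared across all spatial positions.
conv = nn.Conv2d(in_channels=channels, out_channels=hidden, kernel_size=3, padding=1)

count = lambda m: sum(p.numel() for p in m.parameters())
print(f"dense params: {count(dense):,}")   # ~9.6 million
print(f"conv  params: {count(conv):,}")    # ~1.8 thousand
```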
