Neural Networks – Importance of Feature Selection and Engineering

deep learning, feature selection, feature-engineering, neural networks

Particularly in the context of Kaggle competitions, I have noticed that a model's performance is all about feature selection / engineering. While I can fully understand why that is the case when dealing with the more conventional / old-school ML algorithms, I don't see why this would be the case when using deep neural networks.

Citing the Deep Learning book:

Deep learning solves this central problem in representation learning by introducing representations that are expressed in terms of other, simpler representations. Deep learning enables the computer to build complex concepts out of simpler concepts.

Hence I always thought that if "information is in the data", a sufficiently deep, well-parameterised neural network would pick up the right features given sufficient training time.

Best Answer

  • What if the "sufficiently deep" network is intractably huge, either because training it is too expensive (AWS fees add up!) or because you need to deploy the network in a resource-constrained environment?

  • How can you know, a priori, that the network is well-parameterized? It can take a lot of experimentation to find a network that works well.

  • What if the data you're working with is not "friendly" to standard analysis methods, such as a binary string comprising thousands or millions of bits, where each sequence has a different length?

  • What if you're interested in user-level data, but you're forced to work with a database that only collects transaction-level data?

  • Suppose your data are in the form of integers such as $12, 32, 486, 7$, and your task is to predict the sum of the digits, so the target in this example is $3, 5, 18, 7$. It's dirt simple to parse each digit into an array and then sum the array ("feature engineering") but challenging otherwise.
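To make the digit-sum bullet concrete, here is a minimal sketch of that "parse each digit into an array" step. The function name `digit_features` and the padding width of 4 are assumptions chosen for illustration; they are not from any particular library.

```python
def digit_features(n, width=4):
    """Split a non-negative integer into a zero-padded list of its digits.

    `width` is an assumed maximum number of digits, so that every input
    maps to a vector of the same length.
    """
    digits = [int(ch) for ch in str(n)]
    if len(digits) > width:
        raise ValueError(f"{n} has more than {width} digits")
    # Left-pad with zeros; padding does not change the digit sum.
    return [0] * (width - len(digits)) + digits


data = [12, 32, 486, 7]
features = [digit_features(n) for n in data]
targets = [sum(f) for f in features]

print(features)  # [[0, 0, 1, 2], [0, 0, 3, 2], [0, 4, 8, 6], [0, 0, 0, 7]]
print(targets)   # [3, 5, 18, 7]
```

With the digits exposed as separate inputs, the target is just a sum over the feature vector, which even a single linear unit with all weights equal to one can represent exactly.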

We would like to live in a world where data analysis is "turnkey," but these kinds of solutions usually only exist in special instances. Lots of work went into developing deep CNNs for image classification; earlier approaches relied on a separate step that transformed each image into a fixed-length vector.

Feature engineering lets the practitioner directly transform knowledge about the problem into a fixed-length vector amenable to feed-forward networks. Feature selection addresses the problem of including so many irrelevant features that any signal is lost, and it can dramatically reduce the number of parameters in the model.
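As one illustration of turning problem knowledge into a fixed-length vector, the transaction-level versus user-level case mentioned above might be handled roughly as follows. The records, the `user_level_features` helper, and the choice of four summary statistics are all hypothetical; a real project would pick aggregates that reflect what is known about the users.

```python
from collections import defaultdict

# Hypothetical transaction-level records: (user_id, amount).
transactions = [
    ("alice", 20.0),
    ("bob", 5.0),
    ("alice", 40.0),
    ("bob", 15.0),
    ("bob", 10.0),
]


def user_level_features(rows):
    """Aggregate (user_id, amount) rows into one fixed-length vector per user.

    Each vector is [transaction count, total amount, mean amount, max amount].
    """
    amounts = defaultdict(list)
    for user_id, amount in rows:
        amounts[user_id].append(amount)
    return {
        user: [len(a), sum(a), sum(a) / len(a), max(a)]
        for user, a in amounts.items()
    }


user_features = user_level_features(transactions)
print(user_features["alice"])  # [2, 60.0, 30.0, 40.0]
print(user_features["bob"])    # [3, 30.0, 10.0, 15.0]
```

Every user ends up with a vector of the same length regardless of how many transactions they made, which is the shape a feed-forward network expects.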
