Solved – Correlation matrix and redundant information

correlation, neural networks

I am using a neural network model for a classification task with 13 inputs.
I examine the connection weights to identify the most relevant variables. I have also computed a correlation matrix to check the relationships between the inputs:
[Correlation matrix of the 13 input variables]

Some groups of variables show strong positive or negative relationships. My concern is that I may have to remove some of them because they are redundant. On the other hand, I am tempted to keep them all and let the network decide by itself which ones to use.
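For reference, here is a minimal sketch of how such a check could be done with pandas; the DataFrame and the 0.8 threshold are placeholders, not part of my actual setup:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 13 input variables; replace with your data.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 13)),
                 columns=[f"x{i}" for i in range(13)])

corr = X.corr()  # Pearson correlation matrix

# Keep only the upper triangle, then list pairs above a chosen threshold.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
strong = upper.stack()
print(strong[strong.abs() > 0.8].sort_values(ascending=False))
```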

Is it generally advised to remove redundant information (if highly correlated) when training neural networks?

My study aims at identifying the best variables to use (for similar future classification tasks) so that we obtain the best prediction performance in the end. To this end, I removed some of the highly correlated variables, but I got lower prediction accuracy.

Best Answer

Is it generally advised to remove redundant information (if highly correlated) when training neural networks?

It depends.

Is it necessary? No.

Since a neural network with an appropriate architecture can model any (!) function, you can safely assume that it also could first model the PCA and then do whatever it also should do -- e.g. classification, regression, etc. (source)

and

In principle, the linear transformation performed by PCA can be performed just as well by the input layer weights of the neural network, so it isn't strictly speaking necessary (source)

This is because neural nets can themselves be used as a non-linear dimensionality reduction tool:

High-dimensional data can be converted to low-dimensional codes by training a multilayer neural network with a small central layer to reconstruct high-dimensional input vectors (source)

In this context, it is also worth mentioning auto-encoders.
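As an illustration, here is a minimal auto-encoder sketch in Keras: the layer sizes, the bottleneck of 4 units, and the random placeholder data are arbitrary choices, not tuned to your problem.

```python
import numpy as np
from tensorflow import keras

n_inputs = 13   # matches the number of input variables in the question
code_size = 4   # size of the small central layer (arbitrary choice)

inputs = keras.Input(shape=(n_inputs,))
h = keras.layers.Dense(8, activation="relu")(inputs)
code = keras.layers.Dense(code_size, activation="relu")(h)   # bottleneck
h = keras.layers.Dense(8, activation="relu")(code)
outputs = keras.layers.Dense(n_inputs)(h)                    # linear reconstruction

autoencoder = keras.Model(inputs, outputs)
encoder = keras.Model(inputs, code)   # maps inputs to their low-dimensional codes
autoencoder.compile(optimizer="adam", loss="mse")

# Placeholder data; in practice use your standardised 13 input variables.
X = np.random.default_rng(0).normal(size=(500, n_inputs))
autoencoder.fit(X, X, epochs=20, batch_size=32, verbose=0)

codes = encoder.predict(X, verbose=0)
print(codes.shape)   # (500, 4)
```

The network is trained to reconstruct its own inputs through the small central layer, so the activations of that layer are the learned non-linear low-dimensional codes.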

Can it help? Yes, it can speed things up.

However, as the number of weights in the network increases, the amount of data needed to be able to reliably determine the weights of the network also increases (often quite rapidly), and over-fitting becomes more of an issue (using regularisation is also a good idea). The benefit of dimensionality reduction is that it reduces the size of the network, and hence the amount of data needed to train it (source)
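For illustration, one common way to set this up is PCA as a preprocessing step before a small network; this scikit-learn sketch uses synthetic data and an arbitrary choice of 5 components, purely to show the wiring:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 13 correlated inputs, binary target.
X, y = make_classification(n_samples=600, n_features=13, n_informative=5,
                           n_redundant=4, random_state=0)

# Standardise, project onto a few principal components, then fit a small MLP.
# Fewer inputs -> fewer first-layer weights -> less data needed to fit them.
pipe = make_pipeline(StandardScaler(),
                     PCA(n_components=5),
                     MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                                   random_state=0))
print(cross_val_score(pipe, X, y, cv=5).mean())
```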

Speed comes at a cost and bears a risk

The disadvantage of using PCA is that the discriminative information that distinguishes one class from another might be in the low variance components, so using PCA can make performance worse (source)

This may well be what you experienced in your experiment.
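A minimal synthetic illustration of that failure mode (toy data, and a logistic regression instead of a neural net, purely for brevity): the class label lives entirely in a low-variance feature, so projecting onto the top principal component throws the signal away.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, n)
X = np.column_stack([
    rng.normal(0.0, 10.0, n),      # high variance, no class information
    y + rng.normal(0.0, 0.1, n),   # low variance, fully determines the class
])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

print("all features :", LogisticRegression().fit(X_tr, y_tr).score(X_te, y_te))

pca = PCA(n_components=1).fit(X_tr)               # keeps the noisy direction
Z_tr, Z_te = pca.transform(X_tr), pca.transform(X_te)
print("top PCA comp :", LogisticRegression().fit(Z_tr, y_tr).score(Z_te, y_te))
```

On the full data the classifier is nearly perfect; on the single retained component it is at chance level.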