Solved – In a neural network with N covariates, are >N hidden units useless

multicollinearity, neural networks

I'm fitting a neural network to this example data I found online:
Machine Learning Repository

I am cross-validating over 1 to 10 hidden units (in a single layer), and the minimum error occurs with 10 hidden units. However, I keep thinking of linearly dependent design matrices when introducing 10 hidden units for only 3 input variables (the levels of Red, Green and Blue).

Is this concern justified, or can I just use 10 hidden units here? Maybe the (sigmoid) transformation does something to avoid the linear dependency?
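
For concreteness, here is a minimal sketch of the selection procedure described in the question, assuming scikit-learn's MLPRegressor; the synthetic (X, y) data, the 5-fold split, and the MSE scoring are illustrative stand-ins, not the original dataset or code.

```python
# Cross-validate single-hidden-layer networks with 1..10 sigmoid units.
# The data here is a synthetic stand-in for the 3-input (R, G, B) problem.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))                   # stand-in for R, G, B inputs
y = np.sin(X @ np.array([2.0, -1.0, 0.5]))       # arbitrary smooth target

for n_hidden in range(1, 11):
    model = MLPRegressor(hidden_layer_sizes=(n_hidden,),
                         activation="logistic",  # sigmoid hidden units
                         max_iter=5000, random_state=0)
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_mean_squared_error")
    print(n_hidden, -scores.mean())              # mean CV error per size
```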

Best Answer

No, there is no need to worry about this: the non-linear transformation means that the feature space generated by the hidden-layer neurons can have a higher dimension than the input space without the features being linearly dependent.
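
A quick numerical check of this point (my own sketch, not from the answer): push 3 covariates through 10 sigmoid hidden units with random weights and inspect the rank of the resulting hidden-feature matrix.

```python
# 10 sigmoid features built from only 3 inputs: the non-linearity makes
# them (generically) linearly independent, so the feature matrix has
# full column rank despite n_hidden > n_inputs.
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(size=(500, 3))            # 3 covariates, e.g. R, G, B
W = rng.normal(size=(3, 10))              # input-to-hidden weights
b = rng.normal(size=10)                   # hidden biases

H = 1.0 / (1.0 + np.exp(-(X @ W + b)))    # sigmoid hidden activations

print(np.linalg.matrix_rank(H))           # typically 10: full column rank
print(np.linalg.cond(H.T @ H))            # finite, i.e. not singular
```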

Consider Ripley's synthetic benchmark dataset, which consists of two classes, each represented by two Gaussian clusters. It looks like this:

[Figure: scatter plot of Ripley's synthetic benchmark dataset]

A good solution can be obtained by placing a radial basis function on each cluster and then applying a linear discriminant to the outputs of these four hidden units. You should find that even the normal equations for linear regression are numerically well-conditioned, which shows that linear dependence isn't an issue; the non-linear transformation is precisely what prevents it.
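
Here is a hedged reconstruction of that experiment; the cluster centres, spreads, and RBF bandwidth below are my own stand-ins for Ripley's actual parameters, chosen only to reproduce the qualitative structure (two classes, two Gaussian clusters each).

```python
# Four RBF hidden units (one per cluster) feeding a linear discriminant,
# with a check that the normal equations are well-conditioned.
import numpy as np

rng = np.random.default_rng(2)
centres = np.array([[-0.3, 0.7], [0.4, 0.7],    # class 1 clusters
                    [-0.7, 0.3], [0.3, 0.3]])   # class 0 clusters
X = np.vstack([c + 0.1 * rng.normal(size=(50, 2)) for c in centres])
y = np.repeat([1, 1, 0, 0], 50)

def rbf(X, c, width=0.2):
    """Gaussian radial basis function centred at c."""
    return np.exp(-np.sum((X - c) ** 2, axis=1) / (2 * width ** 2))

Phi = np.column_stack([rbf(X, c) for c in centres])  # 4 hidden features
Phi = np.column_stack([np.ones(len(X)), Phi])        # plus a bias column

# Condition number of the normal equations: should be modest, i.e. the
# four hidden units are far from linearly dependent.
print(np.linalg.cond(Phi.T @ Phi))

# Linear discriminant via least squares on the RBF features.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(np.mean((Phi @ w > 0.5) == y))                 # training accuracy
```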

Note that if you use regularisation (which I would recommend for any MLP application), then linear dependence isn't a problem anyway.
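
A short illustration of why regularisation settles the matter (the feature matrix and penalty below are deliberately pathological and purely illustrative): adding a ridge penalty lambda*I to the normal equations makes them positive definite, so the solve is stable even when the hidden features are exactly dependent.

```python
# Even with perfectly dependent features, the regularised normal
# equations Phi'Phi + lambda*I remain invertible.
import numpy as np

H = np.ones((100, 10))        # pathologically dependent hidden features
lam = 1e-3                    # illustrative ridge strength
A = H.T @ H + lam * np.eye(10)
print(np.linalg.cond(A))      # finite, so the solve is well-posed
```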
