Identifiability in Neural Network Architectures

Tags: conv-neural-network, identifiability, neural-networks, recurrent-neural-network

It's quite intuitive that most neural network topologies/architectures are not identifiable. But what are some well-known results in the field? Are there simple conditions which allow/prevent identifiability? For example,

  • all networks with nonlinear activation functions and more than one hidden layer are not identifiable
  • all networks with more than two hidden units are not identifiable

Or things like these. NOTE: I'm not saying that these conditions prevent identifiability (though they seem like pretty good candidates to me). They are just examples of what I mean by "simple conditions".

If it helps to narrow down the question, feel free to consider only feed-forward and recurrent architectures. If this is still not enough, I'd be satisfied with an answer which covers at least one architecture among MLP, CNN and RNN. I had a quick look around on the Web, but it looks like the only discussion I could find was on Reddit. Come on, people, we can do better than Reddit 😉

Best Answer

Linear, single-layer FFNs are non-identified

The question has since been edited to exclude this case; I retain it here because the linear case is a simple example of the phenomenon of interest.

Consider a feedforward neural network with 1 hidden layer and all linear activations. The task is a simple OLS regression task.

So we have the model $\hat{y}=XAB$ and the objective is $$ \min_{A,B} \frac{1}{2}\left\| y - XAB \right\|_2^2 $$

for some choice of $A, B$ of appropriate shape. $A$ is the input-to-hidden weights, and $B$ is the hidden-to-output weights.

Clearly the elements of the weight matrices are not identifiable in general: infinitely many pairs of matrices $A, B$ have the same product. For instance, for any invertible matrix $C$ of appropriate shape, the pair $(AC,\ C^{-1}B)$ yields exactly the same predictions as $(A, B)$, and hence the same loss.
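Below is a minimal NumPy sketch of this symmetry (the dimensions, seed, and data are arbitrary choices for illustration): inserting any invertible matrix $C$ between the two layers produces a different parameterization with identical predictions.

```python
import numpy as np

# Two distinct weight configurations, (A, B) and (A @ C, C_inv @ B),
# give the same predictions X @ A @ B, hence the same loss.
rng = np.random.default_rng(0)

n, d, h = 50, 4, 3                 # samples, input dim, hidden units
X = rng.normal(size=(n, d))
A = rng.normal(size=(d, h))        # input-to-hidden weights
B = rng.normal(size=(h, 1))        # hidden-to-output weights

C = rng.normal(size=(h, h))        # invertible with probability 1
C_inv = np.linalg.inv(C)

y_hat_1 = X @ A @ B
y_hat_2 = X @ (A @ C) @ (C_inv @ B)  # different weights, same function

print(np.allclose(y_hat_1, y_hat_2))  # True
```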

Nonlinear, single-layer FFNs are still non-identified

Building up from the linear, single-layer FFN, we can also observe non-identifiability in the nonlinear, single-layer FFN.

As an example, adding a $\tanh$ nonlinearity to any of the linear activations creates a nonlinear network. This network is still non-identified, because for any weight configuration, permuting two (or more) hidden units at one layer, together with the corresponding weights at the next layer, yields the same network function and therefore the same loss.
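A minimal NumPy sketch of this permutation symmetry, using the same illustrative dimensions as above: reordering the hidden units (the columns of $A$) while reordering the rows of $B$ to match leaves the output unchanged.

```python
import numpy as np

# Permuting the hidden units of a tanh network, i.e. the columns of A
# together with the matching rows of B, leaves the output unchanged.
rng = np.random.default_rng(0)

n, d, h = 50, 4, 3
X = rng.normal(size=(n, d))
A = rng.normal(size=(d, h))
B = rng.normal(size=(h, 1))

perm = rng.permutation(h)          # an arbitrary reordering of the units

y_hat_1 = np.tanh(X @ A) @ B
y_hat_2 = np.tanh(X @ A[:, perm]) @ B[perm, :]  # permuted weights

print(np.allclose(y_hat_1, y_hat_2))  # True
```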

In general, neural networks are non-identified

We can use the same reasoning to show that neural networks are non-identified in all but very particular parameterizations.

For example, there is no particular reason that convolutional filters must occur in any particular order. Nor, when the activation is odd (e.g. $\tanh$), is it required that convolutional filters have any particular sign, since subsequent weights can take the opposite sign to "reverse" that choice: $\tanh(-x) = -\tanh(x)$, so the flips cancel.
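To keep the sketch short, the following demonstrates the sign-flip symmetry on a dense $\tanh$ layer rather than a literal convolution; the same cancellation applies filter-wise, since a convolution is also a linear map.

```python
import numpy as np

# Sign-flip symmetry for odd activations: negating one unit's incoming
# weights and its outgoing weight cancels, since tanh(-x) = -tanh(x).
rng = np.random.default_rng(0)

n, d, h = 50, 4, 3
X = rng.normal(size=(n, d))
A = rng.normal(size=(d, h))
B = rng.normal(size=(h, 1))

A2, B2 = A.copy(), B.copy()
A2[:, 0] *= -1                     # flip the first unit's incoming weights
B2[0, :] *= -1                     # flip its outgoing weight to compensate

print(np.allclose(np.tanh(X @ A) @ B, np.tanh(X @ A2) @ B2))  # True
```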

Likewise, the units in an RNN can be permuted to obtain the same loss.
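A sketch for a vanilla (Elman-style) RNN, again with arbitrary illustrative dimensions: permuting the hidden units means permuting the columns of the input weights, both axes of the recurrent weights, and the rows of the output weights. The output sequence is unchanged.

```python
import numpy as np

# Permutation symmetry in a vanilla RNN: relabeling the hidden units
# (columns of W_xh, both axes of W_hh, rows of W_hy) preserves the output.
rng = np.random.default_rng(0)

T, d, h = 10, 4, 3                 # time steps, input dim, hidden units
xs = rng.normal(size=(T, d))

W_xh = rng.normal(size=(d, h))     # input-to-hidden weights
W_hh = rng.normal(size=(h, h))     # hidden-to-hidden (recurrent) weights
W_hy = rng.normal(size=(h, 1))     # hidden-to-output weights

def run(W_xh, W_hh, W_hy):
    s = np.zeros(h)
    ys = []
    for x in xs:
        s = np.tanh(x @ W_xh + s @ W_hh)
        ys.append(s @ W_hy)
    return np.array(ys)

perm = rng.permutation(h)
y1 = run(W_xh, W_hh, W_hy)
y2 = run(W_xh[:, perm], W_hh[np.ix_(perm, perm)], W_hy[perm, :])

print(np.allclose(y1, y2))  # True
```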

See also: Can we use MLE to estimate Neural Network weights?