Solved – the information storage capacity of a neural network

information theory, neural networks

Firstly, a (feed-forward) neural network can be thought of as a lookup table, $f(x) = y$, so it is certainly storing some data.

The theoretical limit of lossless data compression is the entropy $-\sum_i p_i \ln(p_i)$. The data in a neural network, however, is stored as weights and biases rather than as variable-length codes, so does this equation even apply?
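For concreteness, here is a minimal sketch of that entropy bound for an empirical symbol distribution; the tiny data set and alphabet are made up purely for illustration.

```python
# Minimal sketch: Shannon entropy -sum p*ln(p) of a symbol source,
# i.e. the theoretical lower bound (in nats per symbol) for lossless coding.
import math
from collections import Counter

def entropy_nats(symbols):
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

data = "aaaabbbccd"  # hypothetical data set
h = entropy_nats(data)
print(f"{h:.3f} nats/symbol ~ {h / math.log(2):.3f} bits/symbol")
# On average, no lossless code can use fewer than h nats per symbol.
```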

Say I have a fully compressed data set of a certain size: will the total size of the weights and biases be the same if I have perfectly trained the network?

How do I know whether my neural net has enough representational capacity to represent the entire data set?

Best Answer

Well, a table is definitely not the right way to look at this. Consider the function $f(x)=x^2$. I can create an infinite table of input-output pairs that are represented by this function, yet this function represents exactly one such table. Now consider the function $g(x)=c\cdot x^2$: for different values of $c$, it can represent an infinite number of tables (indeed an uncountable number).
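A small sketch of that distinction (the input grid and values of $c$ are arbitrary choices for illustration):

```python
# f pins down exactly one input->output table; the parametric family
# g(., c) yields a different table for every value of c.
def f(x):
    return x ** 2

def g(x, c):
    return c * x ** 2

inputs = [-2, -1, 0, 1, 2]
table_f = {x: f(x) for x in inputs}           # the single table f represents
tables_g = {c: {x: g(x, c) for x in inputs}   # one distinct table per c
            for c in (0.5, 1.0, 2.0)}
print(table_f)
print(tables_g)
```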
The best way that I can come up with to describe the information storage capacity of a neural network is to quote the universal approximation theorem: https://en.wikipedia.org/wiki/Universal_approximation_theorem. To summarize it: say we have an arbitrary continuous function and we want to approximate its output, and say that for every input the approximation should not deviate by more than some given $\epsilon>0$. Then we can create a neural network with a single hidden layer that satisfies this constraint, no matter which continuous function and no matter how small the error tolerance. The only caveat is that the number of nodes in the hidden layer may grow arbitrarily large as we make the error tolerance smaller and smaller.
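Here is a minimal numpy sketch of that idea: a single tanh hidden layer fit to a continuous target by plain gradient descent. The target function, layer width, learning rate, and step count are illustrative assumptions, not part of the theorem; the point is only that widening the hidden layer lets you push the worst-case error down.

```python
# Single-hidden-layer network approximating a continuous function (sin here).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(x)                       # the continuous function to approximate

hidden = 50                         # more hidden units -> smaller achievable error
W1 = rng.normal(scale=1.0, size=(1, hidden)); b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.1, size=(hidden, 1)); b2 = np.zeros(1)
lr = 0.05

for step in range(10000):
    h = np.tanh(x @ W1 + b1)        # hidden layer activations
    y_hat = h @ W2 + b2             # linear output layer
    err = y_hat - y
    # gradients of the mean-squared error
    dW2 = h.T @ err / len(x); db2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)
    dW1 = x.T @ dh / len(x); db1 = dh.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print("max |error|:", np.abs(y_hat - y).max())  # the epsilon actually achieved
```

Rerunning this with a larger `hidden` (and enough training) shrinks the reported maximum error, which is the practical face of the theorem's "the hidden layer might need to grow as $\epsilon$ shrinks".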