[Math] Problem on finding the set of biases and weights in a specific neural network

machine-learning, neural-networks, proof-verification

I have a doubt regarding an exercise here. Suppose we have a neural network that tries to map a $28\times 28$ image of a digit to the digit it represents, like this:

[Figure: the three-layer network, with the $28\times 28$ image as input and $10$ output neurons]

An activation value is computed for each output neuron, and the neuron with the highest value ends up "firing" (e.g. if output neuron 1 has the highest activation, the model thinks the digit is a $1$).

With that background, here's the actual problem:

There is a way of determining the bitwise representation of a digit by adding an extra layer to the three-layer network above. The extra layer converts the output from the previous layer into a binary representation, as illustrated in the figure below. Find a set of weights and biases for the new output layer. Assume that the first $3$ layers of neurons are such that the correct output in the third layer (i.e., the old output layer) has activation at least $0.99$, and incorrect outputs have activation less than $0.01$.

[Figure: the network with an extra four-neuron output layer producing the bitwise representation]

So here I'm assuming the "perceptron rule", i.e. the following activation function for the final layer:

$$y^{(i)} = \begin{cases} 0 & \text{if } b + \sum_j w^{(i)}_j x_j \leq 0, \\ 1 & \text{if } b + \sum_j w^{(i)}_j x_j > 0, \end{cases}$$

where $j \in \{0,1,2,\ldots,9\}$ and $i \in \{1,2,3,4\}$. Since $x_j < 0.01$ for any incorrect output, I guess the weight $w^{(i)}_j=1$ if the $i$-th bit in the binary representation of digit $j$ is $1$, and $w^{(i)}_j=0$ otherwise. As an example, since $9$ is $1001$ in binary, $w^{(1)}_9=w^{(4)}_9=1$ and $w^{(2)}_9=w^{(3)}_9=0$.
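To make the weight scheme concrete, here is a minimal sketch in Python/NumPy (the array name `W` and the bit-extraction expression are my own, not from the exercise) that builds the $4 \times 10$ weight matrix directly from the binary representations, with bit $1$ as the most significant of the four bits, as above:

```python
import numpy as np

# Sketch: W[i-1, j] = 1 if bit i of digit j is 1 (bit 1 = most significant
# of 4 bits, bit 4 = least significant), else 0.
W = np.array([[(j >> (4 - i)) & 1 for j in range(10)]
              for i in range(1, 5)], dtype=float)

print(W[:, 9])  # digit 9 = 1001 in binary -> [1. 0. 0. 1.]
```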

In that scenario, the maximum number of digits that have a $1$ in any one bit position is $5$: the $4$th (least significant) bit of $1,3,5,7,9$ is $1$. A worst case is when the actual digit is $6$, so that $x_6 \geq 0.99$ and all other $x_j < 0.01$. Since $w^{(4)}_1 = w^{(4)}_3 = w^{(4)}_5 = w^{(4)}_7 = w^{(4)}_9 = 1$, the sum $\sum_j w^{(4)}_j x_j$ is only slightly less than $0.05$ even though the $4$th bit of $6$ is $0$, so choosing the bias $b=-0.05$ keeps that neuron from firing.
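Under those assumptions the whole scheme can be checked mechanically. The following sketch (variable names are mine) applies the perceptron rule with $b=-0.05$ to worst-case activations for every digit, i.e. $0.99$ for the correct output and just under $0.01$ for each incorrect one:

```python
import numpy as np

W = np.array([[(j >> (4 - i)) & 1 for j in range(10)]
              for i in range(1, 5)], dtype=float)
b = -0.05

for digit in range(10):
    # Worst-case third-layer activations: 0.99 for the correct digit,
    # just under 0.01 for every incorrect one.
    x = np.full(10, 0.0099)
    x[digit] = 0.99
    y = (W @ x + b > 0).astype(int)                      # perceptron rule
    expected = [(digit >> k) & 1 for k in (3, 2, 1, 0)]  # bits 1..4
    assert list(y) == expected, (digit, list(y), expected)

print("all 10 digits decode correctly")
```

In particular, for the digit $6$ the least-significant-bit neuron receives $5 \times 0.0099 - 0.05 < 0$ and stays off, as argued above.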

Is this approach/answer correct? I'm not sure if I'm missing something here. Thanks in advance!

Best Answer

Your reasoning, for the most part, is correct. However, since the sigmoid function outputs real numbers in the interval $[0, 1]$ rather than hard $0$s and $1$s, the weights and biases you've chosen are not ideal. Suppose the input is an image of the digit $3$: then the $3$rd neuron in the third layer (the $4$th if you count from $1$) has activation $> 0.99$, while all the other neurons in that layer are $< 0.01$.

Here, $w \cdot x + b$ with your set of weights and biases gives $z > 0.94$ for the least-significant-bit output neuron (your bit $4$), which must fire for a $3$. But feeding $z = 0.94$ into the sigmoid gives $\sigma(0.94) = \frac{1}{1+e^{-0.94}} \approx 0.72$: closer to $1$ than to $0$, yet nowhere near the near-certain activations ($\geq 0.99$) the previous layer provides. Meanwhile, a neuron whose bit should be $0$ has $z$ just below $0$, so its output hovers around $0.5$, which can hardly be read as a clean $0$.

This happens because the inputs are bounded in the interval $[0, 1]$. If the weights are only $0$ or $1$, then $z = w \cdot x + b$ is at most about $1 + b$ for a neuron that should fire, and barely above $b$ for one that shouldn't. Since $e^{-z}$ only becomes negligible for large $z$, the sigmoid saturates towards $0$ or $1$ only when $|z|$ is large, so consider scaling the weights up to $10$ and $-10$ instead of $1$ and $0$. The bias can then be any moderate value (even $0$), as it no longer plays a major role in deciding the output of a sigmoid neuron, at least in this particular case.
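To see the difference numerically, here is a small sketch (the names `sigmoid`, `w_01`, and `w_pm10` are mine) comparing the two weight choices for the least-significant-bit neuron on the worst case discussed above, an image of a $3$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Third-layer activations for an image of a 3: 0.99 for the correct
# output, just under 0.01 for the rest.
x = np.full(10, 0.0099)
x[3] = 0.99

# Least-significant-bit neuron, which must fire for a 3.
w_01   = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1], dtype=float)  # weights 1 / 0
w_pm10 = np.where(w_01 == 1, 10.0, -10.0)                       # weights 10 / -10

print(sigmoid(w_01 @ x - 0.05))  # ~0.73: not a decisive 1
print(sigmoid(w_pm10 @ x))       # ~0.9999: saturated, clearly a 1
```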
