Solved – Forward propagation calculation for single layer neural network

machine-learning, neural-networks, optimization, predictive-models, self-study

Given a single training example $x = (x_1, x_2, x_3)$ and output $y$, the goal is to write down the "sequence of calculations required to compute the squared error cost (called forward propagation)". The hidden units $h_1, h_2$ are logistic, the output neuron $f$ is a linear unit, and we are using the squared error cost function $E = (y - f)^2$.

This is an exercise from an old exam (not homework, but self-study). The network looks as follows:

[Figure: the network diagram – inputs $x_1, x_2, x_3$ connected by weights $w_1, \dots, w_6$ to the logistic hidden units $h_1, h_2$, which connect by weights $u_1, u_2$ to the linear output unit $f$.]

My first question is regarding the vocabulary: I am not sure what a linear unit is – is f(x) = x? Also, if I understand the exercise correctly, we only have to calculate the forward pass (?), but as an exercise for myself I am trying to derive the backward step as well.

Anyway, I would proceed as follows:

Forward pass:

  • For each $h_i$ we sum the respective weights times inputs. The input $h_{1_{in}}$ to $h_1$, for instance, is $w_1*x_1+w_3*x_2+w_5*x_3$.

  • We apply the sigmoid function to these inputs to get the outputs $h_{i_{out}}$ of the $h_i$.

  • The weighted sum $u_1*h_{1_{out}}+u_2*h_{2_{out}}$ is then the input $f_{in}$ to the output node $f$; since $f$ is linear, this is also its output $f_{out}$, and the cost is $E = (y - f_{out})^2$ (see the code sketch after this list).
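
To make the steps concrete, here is a minimal NumPy sketch of the forward pass under my reading of the figure. All numerical values are made up, the assignment of $w_2, w_4, w_6$ to $h_2$ is an assumption, and bias terms are omitted since the exercise does not mention them.

```python
import numpy as np

def sigmoid(a):
    """Logistic activation used by the hidden units h1 and h2."""
    return 1.0 / (1.0 + np.exp(-a))

# All numbers below are made up for illustration; the exercise gives no values.
x = np.array([0.5, -1.0, 2.0])            # inputs x1, x2, x3
W = np.array([[0.1, 0.3, 0.5],            # weights into h1 (w1, w3, w5)
              [0.2, 0.4, 0.6]])           # weights into h2 (assumed w2, w4, w6)
u = np.array([0.7, -0.2])                 # weights u1, u2 into the output unit f
y = 1.0                                   # target output

h_in = W @ x                              # weighted sums feeding h1, h2
h_out = sigmoid(h_in)                     # logistic hidden outputs
f_in = u @ h_out                          # weighted sum feeding f
f_out = f_in                              # linear output unit: identity activation
E = (y - f_out) ** 2                      # squared error cost
print(h_out, f_out, E)
```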

Backward pass

Let's take the weight $u_1$ as an example:

$\frac{\partial E}{\partial u_1} = \frac{\partial E}{\partial f_{out}}*\frac{\partial f_{out}}{\partial f_{in}}*\frac{\partial f_{in}}{\partial u_1}$

Step by step I would calculate:

  • $\frac{\partial E}{\partial f_{out}} = -2*(y-f_{out})$

  • $\frac{\partial f_{out}}{\partial f_{in}}$: assuming $f(x) = x$, we have $f_{out} = f_{in}$, so I would have to differentiate $h_{1_{out}}*u_1+h_{2_{out}}*u_2$ with respect to itself and would end up with 1.

  • $\frac{\partial f_{in}}{\partial u_1} = h_{1_{out}} $

Taken together, $\frac{\partial E}{\partial u_1} = -2*(y-f_{out})*1*h_{1_{out}}$, which is then used to update $u_1 = u_1 - \eta * \frac{\partial E}{\partial u_1}$.
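
A small self-contained sketch of this backward step for $u_1$, with made-up values for the hidden outputs, the output weights, the target, and the learning rate $\eta$:

```python
import numpy as np

# Made-up values; h_out would come from the forward pass above.
y = 1.0                                   # target
u = np.array([0.7, -0.2])                 # output weights u1, u2
h_out = np.array([0.6, 0.4])              # assumed hidden unit outputs
eta = 0.1                                 # assumed learning rate

f_out = u @ h_out                         # linear output unit, so f_out = f_in
dE_dfout = -2.0 * (y - f_out)             # dE/df_out
dfout_dfin = 1.0                          # df_out/df_in = 1 (identity activation)
dfin_du1 = h_out[0]                       # df_in/du1 = h1_out
dE_du1 = dE_dfout * dfout_dfin * dfin_du1

u[0] = u[0] - eta * dE_du1                # gradient-descent update of u1
```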

Is this correct? Thanks

Best Answer

I am not sure what a linear unit is - is f(x) = x?

The output of an individual unit typically (but not always) has the form $g(\vec{w} \cdot \vec{x} + b)$, where $\vec{x}$ is a vector of inputs, $\vec{w}$ is the weight vector, $b$ is the bias, and $g$ is a (possibly nonlinear) activation function. A linear unit has the activation function $g(a) = a$, so its output as a function of its inputs is given by $\vec{w} \cdot \vec{x} + b$.
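
As a concrete (assumed) illustration: with the same weights, inputs, and bias, a logistic unit and a linear unit differ only in the activation $g$ applied to $\vec{w} \cdot \vec{x} + b$.

```python
import numpy as np

def unit_output(w, x, b, g):
    """Generic unit: activation g applied to the weighted sum plus bias."""
    return g(np.dot(w, x) + b)

w = np.array([0.7, -0.2])                 # made-up weights
x = np.array([0.6, 0.4])                  # made-up inputs
b = 0.1                                   # made-up bias

logistic_out = unit_output(w, x, b, lambda a: 1.0 / (1.0 + np.exp(-a)))
linear_out = unit_output(w, x, b, lambda a: a)   # identity activation: g(a) = a
```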

Forward pass

That looks correct. Each unit would also typically have a scalar bias term added to the weighted sum of inputs. The bias terms are learnable parameters, just like the weights. But I don't see any mention of that in the figure, so maybe it's not part of this exercise.

What you're calling the input to a unit (e.g. $h_{1_{in}}$) might more commonly be described as the input to unit $h_1$'s activation function. It would be more typical to say that $h_1$'s inputs are $x_1$, $x_2$, and $x_3$ (i.e. a vector containing the outputs of units $x_1$, $x_2$, and $x_3$).

Backward pass

Your use of the chain rule, expression for $\frac{\partial E}{\partial u_1}$, and update rule (assuming gradient descent) look correct.

It's true that $\frac{\partial f_{out}}{\partial f_{in}} = 1$ because this is the derivative of a linear unit's activation function, which is the identity function. Thinking in terms of input vectors and activation functions might make the bookkeeping a little more straightforward.
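
If you want to double-check the result numerically, one option is to compare the analytic derivative against a central finite-difference estimate of $E$ with respect to $u_1$; a small sketch with made-up numbers:

```python
import numpy as np

def cost(u1, u2, h_out, y):
    """Squared error as a function of the output weights (hidden outputs held fixed)."""
    f_out = u1 * h_out[0] + u2 * h_out[1]     # linear output unit
    return (y - f_out) ** 2

h_out = np.array([0.6, 0.4])                  # assumed hidden outputs
u1, u2, y = 0.7, -0.2, 1.0                    # assumed weights and target

analytic = -2.0 * (y - (u1 * h_out[0] + u2 * h_out[1])) * h_out[0]
eps = 1e-6
numeric = (cost(u1 + eps, u2, h_out, y) - cost(u1 - eps, u2, h_out, y)) / (2 * eps)
print(analytic, numeric)                      # the two should agree closely
```

The two values should agree to several decimal places, which is a quick way to catch sign or indexing mistakes in hand-derived gradients.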
