Solved – How was the “derivative of the error function with respect to the activation” (that looks like $y - t$) derived?

backpropagation, machine learning, neural networks

In Chapter 5 (Neural Networks) of Bishop's Pattern Recognition and Machine Learning, he mentions several times that the derivative of the error function with respect to the activation for a particular output unit takes the form
$$
\frac{\partial E}{\partial a_k} = y_k - t_k
$$
where $E$ is the error function, $a_k$ is the activation, $y_k$ is the output, and $t_k$ is the target value. This is equation 5.18 in the book.

Could someone explain how this was derived? Thank you!

Best Answer

Eq. 5.11 defines the error as
$$
E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^N \lVert \mathbf{y}(\mathbf{x}_n, \mathbf{w}) - \mathbf{t}_n \rVert^2.
$$
Eq. 5.18 refers to the error for a single data point $n$, which, written out over the $K$ output units, is
$$
E_n = \frac{1}{2} \sum_{k=1}^K (y_k - t_k)^2.
$$
Just above eq. 5.18 it is assumed that $y_k = a_k$, since regression uses the identity (linear) activation function at the output units. Thus
$$
E_n = \frac{1}{2} \sum_{k=1}^K (a_k - t_k)^2,
$$
and differentiating with respect to $a_k$ (only the $k$-th term of the sum depends on $a_k$) gives
$$
\frac{\partial E_n}{\partial a_k} = \frac{1}{2} \cdot 2\,(a_k - t_k) = a_k - t_k = y_k - t_k.
$$
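For a quick sanity check, here is a minimal NumPy sketch (the variable names `a`, `t`, and the finite-difference check are illustrative, not from the book) comparing the analytic gradient $a_k - t_k$ against a numerical gradient of the per-pattern sum-of-squares error:

```python
import numpy as np

def sum_of_squares_error(a, t):
    """Per-pattern sum-of-squares error with identity output activation (y_k = a_k)."""
    return 0.5 * np.sum((a - t) ** 2)

# Illustrative activations and targets for K = 3 output units
a = np.array([0.8, -0.3, 1.5])
t = np.array([1.0, 0.0, 1.0])

# Analytic gradient: dE/da_k = a_k - t_k (eq. 5.18 with y_k = a_k)
analytic = a - t

# Central finite differences as a numerical check
eps = 1e-6
numeric = np.zeros_like(a)
for k in range(len(a)):
    a_plus, a_minus = a.copy(), a.copy()
    a_plus[k] += eps
    a_minus[k] -= eps
    numeric[k] = (sum_of_squares_error(a_plus, t)
                  - sum_of_squares_error(a_minus, t)) / (2 * eps)

print("analytic:", analytic)
print("numeric :", numeric)  # should match the analytic gradient to ~1e-9
```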
