Neural Networks – Understanding Backpropagation Final Layer Term

Tags: backpropagation, neural-networks

I'm trying to understand the calculation for the gradient of the blue weight shown in the NN below.
[Figure: a small feed-forward network with the weight in question highlighted in blue]

In Andrew Ng's Machine Learning Coursera module, the δ term for the final layer of the NN is:

$$\delta^{(L)} = a^{(L)} - y$$

However, in other sources, the δ term for the final layer is:

$$\delta^{(L)} = \big(a^{(L)} - y\big) \odot g'\big(z^{(L)}\big)$$

where the extra factor $g'\big(z^{(L)}\big)$, the derivative of the sigmoid activation function, does not appear in Andrew Ng's version.

Both sources then multiply δ by the activation of the previous node to get the gradient:

$$\frac{\partial J}{\partial w^{(L)}} = \delta^{(L)}\, a^{(L-1)}$$

What is the reason for the discrepancy between these two calculations? Could it be because Andrew assumes the activation function of the final layer is g(z) = z?
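
To make the difference concrete (the numbers here are made up for illustration): if the target is $y = 1$ and the output unit has $z^{(L)} = 0.847$, so that $a^{(L)} = \sigma(z^{(L)}) \approx 0.7$, then the first formula gives $\delta^{(L)} = 0.7 - 1 = -0.3$, while the second gives $\delta^{(L)} = (0.7 - 1)\cdot 0.7 \cdot (1 - 0.7) = -0.063$, so the two formulas give quite different values for what is supposedly the same quantity.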

Best Answer

The difference is that in Andrew Ng's video, the logistic regression (cross-entropy) cost function is used:

$$J = -\big[\, y \log a^{(L)} + (1 - y)\log\big(1 - a^{(L)}\big) \,\big]$$

Conversely, the other source uses the squared-error cost function:

$$J = \tfrac{1}{2}\big(a^{(L)} - y\big)^2$$
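
To see where the extra $g'\big(z^{(L)}\big)$ factor comes from, here is a sketch of the chain-rule step under the squared-error cost (my own addition, not from either source):

$$\delta^{(L)} = \frac{\partial J}{\partial z^{(L)}} = \frac{\partial}{\partial a^{(L)}}\left[\tfrac{1}{2}\big(a^{(L)} - y\big)^2\right] \frac{\partial a^{(L)}}{\partial z^{(L)}} = \big(a^{(L)} - y\big)\, g'\big(z^{(L)}\big)$$

With a sigmoid output, $g'\big(z^{(L)}\big) = a^{(L)}\big(1 - a^{(L)}\big)$, and nothing cancels it, so the factor stays in the formula.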

The derivation of the delta term using the logistic regression cost function is as follows. With a sigmoid output unit, $a^{(L)} = g\big(z^{(L)}\big) = \frac{1}{1 + e^{-z^{(L)}}}$, so $g'\big(z^{(L)}\big) = a^{(L)}\big(1 - a^{(L)}\big)$, and by the chain rule

$$\delta^{(L)} = \frac{\partial J}{\partial z^{(L)}} = \frac{\partial J}{\partial a^{(L)}} \frac{\partial a^{(L)}}{\partial z^{(L)}} = \left( -\frac{y}{a^{(L)}} + \frac{1 - y}{1 - a^{(L)}} \right) a^{(L)}\big(1 - a^{(L)}\big) = a^{(L)} - y.$$

The $g'\big(z^{(L)}\big)$ factor cancels, which is why it does not show up in Andrew Ng's formula.
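
As a quick sanity check, here is a minimal NumPy sketch (names and values are my own, not from either source) that compares the two analytic deltas against a finite-difference estimate of $\partial J / \partial z^{(L)}$ for both cost functions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(a, y):
    # Logistic regression cost for a single output unit
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

def squared_error(a, y):
    # Squared-error cost for a single output unit
    return 0.5 * (a - y) ** 2

# Made-up pre-activation and target for the final layer
z, y, eps = 0.847, 1.0, 1e-6
a = sigmoid(z)

# Finite-difference estimate of dJ/dz for each cost
for cost, name in [(cross_entropy, "cross-entropy"), (squared_error, "squared error")]:
    numeric = (cost(sigmoid(z + eps), y) - cost(sigmoid(z - eps), y)) / (2 * eps)
    print(name, "numeric dJ/dz:", numeric)

# Analytic deltas from the two formulas in the question
print("a - y                :", a - y)                  # matches the cross-entropy gradient
print("(a - y) * sigmoid'(z):", (a - y) * a * (1 - a))  # matches the squared-error gradient
```

Each analytic expression matches the finite-difference value for its own cost function, which is the point: the form of the final-layer δ depends on which cost is paired with the sigmoid output, not on an assumption that $g(z) = z$.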
