I'm currently reading the book Deep Learning (Goodfellow et al., 2016) and had a question about the calculation of a gradient in an example used to explain backpropagation. For anyone who's curious, this is from section 6.5.9: Differentiation outside the Deep Learning Community.
Suppose we have variables $p_1, p_2, … , p_n$ representing probabilities and variables $z_1, z_2, … , z_n$ representing unnormalized log probabilities. Suppose we define
$$q_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$$
where we build the softmax function out of exponentiation, summation and division operations, and construct a cross-entropy loss $J = -\sum_i p_i \log{q_i}$. A human mathematician can observe that the derivative of $J$ with respect to $z_i$ takes a very simple form: $q_i - p_i$.
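As a sanity check on that claim, here is a small NumPy sketch (my own illustration, not from the book) that compares the stated closed form $q_i - p_i$ against a central finite-difference approximation of the gradient of $J$:

```python
import numpy as np

# Check the claimed gradient dJ/dz = q - p against finite differences.
rng = np.random.default_rng(0)
n = 5
z = rng.normal(size=n)            # unnormalized log probabilities
p = rng.random(n)
p /= p.sum()                      # target probabilities, summing to 1

def softmax(z):
    e = np.exp(z - z.max())       # shift by max for numerical stability
    return e / e.sum()

def J(z):
    # cross-entropy loss J = -sum_i p_i log q_i
    return -np.sum(p * np.log(softmax(z)))

analytic = softmax(z) - p         # the claimed simple form q - p

# Central finite differences: (J(z + eps*e_i) - J(z - eps*e_i)) / (2*eps)
eps = 1e-6
numeric = np.array([
    (J(z + eps * np.eye(n)[i]) - J(z - eps * np.eye(n)[i])) / (2 * eps)
    for i in range(n)
])

print(np.max(np.abs(analytic - numeric)))  # tiny: the two forms agree
```

The two gradients match to within finite-difference error, which is reassuring even before working through the algebra.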
I don't know how this result was derived, and was hoping that someone could give me some tips or advice. What I have so far is
$$\log{q_i} = \log{e^{z_i}} - \log\left(\sum_j e^{z_j}\right)$$
$$
\begin{align}
p_i\log{q_i} & = p_i \log{e^{z_i}} - p_i \log\left(\sum_j e^{z_j}\right) \\
& = p_i z_i - p_i\log\left(\sum_j e^{z_j}\right)
\end{align}$$
If we take the derivative of the term $p_i z_i$ in $J$, I can understand that $d/dz_i\,(p_i z_i) = p_i$, but how do we differentiate the second term, which contains the logarithm of the summation?
Thank you.
Best Answer
Your derivation of $p_i\log q_i$ is fine. Based upon it we obtain for $J$:
\begin{align*} J&=-\sum_{j=1}^np_jz_j+\sum_{j=1}^np_j\log\left(\sum_{k=1}^ne^{z_k}\right)\\ &=-\sum_{j=1}^np_jz_j+\log\left(\sum_{k=1}^ne^{z_k}\right)\tag{1} \end{align*}
In the last line we use the fact that the probabilities sum to one: $\sum_{j=1}^n p_j = 1$.
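From (1), one more differentiation step recovers the claimed form: the derivative of the log-sum-exp term with respect to $z_i$ is exactly the softmax, so

$$\frac{\partial J}{\partial z_i} = -p_i + \frac{\partial}{\partial z_i}\log\left(\sum_{k=1}^n e^{z_k}\right) = -p_i + \frac{e^{z_i}}{\sum_{k=1}^n e^{z_k}} = q_i - p_i.$$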