The derivative of binary cross entropy loss w.r.t. the input of the sigmoid function

calculus, derivatives, logistic regression, machine learning

I want to compute the derivative of binary cross entropy loss w.r.t. the input of the sigmoid function, and I was wondering whether there is a closed-form expression. I've seen derivations of binary cross entropy loss with respect to model weights/parameters (derivative of cost function for Logistic Regression) as well as derivations of the sigmoid function w.r.t. its input (Derivative of sigmoid function $\sigma (x) = \frac{1}{1+e^{-x}}$), but nothing that combines the two. I would greatly appreciate any help with this.

There is also a post that computes the derivative of categorical cross entropy loss w.r.t. the pre-softmax outputs (Derivative of Softmax loss function). I am looking for something similar in the binary case (perhaps that result specializes to the binary case, but I am not sure).

Best Answer

Use properties of logarithms to simplify as much as possible before taking the derivative.

Let $0 \leq p \leq 1$. We want to compute the derivative of the function \begin{align} L(u) &= -p \log(\sigma(u)) - (1-p)\log(1 - \sigma(u)) \\ &= -p\log\left( \frac{e^u}{1+e^u} \right) - (1-p) \log\left( \frac{1}{1+e^u}\right) \\ &= -p\left(u - \log(1+e^u)\right) + (1-p)\log(1+e^u) \\ &= -pu +\log(1 + e^u). \end{align}

Look how much $L(u)$ simplified! Sigmoid and binary cross-entropy are a match made in heaven.
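
If you want to convince yourself of the simplification numerically, here is a minimal NumPy sketch (the function names are just illustrative, not from the original post) that evaluates both the original and the simplified expression at a few points:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def bce_original(u, p):
    # -p*log(sigma(u)) - (1-p)*log(1 - sigma(u))
    s = sigmoid(u)
    return -p * np.log(s) - (1 - p) * np.log(1 - s)

def bce_simplified(u, p):
    # -p*u + log(1 + e^u), written with log1p for numerical stability
    return -p * u + np.log1p(np.exp(u))

u = np.array([-3.0, -0.5, 0.0, 1.2, 4.0])
p = 0.3
print(np.allclose(bce_original(u, p), bce_simplified(u, p)))  # True
```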

It is now easy to take the derivative of $L$: $$ L'(u) = \sigma(u) - p. $$
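
As a sanity check on this closed form (again just an illustrative sketch, not part of the original answer), you can compare it against a central finite difference of the simplified loss:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def bce(u, p):
    # simplified loss: -p*u + log(1 + e^u)
    return -p * u + np.log1p(np.exp(u))

u, p, h = 0.7, 0.3, 1e-6
analytic = sigmoid(u) - p                             # L'(u) = sigma(u) - p
numeric = (bce(u + h, p) - bce(u - h, p)) / (2 * h)   # central difference
print(analytic, numeric)  # the two values agree to high precision
```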

This formula has a nice interpretation. If the predicted probability $\sigma(u)$ agrees perfectly with the ground-truth probability $p$, then the derivative of $L$ is $0$, suggesting that we do not need to make any change to the value of $u$.
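
To illustrate that interpretation (my own addition, with an arbitrary target $p$ and step size), repeatedly taking the gradient step $u \leftarrow u - \eta\,(\sigma(u) - p)$ drives $\sigma(u)$ toward $p$, at which point the gradient vanishes and $u$ stops changing:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

p, eta = 0.9, 1.0   # target probability and step size (illustrative values)
u = 0.0             # start with sigma(u) = 0.5
for _ in range(200):
    u -= eta * (sigmoid(u) - p)   # gradient step using L'(u) = sigma(u) - p

print(sigmoid(u))  # approaches 0.9, where the gradient is ~0
```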
