Using gradient descent, we optimize (minimize) the cost function
$$J(\mathbf{w}) = \sum_{i} \frac{1}{2}(y_i - \hat{y_i})^2 \quad \quad y_i,\hat{y_i} \in \mathbb{R}$$
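For concreteness, this cost can be minimized with plain gradient descent. Below is a minimal NumPy sketch; note that the linear model $\hat{y} = Xw$ is my assumption for illustration, since the text above does not specify the model.

```python
import numpy as np

# Minimal sketch: gradient descent on J(w) = sum_i 0.5*(y_i - y_hat_i)^2,
# assuming (my assumption) a linear model y_hat = X @ w.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w = np.zeros(3)
lr = 0.005                      # learning rate
for _ in range(2000):
    grad = X.T @ (X @ w - y)    # gradient of sum_i 0.5*(y_i - y_hat_i)^2
    w -= lr * grad
# w now approximates w_true
```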
Minimizing the mean squared error gives you a different model from logistic regression. Logistic regression is normally associated with the cross-entropy loss; the scikit-learn documentation has an introductory page on it.
(I'll assume "multilayer perceptron" and "neural network" refer to the same thing.)
If you use the cross-entropy loss (with regularization) for a single-layer neural network, then it is the same model (a log-linear model) as logistic regression.
If you use a multi-layer network instead, it can be thought of as logistic regression with parametric nonlinear basis functions.
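This view can be sketched in code (all names here are illustrative, not from the answer): the output unit of a one-hidden-layer network is logistic regression applied to learned, nonlinear features of the input.

```python
import numpy as np

# Illustrative sketch only: a one-hidden-layer network's output unit is
# logistic regression on the learned basis h = tanh(W1 @ x).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
W1 = rng.normal(size=(4, 3))  # parameters of the nonlinear basis functions
w2 = rng.normal(size=4)       # logistic-regression weights on that basis

x = rng.normal(size=3)
h = np.tanh(W1 @ x)           # parametric nonlinear basis functions of x
p = sigmoid(w2 @ h)           # logistic regression on h; p lies in (0, 1)
```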
However, in multilayer perceptrons the sigmoid activation function returns a probability, not the on/off signal produced by a single-layer perceptron's step function.
The output of both logistic regression and a neural network with a sigmoid activation function can be interpreted as a probability, since the cross-entropy loss is the negative log-likelihood under a Bernoulli distribution.
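A quick numerical check of that claim (a sketch with made-up labels and logits): the cross-entropy loss coincides with the Bernoulli negative log-likelihood.

```python
import numpy as np

# Sketch: cross-entropy equals the negative log-likelihood of the labels
# under a Bernoulli distribution with parameter p = sigmoid(z).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

y = np.array([1, 0, 1, 1])            # made-up binary labels
z = np.array([2.0, -1.0, 0.5, 3.0])   # made-up logits
p = sigmoid(z)

cross_entropy = -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Bernoulli likelihood of each label: p if y = 1, (1 - p) if y = 0
bernoulli_nll = -np.sum(np.log(np.where(y == 1, p, 1 - p)))
# cross_entropy and bernoulli_nll are the same number
```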
I believe that the key to answering this question is to point out that the element-wise multiplication is actually shorthand and therefore when you derive the equations you never actually use it.
The actual operation is not an element-wise multiplication but instead a standard matrix multiplication of a gradient with a Jacobian, always.
In the case of the nonlinearity, the Jacobian of its vector output with respect to its vector input happens to be a diagonal matrix. The gradient multiplied by this matrix is therefore equal to the gradient of the loss with respect to the output of the nonlinearity, element-wise multiplied by a vector containing the partial derivatives of the nonlinearity with respect to its input; but this equality follows from the Jacobian being diagonal. You must pass through the Jacobian step to get to the element-wise multiplication, which might explain your confusion.
In math, we have some nonlinearity $s$, a loss $L$, and an input to the nonlinearity $x \in \mathbb{R}^{n \times 1}$ (this could be any tensor). The output of the nonlinearity has the same dimension, $s(x) \in \mathbb{R}^{n \times 1}$---as @Logan says, the activation function is applied element-wise.
We want $$\nabla_{x}L=\left({\dfrac{\partial s(x)}{\partial x}}\right)^T\nabla_{s(x)}L$$
Where $\dfrac{\partial s(x)}{\partial x}$ is the Jacobian of $s$. Expanding this Jacobian, we get
$$\begin{bmatrix}
\dfrac{\partial{s(x_{1})}}{\partial{x_1}} & \dots & \dfrac{\partial{s(x_{1})}}{\partial{x_{n}}} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial{s(x_{n})}}{\partial{x_{1}}} & \dots & \dfrac{\partial{s(x_{n})}}{\partial{x_{n}}}
\end{bmatrix}$$
Since $s$ is applied element-wise, $\partial s(x_i)/\partial x_j = 0$ whenever $i \neq j$, so this matrix is zero everywhere except on the diagonal. We can make a vector of all its diagonal elements $$Diag\left(\dfrac{\partial s(x)}{\partial x}\right)$$
And then use the element-wise operator.
$$\nabla_{x}L
=\left({\dfrac{\partial s(x)}{\partial x}}\right)^T\nabla_{s(x)}L
=Diag\left(\dfrac{\partial s(x)}{\partial x}\right) \circ \nabla_{s(x)}L$$
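This identity is easy to verify numerically. Here is a sketch for $s = \sigma$ (the sigmoid), with a random vector standing in for the upstream gradient $\nabla_{s(x)}L$.

```python
import numpy as np

# Numerical check: multiplying the upstream gradient by the (diagonal)
# Jacobian of the nonlinearity equals an element-wise product with the
# vector of its diagonal entries.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
x = rng.normal(size=5)
grad_sx = rng.normal(size=5)        # stands in for \nabla_{s(x)} L

ds = sigmoid(x) * (1 - sigmoid(x))  # diagonal entries of the Jacobian
J = np.diag(ds)                     # the full (diagonal) Jacobian

via_jacobian = J.T @ grad_sx        # (\partial s(x)/\partial x)^T grad
via_elementwise = ds * grad_sx      # Diag(...) \circ grad
# both vectors are identical
```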
Best Answer
Here is my attempt.
$$J(x) = -\frac{1}{m}\sum_{i = 1}^{m} \big[b_i \ln(h_i) + (1 - b_i)\ln(1 - h_i)\big]$$
where $h_i = \sigma(x^Ta_i)$. Let $A = [a_1^T, \dots, a_m^T]^T$. Assuming $\ln, \sigma, \frac{1}{\cdot}$ work element-wise on vectors, $\odot$ is element-wise multiplication, and $\mathbb{1}$ is a vector of $1$s, we have
$$J(x) = -\frac{1}{m}\big[b^T\ln(\sigma(Ax)) + (\mathbb{1} - b)^T\ln(\mathbb{1} - \sigma(Ax))\big]$$
Now
$$\frac{\partial J(x)}{\partial x} = -\frac{1}{m}\Big[\frac{\partial}{\partial x}b^T\ln(\sigma(Ax)) + \frac{\partial}{\partial x}(\mathbb{1} - b)^T\ln(\mathbb{1} - \sigma(Ax))\Big] \\ = -\frac{1}{m}\Big[\frac{\partial \ln(\sigma(Ax))}{\partial x}b + \frac{\partial \ln(\mathbb{1} - \sigma(Ax))}{\partial x}(\mathbb{1} - b)\Big] \\ = -\frac{1}{m}\Big[\frac{\partial \sigma(Ax)}{\partial x} \big(\frac{1}{\sigma(Ax)}\odot b\big) + \frac{\partial (\mathbb{1} - \sigma(Ax))}{\partial x}\big(\frac{1}{\mathbb{1} - \sigma(Ax)}\odot (\mathbb{1} - b)\big)\Big] \\ = -\frac{1}{m}\Big[\frac{\partial Ax}{\partial x} \big(\sigma(Ax) \odot (\mathbb{1} - \sigma(Ax)) \odot \frac{1}{\sigma(Ax)}\odot b\big) - \frac{\partial Ax}{\partial x}\big(\sigma(Ax) \odot (\mathbb{1} - \sigma(Ax)) \odot \frac{1}{\mathbb{1} - \sigma(Ax)}\odot (\mathbb{1} - b)\big)\Big] \\ = -\frac{1}{m}\Big[A^T \big((\mathbb{1} - \sigma(Ax)) \odot b\big) - A^T\big(\sigma(Ax) \odot (\mathbb{1} - b)\big)\Big] \\ = -\frac{1}{m}\Big[A^T \big(b - \sigma(Ax) \odot b - \sigma(Ax) + \sigma(Ax) \odot b\big)\Big] \\ = -\frac{1}{m}\Big[A^T \big(b - \sigma(Ax)\big)\Big] \\ = \frac{1}{m}\Big[A^T \big(\sigma(Ax) - b\big)\Big] $$
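The final expression can be sanity-checked against a finite-difference gradient of $J$; the sketch below reuses the names from the derivation ($A$, $b$, $x$, $\sigma$) with random data.

```python
import numpy as np

# Sanity check: compare A^T (sigma(Ax) - b) / m against a central
# finite-difference gradient of J (names follow the derivation above).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def J(x, A, b):
    h = sigmoid(A @ x)
    m = len(b)
    return -np.sum(b * np.log(h) + (1 - b) * np.log(1 - h)) / m

rng = np.random.default_rng(3)
m, n = 20, 4
A = rng.normal(size=(m, n))
b = rng.integers(0, 2, size=m).astype(float)
x = rng.normal(size=n)

analytic = A.T @ (sigmoid(A @ x) - b) / m

eps = 1e-6
numeric = np.zeros(n)
for j in range(n):
    e = np.zeros(n)
    e[j] = eps
    numeric[j] = (J(x + e, A, b) - J(x - e, A, b)) / (2 * eps)
# analytic and numeric agree to finite-difference precision
```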