I came across a question about finding the derivative of a particular matrix expression: "How do you compute the derivative of a matrix algebra expression?"
The article the question refers to can be found at: https://web.archive.org/web/20180403213813/http://jimherold.com/2012/04/20/least-squares-bezier-fit/
Anyway, I am wondering if the logic from the answer (and thus the derivative) applies to other matrix expressions of the form
$$E(C_y) = (y - \mathbb{T}MC_y)^T(y - \mathbb{T}MC_y)$$
or if there is something special about the matrix M (potentially that it is lower triangular?) that causes the derivative to be
$$\frac{\partial E} {\partial C} = -2\mathbb{T}^T(y-\mathbb{T}MC_y)$$
In particular, if M is not triangular, would the derivative be the same? And if not, how could one find it?
Best Answer
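Firstly, some facts and notation: the colon denotes the Frobenius inner product, $A : B = \operatorname{tr}(A^T B)$. It is symmetric, $A : B = B : A$, and a matrix factor can be moved across the colon by transposing it, $A : BC = B^T A : C$.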
So, we can rewrite the cost function in Frobenius product notation as $$E(C_y) := (y - \mathbb{T}MC_y)^T(y - \mathbb{T}MC_y) = (y - \mathbb{T}MC_y) : (y - \mathbb{T}MC_y) \ .$$
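As a quick sanity check of this rewriting, here is a minimal NumPy sketch, assuming the colon denotes the Frobenius inner product $\operatorname{tr}(A^T B)$. The sizes, the random data, and the helper name `frobenius` are illustrative assumptions and are not taken from the linked article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: n sample points, k control points (k = 4 for a cubic Bezier).
n, k = 20, 4
T = rng.standard_normal((n, k))    # stand-in for the matrix \mathbb{T}
M = rng.standard_normal((k, k))    # dense on purpose, not triangular
C_y = rng.standard_normal((k, 1))  # column vector of unknowns
y = rng.standard_normal((n, 1))    # column vector of data


def frobenius(A, B):
    """Frobenius inner product A : B = tr(A^T B)."""
    return np.trace(A.T @ B)


r = y - T @ M @ C_y               # residual y - T M C_y
E_quadratic = (r.T @ r).item()    # (y - T M C_y)^T (y - T M C_y)
E_frobenius = frobenius(r, r)     # (y - T M C_y) : (y - T M C_y)

print(E_quadratic, E_frobenius)
```

The two printed values should agree up to floating-point rounding, since for a column vector the trace of $r^T r$ is just the scalar $r^T r$.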
Now, we can obtain the differential first, and then the gradient. \begin{align} dE(C_y) &= d\left[ \left( y - \mathbb{T}MC_y \right) : \left( y - \mathbb{T}MC_y \right) \right] \\ &= \left( -\mathbb{T}M \ dC_y \right) : \left( y - \mathbb{T}MC_y \right) + \left( y - \mathbb{T}MC_y \right) : \left( -\mathbb{T}M \ dC_y \right) \\ &= 2 \left( y - \mathbb{T}MC_y \right) : \left( -\mathbb{T}M \ dC_y \right) \\ &= -2 \left( \mathbb{T}M \right)^T \left( y - \mathbb{T}MC_y \right) : dC_y \end{align}
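Thus, the gradient is \begin{align} \frac{\partial E(C_y)}{\partial C_y} = -2 \left( \mathbb{T}M \right)^T \left( y - \mathbb{T}MC_y \right). \end{align} No step in this derivation uses any special structure of $M$, so the same formula holds whether or not $M$ is triangular.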
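One way to convince yourself of this result is to compare it against a finite-difference approximation. The sketch below is only an illustration under assumed conditions: the dimensions and random data are made up, `M` is deliberately dense (not triangular), and the helper names `E`, `grad_E`, and `numerical_grad` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes and random data, for illustration only.
n, k = 30, 4
T = rng.standard_normal((n, k))  # stand-in for \mathbb{T}
M = rng.standard_normal((k, k))  # dense, deliberately NOT triangular
y = rng.standard_normal(n)
C_y = rng.standard_normal(k)


def E(c):
    """Cost E(c) = (y - T M c)^T (y - T M c)."""
    r = y - T @ M @ c
    return r @ r


def grad_E(c):
    """Analytic gradient from the derivation: -2 (T M)^T (y - T M c)."""
    return -2.0 * (T @ M).T @ (y - T @ M @ c)


def numerical_grad(f, c, h=1e-6):
    """Central finite-difference approximation of the gradient of f at c."""
    g = np.zeros_like(c)
    for i in range(c.size):
        step = np.zeros_like(c)
        step[i] = h
        g[i] = (f(c + step) - f(c - step)) / (2.0 * h)
    return g


analytic = grad_E(C_y)
numeric = numerical_grad(E, C_y)
max_abs_diff = np.max(np.abs(analytic - numeric))
print("max abs difference:", max_abs_diff)
```

Because $E$ is quadratic in $C_y$, central differences have no truncation error, so the two gradients should agree up to floating-point rounding. Replacing `M` with a triangular matrix (for example `np.tril(M)`) does not change that agreement, which fits the observation that the derivation never relies on the structure of $M$.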