Compute the derivative of a matrix algebra expression

derivatives, least squares, linear algebra, matrices, matrix-calculus

I came across a question about finding the derivative of a particular matrix expression: "How do you compute the derivative of a matrix algebra expression?"

The article the question refers to can be found at: https://web.archive.org/web/20180403213813/http://jimherold.com/2012/04/20/least-squares-bezier-fit/

Anyway, I am wondering if the logic from the answer (and thus the derivative) applies to other matrix expressions of the form
$$E(C_y) = (y - \mathbb{T}MC_y)^T(y - \mathbb{T}MC_y)$$
or if there is something special about the matrix M (potentially that it is lower triangular?) that causes the derivative to be
$$\frac{\partial E}{\partial C_y} = -2\mathbb{T}^T(y-\mathbb{T}MC_y)$$

In particular, if M is not triangular, would the derivative be the same? And if not, how could one find it?

Best Answer

First, some facts and notation:

  • Trace and Frobenius product relation $$A : BC := \left\langle A, BC \right\rangle = \mathrm{tr}(A^T BC)$$
  • Cyclic properties of the trace/Frobenius product \begin{align} A : BCD &= (BC)^T A : D \\ &= BCD : A \\ &= \text{etc.} \end{align}
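As a quick sanity check, the two identities above can be verified numerically. This is only a sketch: the helper `frob`, the random matrices, and their shapes are assumptions made for illustration, chosen so that `B @ C @ D` has the same shape as `A`.

```python
import numpy as np

rng = np.random.default_rng(0)

def frob(X, Y):
    """Frobenius product X : Y = tr(X^T Y) (hypothetical helper)."""
    return np.trace(X.T @ Y)

# Assumed shapes: B @ C @ D must have the same shape as A.
A = rng.standard_normal((4, 3))
B = rng.standard_normal((4, 5))
C = rng.standard_normal((5, 2))
D = rng.standard_normal((2, 3))

lhs = frob(A, B @ C @ D)        # A : BCD
moved = frob((B @ C).T @ A, D)  # (BC)^T A : D
swapped = frob(B @ C @ D, A)    # BCD : A

# Both comparisons should hold up to floating-point rounding.
print(np.isclose(lhs, moved), np.isclose(lhs, swapped))
```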

So, we can rewrite the cost function in Frobenius product notation as $$E(C_y) := (y - \mathbb{T}MC_y)^T(y - \mathbb{T}MC_y) = (y - \mathbb{T}MC_y) : (y - \mathbb{T}MC_y) \ .$$

Now, we can obtain the differential first, and then the gradient. \begin{align} dE(C_y) &= d\left[ \left( y - \mathbb{T}MC_y \right) : \left( y - \mathbb{T}MC_y \right) \right] \\ &= \left( -\mathbb{T}M \, dC_y \right) : \left( y - \mathbb{T}MC_y \right) + \left( y - \mathbb{T}MC_y \right) : \left( -\mathbb{T}M \, dC_y \right) \\ &= 2 \left( y - \mathbb{T}MC_y \right) : \left( -\mathbb{T}M \, dC_y \right) \\ &= -2 \left( \mathbb{T}M \right)^T \left( y - \mathbb{T}MC_y \right) : dC_y \end{align}

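Thus, the gradient is \begin{align} \frac{\partial E(C_y)}{\partial C_y} = -2 \left( \mathbb{T}M \right)^T \left( y - \mathbb{T}MC_y \right). \end{align} Nothing in this derivation relies on any special structure of M, so the result holds whether or not M is triangular.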
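One way to gain confidence in this gradient is to compare it against central finite differences. The sketch below is an illustration only: the sizes `n` and `k`, the random data, and the function names `E` and `grad_E` are all assumptions, and M is drawn as a dense random matrix precisely so that it is not triangular.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed sizes: n data points, k coefficients in C_y.
n, k = 8, 4
T = rng.standard_normal((n, k))
M = rng.standard_normal((k, k))  # dense on purpose, not triangular
y = rng.standard_normal(n)
C = rng.standard_normal(k)

def E(C):
    """Cost E(C_y) = (y - T M C_y)^T (y - T M C_y)."""
    r = y - T @ M @ C
    return r @ r

def grad_E(C):
    """Gradient derived above: -2 (T M)^T (y - T M C_y)."""
    return -2.0 * (T @ M).T @ (y - T @ M @ C)

# Central differences along each coordinate direction.
h = 1e-6
numeric = np.array([(E(C + h * e) - E(C - h * e)) / (2 * h) for e in np.eye(k)])

max_err = np.max(np.abs(numeric - grad_E(C)))

# For comparison: the expression without the M^T factor.
without_M = -2.0 * T.T @ (y - T @ M @ C)

print("max abs error vs. derived gradient:", max_err)
print("matches expression without M^T:", np.allclose(numeric, without_M, atol=1e-4))
```

Because E is quadratic in C_y, central differences are exact up to rounding, so the reported error should be tiny. For a generic dense M, the expression without the M^T factor is not expected to match.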
