Is there an elegant way to take a derivative of this linear algebra problem

Tags: derivatives, linear-algebra

I'm a biologist self-learning linear algebra for some applications in my work. Taking derivatives in non-linear-algebra contexts is quite clear to me, as I can just chain-rule my way through problems… but doing so in a linear-algebra context is a bit of a mystery to me.

I'm trying to take the derivative of the following equation $$y = \vec{d}^TP^TP\vec{\delta},$$ where $$\vec{d} = \begin{bmatrix} d_1 \\ d_2 \\ d_3 \end{bmatrix} \quad\text{and}\quad \vec{\delta} = \begin{bmatrix} o_1^2 - d_1^2 \\ o_2^2 - d_2^2 \\ o_3^2 - d_3^2 \end{bmatrix}.$$
$P$ is just a 3×3 matrix of constants.

I'd like to find an elegant way to compute the derivative $y^\prime\left(\vec{d}\right)$, i.e. the Jacobian (here a gradient, since $y$ is a scalar).
The challenging part is that $\vec{\delta}$ is a composite vector, $\vec{\delta} = \vec{o}^{\circ 2} - \vec{d}^{\circ 2}$ (I'm not sure about my notation here, but $\vec{d}^{\circ 2}$ is supposed to indicate that every element of $\vec{d}$ is squared).
So far, the only approach I'm aware of is to expand everything out into one really large scalar expression and then take the derivative of that directly.

My end goal here is to implement this in code… so unnecessarily large formulas are something I'm trying to avoid to keep my code readable.
Is there a more elegant solution to this?

Best Answer

$\DeclareMathOperator{\diag}{diag}$ In what follows $d, x, y$ and $\delta$ denote column vectors.

The function $y(d) = d^T P^T P \delta$ can be rewritten as the dot product $\langle Pd, P\delta \rangle$. This is the composition of three functions:

  1. $y_1 (d) = \begin{bmatrix}d\\ \delta\end{bmatrix}$, where $\delta = o^{\circ 2} - d^{\circ 2}$ depends on $d$ with $D\delta(d) = -2\diag(d)$, so $Dy_1(d) = \begin{bmatrix} I \\ -2\diag(d) \end{bmatrix}$. Here $\diag(d)$ is the diagonal matrix with $(d_1, d_2, d_3)$ along the diagonal and $I$ is the $3 \times 3$ identity matrix.
  2. $y_2(x, y) = \begin{bmatrix}Px \\ Py \end{bmatrix}$ which has the derivative $Dy_2(x, y) = \begin{bmatrix} P & 0 \\ 0 & P \end{bmatrix}$.
  3. $y_3 (x, y) = \langle x, y \rangle$ which has the derivative $Dy_3(x, y) = (y^T, x^T)$.

Putting the three things together using the chain rule gives

$$ \begin{align*} Dy(d) &= Dy_3(y_2 \circ y_1(d))\, Dy_2(y_1(d))\, Dy_1(d) \\ &= Dy_3(Pd, P \delta) \begin{bmatrix} P & 0 \\ 0 & P \end{bmatrix} \begin{bmatrix} I \\ -2\diag(d) \end{bmatrix} \\ &= ((P\delta)^T, (Pd)^T) \begin{bmatrix} P & 0 \\ 0 & P \end{bmatrix} \begin{bmatrix} I \\ -2\diag(d) \end{bmatrix} \\ &= \delta^TP^TP - 2\, d^T P^T P \diag(d). \end{align*} $$
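
Since the end goal is an implementation, here is a minimal NumPy sketch of the gradient $Dy(d) = \delta^T P^T P - 2\,d^T P^T P \diag(d)$, checked against a central finite difference. The values of `P`, `o`, and `d` below are arbitrary example data, not from the question:

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.standard_normal((3, 3))   # constant 3x3 matrix
o = rng.standard_normal(3)        # constant vector o
d = rng.standard_normal(3)        # point at which we differentiate

def y(d):
    delta = o**2 - d**2           # elementwise squares: delta = o^{o2} - d^{o2}
    return d @ P.T @ P @ delta

def Dy(d):
    delta = o**2 - d**2
    A = P.T @ P
    # delta^T A - 2 d^T A diag(d); right-multiplying a row vector by
    # diag(d) just scales its entries by d, hence the elementwise product.
    return delta @ A - 2 * (d @ A) * d

# Central finite-difference check of the analytic gradient.
eps = 1e-6
fd = np.array([(y(d + eps * e) - y(d - eps * e)) / (2 * eps) for e in np.eye(3)])
print(np.allclose(fd, Dy(d), atol=1e-6))   # True
```

Note that `Dy` never forms the $6 \times 6$ or $6 \times 3$ intermediate matrices from the chain rule; only small matrix-vector products remain, which keeps the code as compact as the final formula.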
