I'm a biologist self-learning linear algebra for some applications in my work. Taking derivatives in non-linear-algebra contexts is quite clear to me, as I can just chain-rule my way through problems… but doing so in a linear-algebra context is a bit of a mystery to me.
I'm trying to take the derivative of the following equation $$y = \vec{d}^TP^TP\vec{\delta},$$ where $$\vec{d} = \begin{bmatrix} d_1 \\ d_2 \\ d_3 \end{bmatrix}$$ and $$\vec{\delta} = \begin{bmatrix} o_1^2 - d_1^2 \\ o_2^2 - d_2^2 \\ o_3^2 - d_3^2 \end{bmatrix}.$$
$P$ is just a 3×3 matrix of constants.
I'd like to find an elegant way to compute the derivative $y^\prime \left( \vec{d} \right)$, i.e. the Jacobian (here a row vector, since $y$ is a scalar).
The challenging part is that the vector $\vec{\delta}$ is a composite vector, $\vec{\delta} = \vec{o}^{\circ 2} - \vec{d}^{\circ 2}$ (not sure of my notation here… but $\vec{d}^{\circ 2}$ is supposed to indicate that every element of $\vec{d}$ is squared).
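In code this notation is just elementwise squaring; a small NumPy illustration (the numbers here are hypothetical):

```python
import numpy as np

o = np.array([1.0, 2.0, 3.0])  # hypothetical observations o_1, o_2, o_3
d = np.array([0.5, 1.5, 2.5])  # hypothetical d_1, d_2, d_3

# The "Hadamard square" d∘2 squares each element, so delta_i = o_i^2 - d_i^2.
delta = o**2 - d**2
print(delta)  # [0.75 1.75 2.75]
```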
So far, the only way I'm aware of doing this is by expanding everything out into one very large scalar formula and then differentiating that term by term.
My end goal here is to implement this in code… so unnecessarily large formulas are something I'm trying to avoid to keep my code readable.
Is there a more elegant solution to this?
Best Answer
$\DeclareMathOperator{\diag}{diag}$ In what follows $d, x, y$ and $\delta$ denote column vectors.
The function $y(d) = d^T P^T P \delta$ can be rewritten as the dot product $\langle Pd, P\delta \rangle$. This is the composition of three functions: $$y_1(d) = \begin{pmatrix} d \\ \delta(d) \end{pmatrix}, \qquad y_2(x, y) = \begin{pmatrix} Px \\ Py \end{pmatrix}, \qquad y_3(x, y) = \langle x, y \rangle.$$
Putting the three things together using the chain rule gives
$$ \begin{align*} Dy(d) &= Dy_3(y_2 \circ y_1(d))\, Dy_2(y_1(d))\, Dy_1(d) \\ &= Dy_3(Pd, P \delta) \begin{bmatrix} P & 0 \\ 0 & P \end{bmatrix} \begin{bmatrix} I \\ -2\diag(d) \end{bmatrix} \\ &= ((P\delta)^T, (Pd)^T) \begin{bmatrix} P & 0 \\ 0 & P \end{bmatrix} \begin{bmatrix} I \\ -2\diag(d) \end{bmatrix} \\ &= \delta^T P^T P - 2\, d^T P^T P \diag(d). \end{align*} $$
Note that $Dy_3(x, y) = (y^T, x^T)$, since the gradient of $\langle x, y \rangle$ with respect to $x$ is $y$ (and with respect to $y$ is $x$).
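Since the end goal in the question is an implementation, here is a minimal NumPy sketch (the names `P`, `d`, `o` and the random test data are assumptions, not from the question) that evaluates this closed-form Jacobian and checks it against a central finite difference:

```python
import numpy as np

def y(d, o, P):
    """Scalar y = d^T P^T P delta, with delta = o∘2 - d∘2 (elementwise squares)."""
    delta = o**2 - d**2
    return d @ P.T @ P @ delta

def jacobian(d, o, P):
    """Closed-form row-vector Jacobian: delta^T P^T P - 2 d^T P^T P diag(d)."""
    delta = o**2 - d**2
    A = P.T @ P
    # Right-multiplying a row vector by diag(d) scales its j-th entry by d_j,
    # which elementwise multiplication by d accomplishes without building diag(d).
    return delta @ A - 2 * (d @ A) * d

def jacobian_fd(d, o, P, h=1e-6):
    """Central finite-difference approximation, for verification only."""
    J = np.zeros_like(d)
    for j in range(d.size):
        e = np.zeros_like(d)
        e[j] = h
        J[j] = (y(d + e, o, P) - y(d - e, o, P)) / (2 * h)
    return J

rng = np.random.default_rng(0)
P = rng.standard_normal((3, 3))
d = rng.standard_normal(3)
o = rng.standard_normal(3)
print(np.allclose(jacobian(d, o, P), jacobian_fd(d, o, P), atol=1e-5))  # True
```

The finite-difference check is a cheap way to catch exactly the kind of sign or transpose slip that is easy to make in these derivations.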