[Math] Question on using chain rule or product rule to find Jacobian of function with matrices times a vector…

calculus, matrices

Suppose we have a function consisting of a series of matrices multiplied by a vector:
$$f(x) = ABb,$$
where

  • $x$ is a vector whose elements appear in $A$, $B$, and/or $b$,
  • $A$ is a matrix, $B$ is a matrix, and $b$ is a vector.

Each matrix and the vector is written out in terms of its entries, for example
\begin{align*}
&x = (x_1, x_2, x_3),\\
&A = \pmatrix{
x_1 + y_1& y_4& y_7\\
y_2& x_2 + y_5& y_8\\
y_3& y_6& x_3 + y_9},
\ B = \pmatrix{
y_1& x_2 + y_4& x_3 + y_7\\
x_1 + y_2& y_5& y_8\\
y_3& y_6& y_9},
\ b=\pmatrix{y_1\\ y_2\\ y_3}.
\end{align*}
Now we want to find the Jacobian of $f$ – i.e. the partial derivative of $f$ w.r.t. $x$.

One way to do this is to multiply the two matrices and then multiply the result by the vector, giving one $3\times1$ vector in which each element is an algebraic expression resulting from the matrix multiplication. The partial derivatives could then be computed per element to form a $3\times3$ Jacobian. This would be feasible in the above example, but the one I'm working on is a lot more complicated (and so I would also have to look for patterns in order to simplify it afterwards).

I wanted to use the chain rule and/or the product rule for partial derivatives if possible. However, with the product rule you end up with $A' Bb + AB' b + ABb'$, where each derivative is w.r.t. the vector $x$. I understand that the derivative of a matrix w.r.t. a vector is actually a 3rd-order tensor, which is not easy to deal with. If this is not correct, the other terms still have to evaluate to matrices in order for matrix addition to be valid. If I use the chain rule instead, I still end up with the derivative of a matrix w.r.t. a vector.

Is there an easier way to break down a matrix calculus problem like this? I've scoured the web and cannot seem to find a good direction.

Best Answer

If 3rd-order tensors are undesirable, differentiate with respect to one scalar component $x_i$ at a time. Such partial derivatives do not change the algebraic nature of the objects differentiated: the derivative of a matrix is again a matrix, and the derivative of a vector is again a vector. The product and chain rules apply to them as well. Thus, for $i=1,2,3$, $$ \frac{\partial f }{\partial x_i} = \frac{\partial A}{\partial x_i} Bb + A\frac{\partial B}{\partial x_i} b + AB\frac{\partial b}{\partial x_i} $$ where all terms are matrices or vectors, and the entire expression is a vector. Then put these three vectors as columns into a matrix, and you have the Jacobian of $f$.
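
For the example in the question, the recipe can be sketched numerically. Assuming the $y_k$ are constants that do not depend on $x$, each $\partial A/\partial x_i$ and $\partial B/\partial x_i$ is a matrix with a single entry equal to $1$ (read off from where $x_i$ appears), and $\partial b/\partial x_i = 0$, so the third term drops out. The helper names below (`unit`, `build`, `jacobian`, `jacobian_fd`) are illustrative only, and the finite-difference function is included purely as a sanity check.

```python
# Sketch: Jacobian of f(x) = A(x) B(x) b, built column by column with the product rule.
# Assumption: y_1..y_9 are constants independent of x.

def matvec(M, v):
    """Matrix-vector product; M is a list of rows."""
    return [sum(M[r][k] * v[k] for k in range(len(v))) for r in range(len(M))]

def unit(i, j):
    """3x3 matrix with a single 1 at row i, column j (0-based)."""
    return [[1.0 if (r, c) == (i, j) else 0.0 for c in range(3)] for r in range(3)]

def build(x, y):
    """Assemble A, B, b exactly as defined in the question."""
    x1, x2, x3 = x
    y1, y2, y3, y4, y5, y6, y7, y8, y9 = y
    A = [[x1 + y1, y4, y7],
         [y2, x2 + y5, y8],
         [y3, y6, x3 + y9]]
    B = [[y1, x2 + y4, x3 + y7],
         [x1 + y2, y5, y8],
         [y3, y6, y9]]
    b = [y1, y2, y3]
    return A, B, b

def f(x, y):
    A, B, b = build(x, y)
    return matvec(A, matvec(B, b))

# Partial derivatives read off entry by entry: x_i sits on the diagonal of A,
# and at positions (2,1), (1,2), (1,3) of B (1-based) for i = 1, 2, 3.
dA = [unit(0, 0), unit(1, 1), unit(2, 2)]
dB = [unit(1, 0), unit(0, 1), unit(0, 2)]

def jacobian(x, y):
    """J[r][i] = d f_r / d x_i via (dA/dx_i) B b + A (dB/dx_i) b."""
    A, B, b = build(x, y)
    Bb = matvec(B, b)
    cols = []
    for i in range(3):
        term1 = matvec(dA[i], Bb)            # (dA/dx_i) B b
        term2 = matvec(A, matvec(dB[i], b))  # A (dB/dx_i) b
        cols.append([p + q for p, q in zip(term1, term2)])  # A B (db/dx_i) = 0
    # The three vectors are the columns of the Jacobian; transpose to rows.
    return [[cols[i][r] for i in range(3)] for r in range(3)]

def jacobian_fd(x, y, h=1e-6):
    """Central finite differences, used only to check the result above."""
    J = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        fp, fm = f(xp, y), f(xm, y)
        for r in range(3):
            J[r][i] = (fp[r] - fm[r]) / (2 * h)
    return J
```

Calling `jacobian(x, y)` and `jacobian_fd(x, y)` with the same arbitrary values should give matrices that agree up to rounding. No 3rd-order tensor is ever formed: each column needs only matrix-vector products.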
