The Hessian matrix of $f\left(x\right)=\left\langle Ax,x\right\rangle \cdot\left\langle Bx,x\right\rangle $

derivatives · hessian-matrix · matrices

I'm trying to work out the Hessian matrix of $f\colon\mathbb{R}^{n}\to\mathbb{R}$
defined by $f\left(x\right)=\left\langle Ax,x\right\rangle \cdot\left\langle Bx,x\right\rangle $,
where $A,B$ are symmetric $n\times n$ matrices. What I know is that
if we let $g\left(x\right)=\left\langle Ax,x\right\rangle $ and $h\left(x\right)=\left\langle Bx,x\right\rangle $
then $\nabla g\left(x\right)=2Ax,\nabla h\left(x\right)=2Bx$ and
$\nabla^{2}g\left(x\right)=2A,\nabla^{2}h\left(x\right)=2B$. Also
by the product rule we have $\left(fg\right)'=f'g+fg'$ which then
gives us
\begin{align*}
\left(fg\right)'' & =f''g+f'g'+f'g'+fg''\\
 & =f''g+2f'g'+fg''
\end{align*}

Regarding $\nabla f\left(x\right)$ as a column vector, I tried to
apply this to the given $f\left(x\right)$, and what I got is
$$
\nabla f\left(x\right)=\nabla\left(gh\right)\left(x\right)=2Ax\cdot\left\langle Bx,x\right\rangle +\left\langle Ax,x\right\rangle \cdot2Bx
$$

which seems to have worked fine with a concrete example. But then
I got to the Hessian:
\begin{align*}
\nabla^{2}f\left(x\right) & =\nabla^{2}\left(gh\right)\left(x\right)=2A\cdot\left\langle Bx,x\right\rangle +\underset{{\scriptscriptstyle \left(\ast\right)}}{\underbrace{2Ax\cdot2Bx}}+\underset{{\scriptscriptstyle \left(\ast\right)}}{\underbrace{2Ax\cdot2Bx}}+\left\langle Ax,x\right\rangle \cdot2B\\
 & =2A\cdot\left\langle Bx,x\right\rangle +\underset{{\scriptscriptstyle \left(\ast\right)}}{\underbrace{8Ax\cdot Bx}}+\left\langle Ax,x\right\rangle \cdot2B
\end{align*}

Now, as $Ax,Bx$ in $\left(\ast\right)$ are both column vectors, I
thought I should try this instead:
$$
\nabla^{2}f\left(x\right)=2A\cdot\left\langle Bx,x\right\rangle +\underset{{\scriptscriptstyle \left(\ast\ast\right)}}{\underbrace{8Ax\cdot\left(Bx\right)^{T}}}+\left\langle Ax,x\right\rangle \cdot2B
$$

But that didn't work with my example.
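For what it's worth, both attempts can be compared against finite differences numerically. The following is a sketch with randomly generated symmetric matrices (all names and the test data here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# hypothetical symmetric test matrices
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
B = rng.standard_normal((n, n)); B = (B + B.T) / 2
x = rng.standard_normal(n)
E = np.eye(n)

f = lambda x: (x @ A @ x) * (x @ B @ x)

# gradient from the product rule: 2<Bx,x> Ax + 2<Ax,x> Bx
grad = 2 * (x @ B @ x) * (A @ x) + 2 * (x @ A @ x) * (B @ x)

# central finite differences confirm the gradient formula
eps = 1e-6
grad_num = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in E])
print(np.allclose(grad, grad_num, atol=1e-4))   # True

# the (**) candidate: 2<Bx,x> A + 8 Ax (Bx)^T + 2<Ax,x> B
H_try = 2 * A * (x @ B @ x) + 8 * np.outer(A @ x, B @ x) + 2 * B * (x @ A @ x)

# numerical Hessian via central second differences
eps = 1e-4
H_num = np.array([[(f(x + eps*(E[i] + E[j])) - f(x + eps*(E[i] - E[j]))
                    - f(x - eps*(E[i] - E[j])) + f(x - eps*(E[i] + E[j]))) / (4 * eps**2)
                   for j in range(n)] for i in range(n)])
print(np.allclose(H_try, H_num, atol=1e-3))     # False: (**) is not the Hessian
```

One visible symptom: `H_try` is generally not symmetric, while a Hessian of a smooth function must be.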

In general, the whole process of differentiating functions that
are represented by matrices is quite a mystery to me when it comes
to where I should transpose and so on. Any help is appreciated. Thanks
in advance.

Best Answer

We can write formulas for $f_i$ and $f_{ij}$ (individual first and second partial derivatives) of $f$: $$ f_i(x) = g_i(x)h(x) + g(x)h_i(x) $$ and $$ f_{ij}(x) = g_{ij}(x)h(x) + g_i(x)h_j(x) + g_j(x)h_i(x) + g(x)h_{ij}(x). $$

We can also write the quadratic form $x^{\textrm{T}} A x$ in a form that is easier to differentiate: $$ g(x) = \sum_i \sum_j A_{ij}x_i x_j $$ where $A_{ij}=A_{ji}$ is row $i$, column $j$ of $A$ and $x_i$ is the $i$th variable. So $$ \begin{align} g_k(x) &= \sum_i \sum_j A_{ij}(\delta_{ik}x_j + x_i\delta_{jk}) \\ &= \sum_j A_{kj} x_j + \sum_i A_{ik}x_i \\ &= \sum_i 2A_{ik}x_i \\ &= 2(A_{k*} \cdot x) \end{align} $$ where $\delta_{ij}$ is the Kronecker delta function and $A_{k*}$ is the $k$th row of $A$. The second partial derivative with respect to variables $k$ and $l$ is $$ g_{kl}(x) = \sum_i 2A_{ik}\delta_{il} = 2A_{kl}. $$
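These component formulas can be sanity-checked numerically; below is a sketch with a randomly generated symmetric $A$ (the data and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
# hypothetical symmetric test matrix
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
x = rng.standard_normal(n)
E = np.eye(n)
eps = 1e-5

g = lambda x: x @ A @ x

# g_k(x) = 2 (A_{k*} . x), i.e. grad g = 2 A x, via central differences
gk = np.array([(g(x + eps * E[k]) - g(x - eps * E[k])) / (2 * eps) for k in range(n)])
print(np.allclose(gk, 2 * A @ x, atol=1e-4))   # True

# g_{kl}(x) = 2 A_{kl}, via central second differences (exact up to
# roundoff, since g is quadratic)
gkl = np.array([[(g(x + eps*(E[k] + E[l])) - g(x + eps*(E[k] - E[l]))
                  - g(x - eps*(E[k] - E[l])) + g(x - eps*(E[k] + E[l]))) / (4 * eps**2)
                 for l in range(n)] for k in range(n)])
print(np.allclose(gkl, 2 * A, atol=1e-4))      # True
```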

Using these formulas for the partial derivatives of $g$ (and $h$) gives the desired result: $$ f_{ij}(x) = 2A_{ij}h(x) + 4(A_{i*}\cdot x)(B_{j*}\cdot x) + 4(A_{j*}\cdot x)(B_{i*}\cdot x) + 2B_{ij}g(x). $$
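In matrix form this says $\nabla^{2}f(x)=2\left\langle Bx,x\right\rangle A+4Ax\left(Bx\right)^{T}+4Bx\left(Ax\right)^{T}+2\left\langle Ax,x\right\rangle B$ (using $A_{i*}\cdot x=(Ax)_i$ by symmetry). A quick numerical check of this, sketched with randomly generated symmetric matrices (illustrative data only):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
# hypothetical symmetric test matrices
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
B = rng.standard_normal((n, n)); B = (B + B.T) / 2
x = rng.standard_normal(n)
E = np.eye(n)

g = lambda x: x @ A @ x
h = lambda x: x @ B @ x
f = lambda x: g(x) * h(x)

# matrix form of f_ij: 2<Bx,x> A + 4 Ax (Bx)^T + 4 Bx (Ax)^T + 2<Ax,x> B
Ax, Bx = A @ x, B @ x
H = 2 * A * h(x) + 4 * np.outer(Ax, Bx) + 4 * np.outer(Bx, Ax) + 2 * B * g(x)

# numerical Hessian via central second differences
eps = 1e-4
H_num = np.array([[(f(x + eps*(E[i] + E[j])) - f(x + eps*(E[i] - E[j]))
                    - f(x - eps*(E[i] - E[j])) + f(x - eps*(E[i] + E[j]))) / (4 * eps**2)
                   for j in range(n)] for i in range(n)])
print(np.allclose(H, H_num, atol=1e-3))   # True
```

Note that the two outer-product terms together are symmetric, as a Hessian must be, which is exactly what the $8Ax\left(Bx\right)^{T}$ attempt was missing.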

I derived the identities $\nabla g = 2Ax$ and $\nabla^2 g = 2A$ in component form and then used this to compute the individual components of the Hessian of $f$. The point is that when working with matrices, it is often easier to break everything down into individual components. For example, in a matrix product $PQ$, you would work with $(PQ)_{ij}$ instead of the matrix product itself.
