The general expression for the gradient of $\dfrac{1}{2} x^TA(x)x$

jacobian, matrices, multivariable-calculus, optimization, vector-analysis

Let $A$ be some matrix function of $x$, $A:\mathbb{R}^n \to \mathbb{R}^{n\times n}$.

My question: is there a general formula for the gradient of

$$f(x) =\dfrac{1}{2} x^TA(x)x\,?$$

I only know some special cases:

  • If $A$ is a constant, symmetric matrix, then $\nabla f(x) = Ax$

  • If $A$ is a constant, non-symmetric matrix, then $\nabla f(x) = \dfrac{1}{2} (A^T+A)x$

  • If $A$ is $\text{diag}(x)$, then $\nabla f(x) = \dfrac{3}{2} Ax$

I'm stuck because I don't know how to find a simpler expression for the derivative (Jacobian) $D[A(x)x]$ that appears when I apply the product rule with $g(x) = A(x)x$:

\begin{align*} \nabla f(x) &= \nabla \dfrac{1}{2} x^TA(x)x\\ &= \nabla \dfrac{1}{2} x^Tg(x) \\ &= \dfrac{1}{2}A(x)x + \dfrac{1}{2}D[A(x)x]^Tx
\end{align*}

Best Answer

I assume you mean $A: \mathbb R^n \to \mathbb R^{n\times n}$ so that $A$ takes in a vector and gives a matrix. I'll operate under this assumption.

Writing out the quadratic form, we see $$f(x) = \frac 1 2\sum_{i,j = 1}^n x_ix_jA_{ij}(x).$$

Now fix $k \in \{ 1,\ldots, n\}$. Then by the product rule $$\frac{\partial f }{\partial x_k}(x) = \frac{1}{2} \sum^n_{i,j=1} \left( x_ix_j\frac{\partial A_{ij}}{\partial x_k}(x) + \delta_{ik} x_j A_{ij}(x) + \delta_{jk} x_i A_{ij}(x)\right) $$ where $\delta_{ab} = \left\{\begin{smallmatrix} 1, & a = b, \\ 0, & a \neq b.\end{smallmatrix} \right.$

Resolving these $\delta$'s, we see $$\frac{\partial f }{\partial x_k}(x) = \frac{1}{2} \left(\sum^n_{i,j=1} x_ix_j\frac{\partial A_{ij}}{\partial x_k}(x) \right) + \frac 1 2 \left( \sum_{j=1}^n x_j A_{kj}(x)\right) + \frac 1 2\left( \sum^n_{i=1} x_i A_{ik}(x)\right).$$

The latter two terms are the $k$-th components of $\frac 1 2 A(x)x$ and $\frac 1 2 A^T(x)x$, so collecting them over all $k$ gives $\frac 1 2(A(x) + A^T(x))x$.

For the first term, you need to introduce some notation (we'll use tensor notation). Define $\nabla \circ A: \mathbb R^n \to \mathbb R^{n\times n \times n}$ by $(\nabla \circ A)_{ijk}(x) = \frac{\partial A_{ij}}{\partial x_k}(x).$ The first term is then the $k$-th component of $$\frac 1 2 x^T(\nabla \circ A)(x)x,$$ where by convention the two products with $x$ contract the first two indices of $\nabla \circ A$, that is, $\left(x^T (\nabla\circ A)(x) x\right)_k = \sum_{i,j=1}^n x_i x_j \frac{\partial A_{ij}}{\partial x_k}(x)$. In particular $x^T (\nabla\circ A)(x) x \in \mathbb R^n$ for any $x \in \mathbb R^n$.

Putting everything together, you can write $$\nabla f(x) = \frac 1 2 (x^T(\nabla\circ A)(x) x + A(x)x + A^T(x) x).$$
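As a sanity check on the final formula, here is a minimal Python sketch (standard library only) that evaluates $\frac 1 2 (x^T(\nabla\circ A)(x) x + A(x)x + A^T(x) x)$ component by component and compares it with central finite differences of $f$. All function names (`quad_form`, `grad_formula`, `grad_fd`, `A_diag`, `A_nonsym`, and so on) are made up for illustration, and `A_nonsym` is an arbitrary hypothetical test matrix, not something from the question.

```python
# Sketch: check grad f = 1/2 (x^T (grad o A)(x) x + A(x) x + A(x)^T x)
# for f(x) = 1/2 x^T A(x) x. All names here are illustrative assumptions.

def quad_form(A, x):
    """f(x) = 1/2 * sum_{i,j} x_i x_j A_ij(x)."""
    M = A(x)
    n = len(x)
    return 0.5 * sum(x[i] * x[j] * M[i][j] for i in range(n) for j in range(n))

def grad_formula(A, dA, x):
    """Gradient from the formula, where dA(x)[i][j][k] = dA_ij/dx_k."""
    M, T = A(x), dA(x)
    n = len(x)
    grad = []
    for k in range(n):
        # (x^T (grad o A) x)_k: contract the first two indices with x
        tensor_term = sum(x[i] * x[j] * T[i][j][k]
                          for i in range(n) for j in range(n))
        ax_term = sum(M[k][j] * x[j] for j in range(n))   # (A x)_k
        atx_term = sum(M[i][k] * x[i] for i in range(n))  # (A^T x)_k
        grad.append(0.5 * (tensor_term + ax_term + atx_term))
    return grad

def grad_fd(A, x, h=1e-6):
    """Central finite differences of f, used only as an independent check."""
    grad = []
    for k in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[k] += h
        xm[k] -= h
        grad.append((quad_form(A, xp) - quad_form(A, xm)) / (2 * h))
    return grad

# Special case from the question: A(x) = diag(x).
def A_diag(x):
    n = len(x)
    return [[x[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def dA_diag(x):
    n = len(x)
    # dA_ij/dx_k is 1 only when i == j == k
    return [[[1.0 if i == j == k else 0.0 for k in range(n)]
             for j in range(n)] for i in range(n)]

# Hypothetical non-symmetric, x-dependent test matrix for n = 2.
def A_nonsym(x):
    return [[x[0] * x[1], x[0]],
            [x[1] ** 2, 1.0]]

def dA_nonsym(x):
    # Entry [i][j] is the list [dA_ij/dx_0, dA_ij/dx_1]
    return [[[x[1], x[0]], [1.0, 0.0]],
            [[0.0, 2 * x[1]], [0.0, 0.0]]]

def max_abs_diff(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

if __name__ == "__main__":
    x3 = [0.7, -1.3, 2.0]
    print(max_abs_diff(grad_formula(A_diag, dA_diag, x3), grad_fd(A_diag, x3)))
    x2 = [0.7, -1.3]
    print(max_abs_diff(grad_formula(A_nonsym, dA_nonsym, x2), grad_fd(A_nonsym, x2)))
```

For $A(x) = \text{diag}(x)$ the tensor term contributes $x_k^2$ and the other two terms contribute $x_k^2$ each, so the formula gives $\frac 3 2 x_k^2$, matching the special case $\nabla f(x) = \frac 3 2 Ax$ from the question. For constant $A$ the tensor $\nabla \circ A$ vanishes and the formula reduces to $\frac 1 2 (A + A^T)x$.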