Fréchet derivative of a matrix expression

frechet-derivative, linear algebra, matrices, matrix-calculus

Suppose $h(Q) = Q^{T} A Q$. Then the Fréchet derivative is given by $D_{h} (Q) [H] = H^{T} A Q + Q^{T} A H$. I am a bit unsure how this so-called Fréchet derivative is obtained.

I would have just said:

$(h(Q))'= \dot{Q}^{T}AQ + Q^{T}\dot{A}Q + Q^{T} A \dot{Q}$, and then tried to find an expression from there, but I haven't been able to come up with one.

Any help is appreciated.

Best Answer

As I've stated in my comments, I think your primary confusion is over the "data type" of the Fréchet derivative, which I've addressed in the comments on your question. Here, I will focus on how one computes the Fréchet derivative.

There are several equivalent definitions of the Fréchet derivative, but I prefer the following. For normed vector spaces $U,V$ and a function $f:U \to V$, the Fréchet derivative of $f$ at the point $x \in U$ (written as $Df(x)$) is the unique bounded linear map $Df(x) = A:U \to V$ for which $$ f(x + h) = f(x) + A(h) + o(\|h\|) $$ for $h \in U$. Here, $o(\|h\|)$ (little-o notation) stands for the "higher order terms". In other words, at the given $x \in U$, the remainder $R(h) = f(x+h) - f(x) - A(h)$ satisfies $$ \lim_{h \to 0}\frac{\|R(h)\|}{\|h\|} = 0. $$ Note that $A(h)$ can be written as $[Df(x)](h)$ or, as you have written it, in the form $Df(x)[h]$.
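To see the definition in action, here is a minimal numerical sketch in plain Python on a toy map chosen only for illustration (it is not from the question): $f(x_1,x_2) = (x_1x_2,\, x_1^2)$ on $\Bbb R^2$ with the Euclidean norm, whose candidate derivative is its Jacobian. Taking $h = t\,d$ for a fixed unit vector $d$, the ratio $\|R(h)\|/\|h\|$ should go to $0$ as $t \to 0$.

```python
import math

# Toy map f: R^2 -> R^2, f(x1, x2) = (x1 * x2, x1 ** 2); chosen only for illustration.
def f(x):
    x1, x2 = x
    return (x1 * x2, x1 ** 2)

def Df(x):
    """Candidate derivative at x: the linear map h -> J(x) h, J the Jacobian."""
    x1, x2 = x

    def A(h):
        h1, h2 = h
        return (x2 * h1 + x1 * h2, 2 * x1 * h1)

    return A

def norm(v):
    """Euclidean norm of a tuple."""
    return math.sqrt(sum(c * c for c in v))

def remainder_ratio(f, A, x, h):
    """Return ||R(h)|| / ||h|| with R(h) = f(x + h) - f(x) - A(h)."""
    xh = tuple(a + b for a, b in zip(x, h))
    R = tuple(p - q - r for p, q, r in zip(f(xh), f(x), A(h)))
    return norm(R) / norm(h)

x = (1.0, 2.0)
direction = (0.6, -0.8)  # a unit vector, so ||h|| = t
A = Df(x)
ts = (1e-1, 1e-2, 1e-3)
ratios = [remainder_ratio(f, A, x, tuple(t * d for d in direction)) for t in ts]
print(ratios)  # shrinks in proportion to t
```

For this map the remainder is exactly $R(h) = (h_1h_2,\, h_1^2)$, so the ratios equal $0.6\,t$ up to rounding; a wrong candidate for $A$ would leave the ratio bounded away from $0$.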

With that in mind, you seem to have a function $h:\Bbb R^{n \times m} \to \Bbb R^{m \times m}$ of the form $$ h(Q) = Q^TAQ, $$ for some $n \times n$ matrix $A$. At any given $Q$, $Dh(Q)$ will be a linear map that takes an $n \times m$ input and produces an $m \times m$ output. That is, for any $n \times m$ matrix $H$, $Dh(Q)[H]$ will be an $m \times m$ matrix.
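As a sanity check on the sizes, assuming $Q$ and $H$ are both $n\times m$: the product $Q^TAQ$ is only defined when $A$ is $n\times n$, and each term of the proposed derivative then has the same size as $h(Q)$:

$$ \underbrace{Q^T}_{m\times n}\underbrace{A}_{n\times n}\underbrace{Q}_{n\times m}, \qquad \underbrace{H^T}_{m\times n}\underbrace{A}_{n\times n}\underbrace{Q}_{n\times m}, \qquad \underbrace{Q^T}_{m\times n}\underbrace{A}_{n\times n}\underbrace{H}_{n\times m} \quad \text{are all } m\times m. $$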

My preferred approach to finding this linear map is to "expand" the expression $h(Q + H)$. In this case, this simply amounts to expanding a product of matrices; more generally, we might use something akin to a "linearization" or "Taylor series" expansion. We have \begin{align} h(Q + H) &= (Q + H)^TA(Q + H) \\ & = \underbrace{Q^TAQ}_{h(Q)} + H^TAQ + Q^TAH + \underbrace{H^TAH}_{o(\|H\|)}. \end{align} The first term of the sum is $h(Q)$. The last term contains two factors of $H$, so for a submultiplicative norm such as the Frobenius norm we have $\|H^TAH\| \le \|A\|\,\|H\|^2$. Hence $\|H^TAH\|/\|H\| \le \|A\|\,\|H\| \to 0$ as $H \to 0$, i.e. this term is $o(\|H\|)$. The remaining piece, $H^TAQ + Q^TAH$, depends linearly on $H$. That is, $\mathcal L(H) = H^TAQ + Q^TAH$ is a linear map. With this linear map, we have $$ h(Q + H) = h(Q) + \mathcal L(H) + o(\|H\|). $$ So, by definition, $\mathcal L = Dh(Q)$. That is, we have $$ Dh(Q)[H] = H^TAQ + Q^TAH. $$
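A quick numerical sanity check of this result, written as a sketch in plain Python without NumPy, so the helpers `transpose`, `matmul`, `add`, `frob` and `rand_matrix` are hypothetical names introduced here. With random $Q$ ($n\times m$), $A$ ($n\times n$, not assumed symmetric) and a direction $H$, the ratio $\|h(Q+tH) - h(Q) - Dh(Q)[tH]\|/\|tH\|$ (Frobenius norm) should shrink in proportion to $t$, because the remainder is exactly $t^2H^TAH$. The sizes $n=4$, $m=3$ are arbitrary.

```python
import random

# Minimal pure-Python matrix helpers (matrices are lists of rows); hypothetical names.
def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    Yt = transpose(Y)  # columns of Y
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def add(X, Y, s=1.0):
    """Return X + s * Y."""
    return [[x + s * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def frob(M):
    """Frobenius norm."""
    return sum(v * v for row in M for v in row) ** 0.5

def h(Q, A):
    """h(Q) = Q^T A Q."""
    return matmul(matmul(transpose(Q), A), Q)

def Dh(Q, A, H):
    """Candidate derivative Dh(Q)[H] = H^T A Q + Q^T A H."""
    return add(matmul(matmul(transpose(H), A), Q), matmul(matmul(transpose(Q), A), H))

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def remainder_ratios(n=4, m=3, seed=0, steps=(1e-1, 1e-2, 1e-3)):
    rng = random.Random(seed)
    Q = rand_matrix(n, m, rng)  # n x m
    A = rand_matrix(n, n, rng)  # n x n
    H = rand_matrix(n, m, rng)  # direction, same shape as Q
    ratios = []
    for t in steps:
        tH = [[t * v for v in row] for row in H]
        # R = h(Q + tH) - h(Q) - Dh(Q)[tH], which should equal (tH)^T A (tH)
        R = add(add(h(add(Q, tH), A), h(Q, A), -1.0), Dh(Q, A, tH), -1.0)
        ratios.append(frob(R) / frob(tH))
    return ratios

print(remainder_ratios())  # each entry about 10x smaller than the previous one
```

If the formula were wrong, for example by dropping the $H^TAQ$ term, the remainder would contain a term linear in $t$ and the ratio would level off at a nonzero value instead of shrinking with $t$.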
