Second Partial Derivative Test for a Matrix Valued Function

derivatives matrices matrix-calculus

Let $X\in\mathbb{R}^{n\times m}$ and $f(X)=\|X^TX-I_m\|^2_F$ (the Frobenius matrix norm). I was able to derive the derivative w.r.t. $X$, namely $df_X(A)=\langle 4X(X^TX-I_m),A\rangle$ (the Frobenius inner product). Therefore $\nabla_Xf=4X(X^TX-I_m)$, and $f$ has critical points at $X=0$ and at every $X$ with $X^TX=I_m$.

I want to show that $X=0$ is a local maximum (I am not sure; it may be a saddle point). The Hessian is a 4th-order tensor and very cumbersome to derive. I tried to find the first-order approximation of the gradient, which turned out to be $$\nabla_Xf(X+H)\simeq\nabla_Xf(X)+4(HX^TX+XH^TX+XX^TH-H).$$
However, I don't know how to use this to examine the function at $X=0$. How can I determine what type of critical point $X=0$ is?

Best Answer

Method 1 (with the Hessian): the quadratic form is

$$D^2f_X(H,H)=4\operatorname{tr}\big(H^TH(X^TX-I_m)+H^TXH^TX+H^TXX^TH\big)$$

where $H\in M_{n,m}$.

When $X=0$, $D^2f_0(H,H)=-4\operatorname{tr}(H^TH)=-4\|H\|_F^2$, which is $<0$ for every $H\neq 0$. The quadratic form is therefore negative definite and, since $X=0$ is a critical point of $f$, it is a strict local maximum of $f$.
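As a sanity check on Method 1, here is a minimal NumPy sketch that compares the quadratic form with a central second difference of $t\mapsto f(X+tH)$. The helper names (`f`, `hessian_form`, `second_difference`), the sizes $n=5$, $m=3$ and the step $t=10^{-3}$ are illustrative assumptions, not part of the derivation.

```python
import numpy as np

def f(X):
    """Objective ||X^T X - I_m||_F^2 for an n-by-m matrix X."""
    m = X.shape[1]
    R = X.T @ X - np.eye(m)
    return np.sum(R * R)

def hessian_form(X, H):
    """Quadratic form D^2 f_X(H, H) as written in Method 1."""
    m = X.shape[1]
    R = X.T @ X - np.eye(m)
    return 4.0 * np.trace(H.T @ H @ R + H.T @ X @ H.T @ X + H.T @ X @ X.T @ H)

def second_difference(X, H, t=1e-3):
    """Central second difference of t -> f(X + t H) at t = 0."""
    return (f(X + t * H) - 2.0 * f(X) + f(X - t * H)) / t**2

# Arbitrary test data (assumed sizes and seed).
rng = np.random.default_rng(0)
n, m = 5, 3
X = rng.standard_normal((n, m))
H = rng.standard_normal((n, m))
Z = np.zeros((n, m))

# At a generic X the closed form should agree with the finite difference.
q_general = hessian_form(X, H)
fd_general = second_difference(X, H)

# At X = 0 the form should reduce to -4 ||H||_F^2.
q_zero = hessian_form(Z, H)
expected_zero = -4.0 * np.sum(H * H)

print(q_general, fd_general)
print(q_zero, expected_zero)
```

The two numbers on each printed line should agree up to finite-difference error, and the second line should be negative for any nonzero `H`.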

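Method 2. Let the eigenvalues of $X^TX$ be $\sigma_1^2,\dots,\sigma_m^2$, where the $\sigma_i$ are the singular values of $X$ (padded with zeros if $n<m$). Then

$f(X)=\operatorname{tr}((X^TX-I_m)^2)=\sum_{i\leq m}(\sigma_i^2-1)^2$. When $X$ is in a neighborhood of $0$, the $\sigma_i^2$ are small and

$f(X)\approx \sum_{i\leq m}(1-2\sigma_i^2)=m-2\sum_i\sigma_i^2\leq m=f(0)$ and we are done. The inequality is in fact exact and not only approximate: $(\sigma_i^2-1)^2=1-\sigma_i^2(2-\sigma_i^2)\leq 1$ whenever $\sigma_i^2\leq 2$, so $f(X)\leq f(0)$ on that neighborhood, with equality only at $X=0$.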
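The singular-value formula of Method 2 can also be checked numerically. The sketch below assumes $n\geq m$ (so that `numpy.linalg.svd` returns exactly $m$ singular values) and uses arbitrary sizes and an arbitrary seed; the names `X_small` and `gap` are illustrative. Expanding $(\sigma_i^2-1)^2$ gives $m-f(X)=\sum_i\sigma_i^2(2-\sigma_i^2)$, which is what `gap` is compared against.

```python
import numpy as np

def f(X):
    """Objective ||X^T X - I_m||_F^2 for an n-by-m matrix X."""
    m = X.shape[1]
    R = X.T @ X - np.eye(m)
    return np.sum(R * R)

rng = np.random.default_rng(1)
n, m = 6, 4  # assumed sizes with n >= m
X = rng.standard_normal((n, m))

# Singular values of X, returned in descending order.
sigma = np.linalg.svd(X, compute_uv=False)

# f written through the singular values, as in Method 2.
f_from_sigma = np.sum((sigma**2 - 1.0) ** 2)

# Rescale X so that its largest singular value is 0.5 (close to X = 0).
X_small = 0.5 * X / sigma[0]
sigma_small = np.linalg.svd(X_small, compute_uv=False)

# f(0) = m, so the gap m - f(X_small) should be positive near 0.
gap = m - f(X_small)
gap_from_sigma = np.sum(sigma_small**2 * (2.0 - sigma_small**2))

print(f(X), f_from_sigma)
print(gap, gap_from_sigma)
```

Both printed pairs should match, and `gap` should be strictly positive, consistent with $f(X)<f(0)$ for small nonzero $X$.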

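Remark. Of course, there are critical points other than $X=0$ and the $X$ with $X^TX=I_m$: the gradient vanishes exactly when $X(X^TX-I_m)=0$, that is, when every singular value of $X$ is $0$ or $1$.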
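To illustrate the remark, here is a small sketch with one such extra critical point, chosen by hand for $n=3$, $m=2$: a matrix with singular values $1$ and $0$. It uses the gradient written as an $n\times m$ matrix, $4X(X^TX-I_m)$, and the quadratic form from Method 1; the directions `H_up` and `H_down` are illustrative choices.

```python
import numpy as np

def grad_f(X):
    """Gradient of ||X^T X - I_m||_F^2, written as the n-by-m matrix 4 X (X^T X - I_m)."""
    m = X.shape[1]
    return 4.0 * X @ (X.T @ X - np.eye(m))

def hessian_form(X, H):
    """Quadratic form D^2 f_X(H, H) as written in Method 1."""
    m = X.shape[1]
    R = X.T @ X - np.eye(m)
    return 4.0 * np.trace(H.T @ H @ R + H.T @ X @ H.T @ X + H.T @ X @ X.T @ H)

# Singular values of X are 1 and 0, so X != 0 and X^T X != I_2.
X = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 0.0]])

# Direction that stretches the singular value equal to 1.
H_up = np.zeros((3, 2))
H_up[0, 0] = 1.0

# Direction that moves the zero singular value away from 0.
H_down = np.zeros((3, 2))
H_down[1, 1] = 1.0

grad_norm = np.linalg.norm(grad_f(X))  # 0.0: X is a critical point
q_up = hessian_form(X, H_up)           # 8.0: f curves upward along H_up
q_down = hessian_form(X, H_down)       # -4.0: f curves downward along H_down

print(grad_norm, q_up, q_down)
```

Since the quadratic form takes both signs at this $X$, this particular critical point is a saddle rather than a local extremum.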