[Math] Solving a quadratic equation for a Hermitian matrix

hermitian, matrix-equations, ra.rings-and-algebras

I am looking for a procedure to find solution(s) of the square matrix equation

$H^T H = S$

where $H = H^\dagger$ is a Hermitian ($n\times n$) matrix and $S$ is a given complex symmetric matrix. Due to the Hermiticity of $H$, $S$ should satisfy the $n$ conditions $\operatorname{Im}(\operatorname{tr}(S^i))=0,\quad i = 1,\ldots,n$.

I am interested in simple solutions for small matrices, say $n=3$. For $n=2$, this can be solved by an explicit parametrization of $H$, which leads to a quadratic equation giving four solutions. Since the problem is similar to taking the square root of a matrix, presumably there are $2^n$ solutions to this problem, too.

Note: this question is perhaps similar to this one. Here, the equation is simpler, but it applies to complex rather than real matrices.

Best Answer

The first thing to keep in mind is that having $\mathrm{Im}\bigl(\mathrm{tr}(S^i)\bigr)=0$ for $i=1,\ldots,n$ is not sufficient to guarantee that $S = H^TH$ has a solution with $H$ Hermitian, even when $n=2$. For example, consider $$ S = \begin{pmatrix}1&i\\ i&-1\end{pmatrix}, $$ which has $\mathrm{tr}(S) = \mathrm{tr}(S^2) = 0$. It is easy to show, however, that $S$ is not of the form $H^TH$ for any $2$-by-$2$ Hermitian matrix $H$. Another example satisfying the condition $\mathrm{Im}\bigl(\mathrm{tr}(S)\bigr)=\mathrm{Im}\bigl(\mathrm{tr}(S^2)\bigr)=0$ for which it is easy to show there is no solution is $$ S = \begin{pmatrix}-1&0\\ 0&a\end{pmatrix}, $$ where $a$ is any real number other than $-1$.
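If it helps to experiment, here is a quick numpy check (purely illustrative, not part of the argument) that both counterexamples do satisfy the two trace conditions, even though neither is of the form $H^TH$:

```python
import numpy as np

# Two 2x2 symmetric complex matrices satisfying Im(tr(S^i)) = 0 for i = 1, 2
# but not of the form H^T H for any Hermitian H (see the text above).
S1 = np.array([[1, 1j], [1j, -1]])
S2 = np.diag([-1.0, 2.0])   # any real a != -1 works in place of 2.0

for S in (S1, S2):
    traces = [np.trace(np.linalg.matrix_power(S, i)) for i in (1, 2)]
    print([np.isclose(t.imag, 0) for t in traces])   # [True, True] for both
```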

On the other hand $H^TH=-I_2$ has a $1$-parameter family of solutions $$ H = \begin{pmatrix}s&ic\\ -ic&s\end{pmatrix}, $$ where $c$ and $s$ are any real numbers satisfying $c^2-s^2=1$, while $H^TH=I_2$ has two $1$-parameter families of solutions, the first being $$ H = \begin{pmatrix}s&c\\ c&-s\end{pmatrix} $$ where $c^2+s^2=1$ and the second being $$ H = \begin{pmatrix}c&is\\ -is&c\end{pmatrix}, $$ where $c$ and $s$ are any real numbers satisfying $c^2-s^2=1$.
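These families can be verified numerically; the following sketch (with an arbitrary parameter value) checks the hyperbolic family against $H^TH=-I_2$ and one of the families against $H^TH=I_2$:

```python
import numpy as np

t = 0.7                        # arbitrary parameter value
c, s = np.cosh(t), np.sinh(t)  # c^2 - s^2 = 1

# Family solving H^T H = -I_2
H = np.array([[s, 1j*c], [-1j*c, s]])
print(np.allclose(H, H.conj().T))          # H is Hermitian
print(np.allclose(H.T @ H, -np.eye(2)))    # H^T H = -I_2

# One of the families solving H^T H = I_2 (here c^2 + s^2 = 1)
c2, s2 = np.cos(t), np.sin(t)
H2 = np.array([[s2, c2], [c2, -s2]])
print(np.allclose(H2.T @ H2, np.eye(2)))   # H^T H = I_2
```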

These examples show that, even when $n=2$, neither invertibility nor having an orthogonal basis of eigenvectors guarantees solvability, while sometimes there are infinitely many solutions.

The second thing to keep in mind is that there is a natural $\mathrm{O}(n,\mathbb{C})$-equivariance built into the problem, one that is important in understanding the equation that needs to be solved: Let $S_n(\mathbb{C})$ denote the vector space of symmetric $n$-by-$n$ matrices with complex entries (which has real dimension $n^2{+}n$), and let $H_n$ denote the (real) vector space of Hermitian symmetric $n$-by-$n$ matrices (which has real dimension $n^2$). The OP's question then deals with understanding the preimages of elements of $S_n(\mathbb{C})$ with respect to the quadratic mapping $\sigma:H_n\to S_n(\mathbb{C})$ defined by $$ \sigma(H) = H^TH. $$ Now, $\sigma$ has the $\mathrm{O}(n,\mathbb{C})$-equivariance $$ \sigma(A\cdot H) = A\cdot \sigma(H) $$ where, for any $A\in\mathrm{O}(n,\mathbb{C})$, i.e., $A$ satisfying $A^TA = I_n$, one has $$ A\cdot H = \bar A\ H\ A^T\quad\text{for}\ H\in H_n \qquad\text{and}\qquad A\cdot S = A\ S\ A^T\quad\text{for}\ S\in S_n(\mathbb{C}). $$
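As a sanity check of this equivariance, one can sample a random element of $\mathrm{O}(n,\mathbb{C})$, for instance via a Cayley transform of a complex antisymmetric matrix (just a convenient sampling trick, not part of the argument), and verify $\sigma(A\cdot H)=A\cdot\sigma(H)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

# Random Hermitian H and a random A in O(n, C) built by a Cayley transform
# of a complex antisymmetric matrix, so that A^T A = I_n.
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = (X + X.conj().T) / 2                      # Hermitian
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
M = (M - M.T) / 2                             # complex antisymmetric
A = (np.eye(n) - M) @ np.linalg.inv(np.eye(n) + M)

sigma = lambda H: H.T @ H
A_dot_H = A.conj() @ H @ A.T                  # action on Hermitian matrices
A_dot_S = A @ sigma(H) @ A.T                  # action on symmetric matrices

print(np.allclose(A.T @ A, np.eye(n)))        # A is complex orthogonal
print(np.allclose(sigma(A_dot_H), A_dot_S))   # sigma(A.H) = A.sigma(H)
```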

Thus, the problem can be naturally posed as a question about $\mathrm{O}(n,\mathbb{C})$-orbits in $S_n(\mathbb{C})$. While the 'generic' orbit is easy to understand, and the structure of the 'non-generic' orbits is known, the latter can be somewhat complicated. The 'generic' orbits are described as follows: Say that $S\in S_n(\mathbb{C})$ is generic if $S$ has $n$ distinct eigenvalues, i.e., if there exists an $S$-eigenbasis $(v_1,\ldots, v_n)$ of $\mathbb{C}^n$ with $Sv_i = \lambda_i v_i$ where the $\lambda_i\in \mathbb{C}$ satisfy $\lambda_i\not=\lambda_j$ when $1\le i<j\le n$. In this case, as usual, one has ${v_i}^Tv_j = 0$ for $i\not=j$ while ${v_i}^Tv_i\not=0$. One can, by scaling, assume that ${v_i}^Tv_i = 1$ for all $i$, and this determines the $S$-eigenbasis $(v_1,\ldots, v_n)$ up to the individual signs of the $v_i$. In other words, when $S$ is generic, one can write $S = V^T\Lambda V$ where $V^TV=I_n$ while $\Lambda = \mathrm{diag}(\lambda_1,\ldots,\lambda_n)$ and, once the eigenvalues are fixed in some order, $V$ is unique up to premultiplication by an element of the form $E = \mathrm{diag}(\epsilon_1,\ldots,\epsilon_n)$ where ${\epsilon_i}^2=1$. (Such elements $E$ form a finite subgroup of $\mathrm{O}(n,\mathbb{C})$ of order $2^n$.)
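Numerically, the generic decomposition $S = V^T\Lambda V$ can be obtained from a standard eigendecomposition followed by the bilinear normalization ${v_i}^Tv_i=1$; here is a minimal sketch, assuming $S$ is generic and actually lies in the image of $\sigma$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3

# Build a 'generic' S in the image of sigma: S = H^T H for a random Hermitian H
# (which generically has n distinct eigenvalues).
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = (X + X.conj().T) / 2
S = H.T @ H

lam, W = np.linalg.eig(S)                 # S W = W diag(lam)
# Rescale each eigenvector so that v^T v = 1 (complex bilinear normalization).
W = W / np.sqrt(np.diag(W.T @ W))
V = W.T                                   # then S = V^T Lambda V

print(np.allclose(V.T @ V, np.eye(n)))            # V is complex orthogonal
print(np.allclose(V.T @ np.diag(lam) @ V, S))     # S = V^T Lambda V
```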

Now, for dimension reasons, $\sigma$ obviously cannot be surjective; in fact, the image of $\sigma$ must have real codimension at least $n$ in $S_n(\mathbb{C})$. Indeed, as the OP points out, the image does satisfy $n$ real equations, i.e., $\mathrm{Im}\bigl(\mathrm{tr}(S^i)\bigr)=0$ for $i=1,\ldots,n$. This follows because $\bar S = HH^T$, and so one has $$ \mathrm{tr}(S^k)=\mathrm{tr}\bigl((H^TH)^k\bigr) = \mathrm{tr}\bigl(H^T(HH^T)^{k-1}H\bigr) =\mathrm{tr}\bigl((HH^T)^{k-1}HH^T\bigr) = \mathrm{tr}({\bar S}^k) =\overline{\mathrm{tr}(S^k)}. $$ However, as the above examples with $n=2$ show, these $n$ equations (which merely establish that the characteristic polynomial of $S$ has real coefficients) are not sufficient to characterize the image of $\sigma$.
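A quick numerical illustration of the identity $\bar S = HH^T$ and of the resulting reality of $\mathrm{tr}(S^k)$ (again just a check, not a proof):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = (X + X.conj().T) / 2                  # random Hermitian H
S = H.T @ H

print(np.allclose(S.conj(), H @ H.T))     # bar S = H H^T, since bar H = H^T
print([np.isclose(np.trace(np.linalg.matrix_power(S, k)).imag, 0)
       for k in range(1, n + 1)])         # tr(S^k) is real for k = 1, ..., n
```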

Now, suppose that $S$ is generic, with distinct eigenvalues $\lambda_1,\ldots,\lambda_n$ and write $S = V^T\Lambda V$ as above. (One finds $\Lambda$ and $V$ by the usual process of taking the $\lambda_i$ to be the roots of the characteristic polynomial of $S$ and then solving the linear equations $(S-\lambda_iI)v_i=0$ and normalizing the resulting $v_i$). If $H\in H_n$ satisfies $H^TH=S$, then by acting on the pair $(S,H)$ by $A = V\in\mathrm{O}(n,\mathbb{C})$, one is reduced to the case $(V{\cdot}S, V{\cdot}H) = (\Lambda, V{\cdot}H)$.

Thus, it suffices, in the generic case, to describe the solutions to $H^TH = \Lambda$ for $H\in H_n$, where $\Lambda=\mathrm{diag}(\lambda_1,\ldots,\lambda_n)$ with the $\lambda_i$ distinct.

To do this, write $H = a + i b$ where $a=a^T$ and $b = -b^T$ are real. Then $$ \Lambda = H^TH = (a-ib)(a+ib) = a^2 + b^2 + i(ab-ba). $$ Thus, both $a^2{+}b^2$ and $ab{-}ba$ must be diagonal. Pursuing this, one sees that the assumption that the $\lambda_i$ are distinct implies that $a$ and $b$ can be simultaneously block-diagonalized, where the blocks have size either $1$ (for each real eigenvalue) or size $2$ (for each conjugate pair of non-real eigenvalues). In particular, the real eigenvalues must all be nonnegative (at most one of which can be $0$), and the non-real eigenvalues correspond to a $2$-by-$2$ submatrix of the form $$ S' = \begin{pmatrix}\lambda^2&0\\0&{\bar\lambda}^2\end{pmatrix} $$ where $\lambda^2\not={\bar\lambda}^2$. There are two solutions $H'$ for this, namely $$ H' = \pm\begin{pmatrix}0&\bar\lambda\\ \lambda&0\end{pmatrix}. $$ Assembling these blocks, one gets the complete solutions $H$, which are $2^k$ in number, where $k$ is the number of real eigenvalues of $S$ plus one-half the number of non-real eigenvalues of $S$ minus the number of $0$ eigenvalues of $S$ (if any).
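The block assembly just described can be turned into a short routine. The sketch below (the helper name `hermitian_roots_of_diagonal` and the tolerance handling are my own, not from the answer) enumerates the $2^k$ Hermitian solutions of $H^TH=\Lambda$ for a generic diagonal $\Lambda$ whose real entries are nonnegative and whose non-real entries come in conjugate pairs:

```python
import numpy as np
from itertools import product

def hermitian_roots_of_diagonal(lams, tol=1e-12):
    """All Hermitian H with H^T H = diag(lams), assuming the lams are distinct,
    the real ones are nonnegative, and the non-real ones come in conjugate pairs."""
    n = len(lams)
    blocks = []          # list of (index tuple, list of candidate sub-blocks)
    used = set()
    for i, lam in enumerate(lams):
        if i in used:
            continue
        if abs(lam.imag) < tol:            # real eigenvalue: 1x1 block +/- sqrt(lam)
            r = np.sqrt(lam.real)
            choices = [np.array([[r]])] if r < tol else [np.array([[r]]), np.array([[-r]])]
            blocks.append(((i,), choices))
            used.add(i)
        else:                              # non-real eigenvalue: pair it with its conjugate
            j = next(k for k in range(n)
                     if k not in used and abs(lams[k] - lam.conjugate()) < 1e-8)
            mu = np.sqrt(lam + 0j)         # lam = mu^2
            B = np.array([[0, mu.conjugate()], [mu, 0]])
            blocks.append(((i, j), [B, -B]))
            used.update((i, j))
    solutions = []
    for combo in product(*(c for _, c in blocks)):
        H = np.zeros((n, n), dtype=complex)
        for (idx, _), B in zip(blocks, combo):
            for a, ia in enumerate(idx):
                for b, ib in enumerate(idx):
                    H[ia, ib] = B[a, b]
        solutions.append(H)
    return solutions

lams = [2.0, 0.5, 1 + 2j, 1 - 2j]     # distinct, with one conjugate pair
sols = hermitian_roots_of_diagonal(np.array(lams, dtype=complex))
Lam = np.diag(lams).astype(complex)
print(len(sols))                                         # 2^3 = 8 solutions
print(all(np.allclose(H.T @ H, Lam) for H in sols))      # each satisfies H^T H = Lambda
print(all(np.allclose(H, H.conj().T) for H in sols))     # each is Hermitian
```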

As the above discussion when $n=2$ shows, the nongeneric cases are more complicated. When $n=2$, if $S$ has a real double eigenvalue $\lambda$, then it is of the form $$ S = \begin{pmatrix}\lambda + p & ip\\ ip & \lambda - p\end{pmatrix} $$ for some $p\in\mathbb{C}$. If $\lambda\le0$, then there is no solution unless $p=0$, in which case, the solutions lie in a $1$-parameter family $$ H = \begin{pmatrix}c & is\\ -is & c\end{pmatrix} $$ where $c$ and $s$ are real numbers satisfying $c^2-s^2 = \lambda$. If $\lambda>0$ and $p=0$, then there is, in addition to the above solution, another $1$-parameter family $$ H = \begin{pmatrix}c & s\\ s & -c\end{pmatrix} $$ where $c$ and $s$ are real numbers satisfying $c^2+s^2 = \lambda$. If $\lambda>0$ and $p$ is not zero, then there are exactly $2$ solutions: Let $r>0$ satisfy $r^2 = |p|^2/(4\lambda)>0$, and set $u-iv = p/(2r)$, then $\lambda = u^2+v^2$, and the two solutions are $$ H = \pm\begin{pmatrix} r+u & v+ir\\ v-ir & r-u\end{pmatrix}. $$
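For the last case ($\lambda>0$, $p\neq0$), the two solutions can be checked directly; this is just a numerical verification of the formulas above, with arbitrary values of $\lambda$ and $p$:

```python
import numpy as np

lam, p = 3.0, 1.0 - 2.0j           # double real eigenvalue lam > 0, p != 0
S = np.array([[lam + p, 1j*p], [1j*p, lam - p]])

r = np.sqrt(abs(p)**2 / (4*lam))   # r > 0 with r^2 = |p|^2 / (4 lam)
w = p / (2*r)                      # w = u - i v
u, v = w.real, -w.imag             # then lam = u^2 + v^2

H = np.array([[r + u, v + 1j*r], [v - 1j*r, r - u]])
for Hsol in (H, -H):
    print(np.allclose(Hsol, Hsol.conj().T),      # Hermitian
          np.allclose(Hsol.T @ Hsol, S))         # H^T H = S
```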

There is a similar analysis of the nongeneric cases for higher $n$ (particularly in the case $n=3$, which the OP wanted to understand), and I will leave this to the interested reader.
