The orthogonal standard form of an antisymmetric matrix

linear-algebra · matrices · orthogonality

Let $A=\left( a_{ij} \right) \in M_n\left( \mathbb{R} \right)$. If $A$ is an antisymmetric matrix, then $a_{ij}=-a_{ji}$.

If $B$ is a real symmetric matrix, then there exists $P\in O_n\left( \mathbb{R} \right) $ such that $P^TBP=\mathrm{diag}\left\{ \lambda _1,\ldots,\lambda _n \right\}$; the matrix $\mathrm{diag}\left\{ \lambda _1,\ldots,\lambda _n \right\} $ is called the orthogonal standard form of $B$.

I want to prove that if $A$ is an antisymmetric matrix, then there exists $P\in O_n\left( \mathbb{R} \right)$ such that
$$P^TAP=\mathrm{diag}\left\{ \left( \begin{matrix}
0& a_1\\
-a_1& 0\\
\end{matrix} \right) ,\ldots,\left( \begin{matrix}
0& a_m\\
-a_m& 0\\
\end{matrix} \right) ,0,\ldots,0 \right\}.$$
This block diagonal matrix is called the orthogonal standard form of $A$.
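
As a quick numerical sanity check of this statement (not of the proof), one can compare with the real Schur decomposition, which for an antisymmetric matrix produces exactly this kind of block form. The random test matrix and the SciPy call below are only illustrative choices.

```python
# Sanity check of the claimed standard form via the real Schur decomposition.
# The random test matrix is an illustrative assumption, not part of the proof.
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
n = 7
M = rng.standard_normal((n, n))
A = M - M.T                                  # a random antisymmetric matrix

T, Z = schur(A, output="real")               # A = Z @ T @ Z.T with Z orthogonal

print(np.allclose(Z.T @ Z, np.eye(n)))       # Z is orthogonal
print(np.allclose(T, -T.T, atol=1e-10))      # Z^T A Z is still antisymmetric
# all entries outside the tridiagonal band vanish, so T consists of
# 2x2 blocks of the form [[0, a], [-a, 0]] plus zeros on the diagonal
print(np.allclose(T, np.triu(np.tril(T, 1), -1), atol=1e-10))
print(np.round(T, 3))
```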

Here is my method:

I want to prove it by induction. Since $A$ is an antisymmetric matrix, write
$$A=\left( \begin{matrix}
A_1& \alpha _1\\
-{\alpha _1}^T& 0\\
\end{matrix} \right),$$
where $A_1\in M_{n-1}\left( \mathbb{R} \right)$ is again antisymmetric. By induction there exists $P=\left( \begin{matrix}
P_1& \\
& 1\\
\end{matrix} \right) \in O_n\left( \mathbb{R} \right)$, with $P_1\in O_{n-1}\left( \mathbb{R} \right)$, such that
$$P^TAP=\left( \begin{matrix}
{P_1}^TA_1P_1& {P_1}^T\alpha _1\\
-{\alpha _1}^TP_1& 0\\
\end{matrix} \right),$$
where ${P_1}^TA_1P_1$ is in standard form, but I don't know how to deal with ${P_1}^T\alpha _1$ and $-{\alpha _1}^TP_1$.

Thank you for sharing your thoughts.

Best Answer

On your thoughts and attempt

I like your idea of using induction. It feels natural, especially given the decomposition you produced, which reduces matters to the $(n-1) \times (n-1)$ case and lets you apply the induction hypothesis.

However, I'm not sure that induction would work the way you are doing it. I'm not saying it won't work with certainty, but I have my doubts. Here's my reason why : if you think about the proof of the spectral theorem, then whether it is done by induction or not, there's simply no way you can get through it without mentioning eigenvectors. I don't see where eigenvectors enter the picture in this approach as of yet.

Furthermore, in your proof, even though you have orthogonalized $A_1$, the orthogonalization has to be done in such a way that it also "fits" the last column/row properly (namely, $P_1^T\alpha_1 = 0$), and this isn't guaranteed by the induction hypothesis. You're stuck because your choice of $P_1$ currently isn't flexible enough to also ensure $P_1^T \alpha_1 = 0$.


On how one can think about this question

All in all, it seems that the best way to think about this question is to reduce either its proof method or its statement to the one for real symmetric matrices. Recall that $A$ being antisymmetric means (by definition) that $A^T = -A$, so $(A^2)^T = (A^T)^2 = A^2$, i.e. $A^2$ is a symmetric matrix. Now, could diagonalization of $A^2$ help? Also, the proof technique for the symmetric case was induction, so can induction play a role here?

Well, the problem (and solution!) surfaces when you try to push an inductive argument through. Say you chose to start with an eigenvector $v$ of $A^2$, so that $A^2v = \lambda v$. Then one can write this as $A(Av) = \lambda v$ and try and take $w = \frac{1}{\sqrt{\lambda}}Av$ so that $Aw = \sqrt{\lambda} v$ and $Av = \sqrt{\lambda}w$.

This comes close : as long as $w \neq cv$ for some $c$, in any basis extending $\{w,v\}$ the matrix for $A$ will contain the submatrix $\begin{pmatrix} 0 & \sqrt{\lambda} \\ \sqrt{\lambda} & 0\end{pmatrix}$. However, this is not the submatrix we want. We also tacitly assumed that $\lambda \geq 0$, which in fact fails here : since $A^2 = -A^TA$, the nonzero eigenvalues of $A^2$ are negative, so $\sqrt{\lambda}$ isn't even real. We also need $w^Tv = 0$, otherwise the decomposition won't be orthogonal. Too many problems!
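
A small numerical illustration of this obstruction, using a random test matrix as an assumption : $A^2$ is indeed symmetric, but its nonzero eigenvalues come out negative, so the square roots above would not even be real.

```python
# Illustration (random antisymmetric test matrix is an assumption):
# A^2 is symmetric, but its nonzero eigenvalues are negative.
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = M - M.T                        # antisymmetric

A2 = A @ A
print(np.allclose(A2, A2.T))       # True: A^2 is symmetric
print(np.linalg.eigvalsh(A2))      # all eigenvalues <= 0 (one is ~0 since n is odd)
```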

This does give us an idea, though.

  • Using $A$, create a matrix that is symmetric.

  • Use an eigenvector equation of that matrix to get an orthogonal pair $w_1,w_2$ satisfying $Aw_1 = -aw_2$ and $Aw_2 = aw_1$. In a basis containing $w_1,w_2$, the matrix of $A$ will contain the submatrix $\begin{pmatrix}0 & a \\ -a & 0\end{pmatrix}$. The removal of $w_1,w_2$ will then lead to the inductive hypothesis being applicable.

That matrix cannot be $A^2$, unfortunately. We can fall back on a more conventional and far more suitable option : that is, the matrix $A^TA$. Indeed, this is always a symmetric matrix, so hopefully its eigenvectors have something to offer up?

The crucial idea is that one can use $A^TA$ for this purpose. I will not present a proof by induction, but I will do the key step, after which one can finish either with or without induction.


Following the proof technique : the required $w_1,w_2$ and eigenspace characterization

The idea is quite simple at the moment : We know that $A^TA$ is symmetric, but there's more. In fact, if $\lambda$ is an eigenvalue of $A^TA$, then there exists $v$ such that $A^TAv = \lambda v$. Multiplying by $v^T$ from the left, $v^TA^TAv = \lambda v^Tv$, so that $\|Av\|_2^2 = \lambda \|v\|_2^2$ where $\|w\|_2^2 = \sum_{i=1}^n w_i^2$ is the sum of squares of entries of $w$. It follows that $\lambda \geq 0$, so $\lambda$ is in fact a non-negative real number.
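
The same computation is easy to check numerically; the random test matrix below is just an illustrative assumption.

```python
# A^T A is symmetric and its eigenvalues are non-negative
# (random antisymmetric test matrix as an assumption).
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((6, 6))
A = M - M.T                        # antisymmetric

G = A.T @ A
print(np.allclose(G, G.T))         # True: A^T A is symmetric
print(np.linalg.eigvalsh(G))       # all eigenvalues >= 0 (up to rounding)
```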

Lemma : One can find an orthonormal basis of $\mathbb{R}^n$ consisting of eigenvectors of $A^TA$, of the form $\{w_{11},w_{12},w_{21},w_{22},\ldots,w_{k1},w_{k2},u_1,u_2,\ldots,u_{n-2k}\}$, where $Au_{j} = 0$ for $j=1,\ldots,n-2k$ and, for each $i=1,\ldots,k$, there exists $a_i \geq 0$ such that $$w_{i1}^Tw_{i2}=0, \quad Aw_{i1} = -a_iw_{i2}, \quad Aw_{i2} = a_iw_{i1}$$

Step 1 : Just the eigenvalue equation

To prove this, as usual, let's start with the eigenvector equation $A^TAv = \lambda v$; remember that such an eigenvector exists by the existence of the orthogonal standard form of $A^TA$ (also called the spectral theorem for symmetric matrices, by the way).

Suppose that $\lambda >0$. Here's the magic : because $A^T = -A$, we get $$ A^TA v= \lambda v \implies A^T (Av) = \lambda v \implies A(-Av) = \lambda v $$ Now set $w_1 = v$ and $w_2 = -\frac 1{\sqrt{\lambda}}Av$. Then we have $$Aw_1 = Av = -\sqrt{\lambda}w_2$$ and $$Aw_2 = -A^Tw_2 = \frac 1{\sqrt{\lambda}}A^TAv = \sqrt{\lambda}v = \sqrt{\lambda}w_1$$

Excellent! We now have $w_1,w_2$ as desired. There's only one thing that we need to check which is quite surprising at first glance. It is that $w_1^Tw_2 = 0$. This ensures that the orthogonal decomposition goes through.

To show that $w_1^Tw_2 = 0$, use their definitions in relation to $A$. $$ w_1^Tw_2 = \frac{-1}{\lambda}(Aw_2)^T(Aw_1) = \frac{-1}{\lambda}(w_2^T)(A^TAw_1) = -w_2^Tw_1 = -w_1^Tw_2 $$ which shows that $w_1^Tw_2 = 0$, as desired. There is another surprise in the bag that is important enough to mention : note that $A^TAw_2 = -A(Aw_2) = -\sqrt{\lambda}\,Aw_1 = \lambda w_2$. Therefore, we started with $A^TAv = \lambda v$ and ended up with an orthogonal $w_2$ which is also an eigenvector of $A^TA$ with eigenvalue $\lambda$.

We haven't dealt with the case that $v$ is an eigenvector with $\lambda =0$. This is much, much easier though. Indeed, if $A^TAv = 0$ then by multiplying on both sides by $v^T$, $\|Av\|_2^2 = 0$, so $Av =0$. That is, $v$ is also an eigenvector of $A$.
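
Here is a short numerical run-through of Step 1; the random matrix and the particular eigenpair chosen below are assumptions made only for illustration.

```python
# Step 1 numerically: from an eigenvector v of A^T A with lambda > 0,
# build the pair (w1, w2) and check the relations derived above.
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((6, 6))
A = M - M.T                                      # antisymmetric

lam_all, V = np.linalg.eigh(A.T @ A)             # spectral decomposition of A^T A
lam, v = lam_all[-1], V[:, -1]                   # an eigenpair with lambda > 0

w1 = v
w2 = -A @ v / np.sqrt(lam)

print(np.allclose(A @ w1, -np.sqrt(lam) * w2))   # A w1 = -sqrt(lam) w2
print(np.allclose(A @ w2,  np.sqrt(lam) * w1))   # A w2 =  sqrt(lam) w1
print(np.isclose(w1 @ w2, 0))                    # w1 and w2 are orthogonal
print(np.allclose(A.T @ A @ w2, lam * w2))       # w2 is again an eigenvector of A^T A
```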

Step 2 : Going into the eigenspace dimensions

We proved in Step $1$ that if $\lambda \neq 0$ is an eigenvalue of $A^TA$, then every eigenvector $w_1=v$ can be paired with another eigenvector $w_2 = -\frac 1{\sqrt \lambda} Av$. Note that if I apply the same construction to $w_2$, I get back $w_1$ up to sign, so nothing new is produced. This allows us to deduce something easily.

Indeed, let $E_{\lambda} = \{v : A^TAv = \lambda v\}$ be the eigenspace of $\lambda$. Choose a vector $w_{11,\lambda}=v$, create $w_{12,\lambda}= -\frac{1}{\sqrt\lambda}Aw_{11,\lambda}$, and "remove" these by letting $E_{\lambda , 1} = \{v \in E_{\lambda} : v^Tw_{11,\lambda}=v^Tw_{12,\lambda}=0\}$ (i.e. go to the orthogonal complement of the span of $\{w_{11,\lambda},w_{12,\lambda}\}$ in $E_{\lambda}$). We can then take a new element $w_{21,\lambda} \in E_{\lambda,1}$, get $w_{22,\lambda}$, and so on. The point that makes this work is that $w_{22,\lambda}$ again lies in $E_{\lambda,1}$ : indeed, $w_{22,\lambda}^Tw_{11,\lambda} = -\frac 1{\sqrt\lambda}(Aw_{21,\lambda})^Tw_{11,\lambda} = \frac 1{\sqrt\lambda}w_{21,\lambda}^TAw_{11,\lambda} = -w_{21,\lambda}^Tw_{12,\lambda} = 0$, and similarly $w_{22,\lambda}^Tw_{12,\lambda}=0$.

By the above reasoning (carefully formulated as an inductive argument if necessary), we get that $ E_{\lambda}$ is the span of the orthogonal set of vectors $S_{\lambda} = \{w_{11,\lambda},w_{12,\lambda},\ldots,w_{k_{\lambda}1,\lambda},w_{k_{\lambda}2,\lambda}\}$ where $w_{i1,\lambda},w_{i2,\lambda}$ satisfy the relation in the paragraph above for $i=1,2,\ldots,k_{\lambda}$ and $2k_{\lambda} = \dim E_{\lambda}$.

This can be done for all $\lambda \neq 0$ which are eigenvalues of $A^TA$ : we get a set $S_{\lambda}$ consisting of orthogonal vectors whose span equals $E_{\lambda}$. Furthermore, if $\lambda \neq \mu$ then every vector in $S_{\lambda}$ is orthogonal to every vector in $S_{\mu}$, because eigenvectors of the symmetric matrix $A^TA$ with different eigenvalues are always orthogonal.

Finally, if $\lambda =0$ then we just take an orthogonal basis $S_0 = \{u_1,u_2,\ldots,u_{n-2k}\}$ of the eigenspace $E_0$; by Step 1, these vectors satisfy $Au_j = 0$. They are orthogonal to all the $S_{\lambda}$, $\lambda \neq 0$, for the same reason as at the end of the previous paragraph.

Finally, putting $S = S_{0} \cup \bigcup_{\lambda} S_{\lambda}$ (the union running over the distinct nonzero eigenvalues $\lambda$ of $A^TA$), $S$ is an orthogonal basis satisfying the conditions we wished for, with $S_0 = \{u_1,\ldots,u_{n-2k}\}$.

Finally, to make the basis orthonormal, just divide every element of $S$ by its norm so that every element of $S$ has norm $1$. Then $S$ is as required.
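
One point that is worth convincing yourself of (and easy to test) is that the second pair built inside a single eigenspace really does stay orthogonal to the first pair. The specially constructed matrix below, with a four-dimensional eigenspace of $A^TA$, is an assumption made only for illustration.

```python
# Within one eigenspace of A^T A of dimension 4: after removing the first pair
# (w11, w12), the next pair (w21, w22) is automatically orthogonal to it.
import numpy as np

rng = np.random.default_rng(4)
a = 2.0
B = np.zeros((5, 5))
B[0, 1], B[1, 0] = a, -a           # two identical 2x2 blocks, so A^T A has
B[2, 3], B[3, 2] = a, -a           # the eigenvalue a^2 with multiplicity 4
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
A = Q @ B @ Q.T                    # antisymmetric

w11 = Q[:, 0]                      # a unit eigenvector of A^T A for a^2
w12 = -A @ w11 / a
u = Q[:, :4] @ rng.standard_normal(4)        # another vector of that eigenspace
u -= (u @ w11) * w11 + (u @ w12) * w12       # project away w11 and w12
w21 = u / np.linalg.norm(u)
w22 = -A @ w21 / a

# w22 lands back in the orthogonal complement of {w11, w12}, as claimed
print(np.isclose(w22 @ w11, 0), np.isclose(w22 @ w12, 0), np.isclose(w21 @ w22, 0))
```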


Finishing the proof

The proof is now obvious : Let $P$ be the orthogonal matrix whose columns are the elements of $S$ (as column vectors), taken in the order $S = \{w_{11},w_{12},w_{21},w_{22},\ldots,w_{k1},w_{k2},u_1,u_2,\ldots,u_{n-2k}\}$, so that the matrix is presented in that order. Then $P^TP = I$ is clear because $S$ consists of orthonormal vectors. Furthermore, when we write the matrix of $A$ in this basis, every entry of $P^TAP$ is of the form $s_1^TAs_2$ where $s_1,s_2$ are elements of $S$. We only have a few possibilities.

  • If either $s_1 = u_i$ for some $i$ or $s_2 = u_j$ for some $j$, then $s_1^TAs_2 = 0$, because $Au_j = 0$ and $u_i^TA = -(Au_i)^T = 0$.

  • If $s_1 = w_{i1}$ and $s_{2} =w_{j1}$ for $i \neq j$ then by the construction in the lemma, $s_1^TAs_2 = w_{i1}^TAw_{j1} = -\sqrt{\lambda}w_{i1}^Tw_{j2} = 0$ (where $\lambda$ is the eigenvalue for $s_2$) as $w_{j2}$ is orthogonal to $w_{i1}$ if $i \neq j$. This also works if $s_1 = w_{i*}, s_2 = w_{j*}$ where $i \neq j$ and $*$ can be $1$ or $2$ in each index.

  • If $s_1 = w_{i1}$ and $s_2 = w_{i2}$ for some $i$, then by the lemma, $s_1^TAs_2 = \sqrt{\lambda}\,s_1^Ts_1 = \sqrt{\lambda}$ by orthonormality. Similarly, $s_2^TAs_1 = -\sqrt{\lambda}$, and $s_1^TAs_1 = s_2^TAs_2= 0$.

The calculations we have done above in every case show clearly that in the ordered basis $$ S = \{w_{11},w_{12},\ldots,w_{k1},w_{k2},u_1,u_2,\ldots,u_{n-2k}\} $$ the matrix of $A$ is a block-diagonal matrix. First appear the $2 \times 2$ blocks : for each $w_{i1},w_{i2}$ pair with eigenvalue $\lambda$, you have $\begin{pmatrix}0 & \sqrt{\lambda} \\ -\sqrt{\lambda} & 0\end{pmatrix}$, so the $a_i$ in the standard form are the square roots of the nonzero eigenvalues of $A^TA$. Then the $1 \times 1$ blocks are just zero matrices corresponding to each $u_i$. The off-block-diagonal entries are all $0$ by construction as well.

This proves the result.
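
If one wants to see the whole construction at once, here is a sketch in code; the helper name, the random test matrix, and the tolerance are illustrative assumptions rather than a reference implementation. It builds the basis $S$ from the spectral decomposition of $A^TA$, assembles $P$, and checks that $P^TAP$ has the block form above.

```python
# Sketch of the construction in the proof (names and tolerance are assumptions):
# build S from the eigenvectors of A^T A, assemble P, and inspect P^T A P.
import numpy as np

def antisymmetric_standard_form(A, tol=1e-8):
    """For antisymmetric A, return an orthogonal P such that P^T A P is block
    diagonal: 2x2 blocks [[0, a], [-a, 0]] first, then zeros."""
    n = A.shape[0]
    lam, V = np.linalg.eigh(A.T @ A)             # eigenvalues ascending, V orthonormal
    pairs, kernel = [], []
    i = 0
    while i < n:
        j = i
        while j < n and lam[j] - lam[i] <= tol:  # group numerically equal eigenvalues
            j += 1
        E = V[:, i:j]                            # orthonormal basis of this eigenspace
        if lam[i] <= tol:                        # lambda = 0: these vectors satisfy A u = 0
            kernel += list(E.T)
        else:
            while E.shape[1] > 0:
                w1 = E[:, 0]
                w2 = -A @ w1 / np.sqrt(lam[i])   # the partner from Step 1
                pairs += [w1, w2]
                # orthonormal basis of the part of the eigenspace orthogonal to w1, w2
                R = E - np.outer(w1, w1 @ E) - np.outer(w2, w2 @ E)
                U, s, _ = np.linalg.svd(R, full_matrices=False)
                E = U[:, s > tol]
        i = j
    return np.column_stack(pairs + kernel)

# demo on a random antisymmetric matrix (an illustrative choice)
rng = np.random.default_rng(5)
M = rng.standard_normal((7, 7))
A = M - M.T
P = antisymmetric_standard_form(A)
print(np.allclose(P.T @ P, np.eye(7)))           # P is orthogonal
print(np.round(P.T @ A @ P, 3))                  # 2x2 blocks [[0, a], [-a, 0]], then zeros
```

The SVD step is just a convenient way of passing to $E_{\lambda,1}$, the orthogonal complement of the pair already removed inside the current eigenspace.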

Note that one can use induction at different points of the proof. Perhaps one can just factor out a pair $w_{11},w_{12}$ and/or a vector $u_1$, remove its contribution from $A$ by restricting $A$ to the orthogonal complement of these vectors, and apply induction. However, the above proof is quite superior in that it clearly illustrates how one can actually construct such a matrix $P$, given an orthogonal standard form for $A^TA$.