Visualising geometrically $\lambda=\pm 1$ eigenvalues for orthogonal transformations

diagonalization, eigenvalues-eigenvectors, linear-algebra, linear-transformations, orthogonal-matrices

The sole purpose of this post is to gather geometrical intuition behind why the eigenvalues of an orthogonal matrix which is diagonalisable over the reals are equal to $\pm 1$. Now, I know they both represent rotations or reflections, but what makes $\lambda=\pm 1$ special enough to make the transformation orthogonal and yet diagonalisable? What am I supposed to be looking for if those are the eigenvalues that make it diagonalisable?

I imagine a proper subspace in $\mathbb{R}^2$, like a line $V_1$, and its eigenvector that gets mapped to itself ($\lambda=1$) or to its opposite ($\lambda=-1$), but I don't know how that makes it diagonalisable, nor do I see any "rotation" there. Can anyone shed some light on all this with some geometrical input?

Best Answer

I think this is a good, but perhaps loaded and dense, question. I want to address certain points just to clarify some possible misconceptions based on your question before I try to answer it adequately.

Possible Misconception 1) Let's look at this statement of yours,

... what makes $\lambda = \pm 1$ special to make the transformation orthogonal and yet diagonalizable.

A given matrix having eigenvalues $\pm 1$ does not imply it is orthogonal or even diagonalizable. Consider the matrix $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$. This matrix only has eigenvalue $1$ but cannot be diagonalized and is certainly not orthogonal.
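
If it helps to see this concretely, here is a quick numerical sketch of mine (assuming NumPy):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

print(np.linalg.eigvals(A))  # [1. 1.] -- the only eigenvalue is 1
print(A.T @ A)               # [[1. 1.] [1. 2.]] -- not the identity, so A is not orthogonal
```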

Possible Misconception 2) This also relates to your other example.

Imagine a proper subspace in $\mathbb{R}^2$, like a line $V_1$, and its eigenvector that gets mapped to itself, ... but I don't know how that makes it diagonalizable.

The same matrix that I mentioned before, $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$, has a single eigenvector corresponding to eigenvalue $1$, but is not diagonalizable.

Possible Misconception 3)

Looking at the comments on the post, there is this exchange.

Like why are -1 and 1 the only diagonalisable eigenvalues?

If a given matrix, $A$, is orthogonal (i.e. it satisfies $A^TA = I$), then it can have eigenvalues not equal to $\pm 1$. Consider the following matrix, which simply rotates the plane by 90 degrees counterclockwise: $$ A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} $$

If you solve for the eigenvalues you will get $i$ and $-i$. So it is important to realize that sometimes you may have eigenvalues $\lambda \in \mathbb{C}$; that is, your eigenvalues may be complex numbers. Your question was edited to say diagonalizable over the reals; however, if this was something you were not aware of, I think it is conceptually important to the story.
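
A quick NumPy check (my own sketch) confirms this:

```python
import numpy as np

# Rotation of the plane by 90 degrees counterclockwise.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])

eigvals = np.linalg.eigvals(A)
print(eigvals)          # i and -i (a complex-conjugate pair)
print(np.abs(eigvals))  # [1. 1.] -- both have magnitude 1
```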

Possible Misconception 4)

You seem to be thinking the specific eigenvalues are what make it diagonalizable. They do not! The fact that an orthogonal matrix, $A$, satisfies $A^TA=I$ is what makes it diagonalizable. There are two primary points about orthogonal matrices.

  1. If a matrix is orthogonal, then it is diagonalizable (over $\mathbb{C}$)
  2. If a matrix is orthogonal, then it has eigenvalues with magnitude $1$.

So it is not the fact that orthogonal matrices sometimes have eigenvalues $\pm 1$ that makes them diagonalizable. There are plenty of matrices with eigenvalues $\pm 1$ that are not diagonalizable. The fact that orthogonal matrices are diagonalizable stems from the fact that they satisfy this nontrivial equation, $A^TA=I$.
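
As a concrete illustration (an example of my own, not from the question), here is a $3$-by-$3$ matrix whose only eigenvalues are $\pm 1$ and which is still not diagonalizable:

```python
import numpy as np

# Upper-triangular, so the eigenvalues are the diagonal entries: 1, 1, -1.
A = np.array([[1.0, 1.0,  0.0],
              [0.0, 1.0,  0.0],
              [0.0, 0.0, -1.0]])

print(np.linalg.eigvals(A))  # 1, 1, -1 (in some order)

# The eigenspace for lambda = 1 is only one-dimensional, so A has at most
# two independent eigenvectors in total -- not enough for a 3-by-3 matrix.
print(3 - np.linalg.matrix_rank(A - np.eye(3)))  # 1
```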

Following this we are left with several points related to your question which I think need to be addressed.

1) What does it mean to be diagonalizable?

A matrix, $A$, is diagonalizable if there exist $n$ linearly independent non-zero vectors that get mapped to a scalar multiple of themselves. That is, $n$ linearly independent vectors $v_i$ satisfy $Av_i = \lambda_i v_i$. Intuitively, $A$ just stretches along these axes.

An important disclaimer is that many times we allow the $\lambda_i$ to be complex, so it doesn't look like stretching the axes in $\mathbb{R}$; it looks like stretching them in $\mathbb{C}$.
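
To make the "stretch the axes" picture concrete, here is a small numerical sketch of my own (assuming NumPy): build the matrix $P$ whose columns are the eigenvectors and check that $P^{-1}AP$ is diagonal.

```python
import numpy as np

# A symmetric example: it stretches by 3 along (1, 1) and by 1 along (1, -1).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, P = np.linalg.eig(A)  # columns of P are the eigenvectors v_i
D = np.diag(eigvals)

# In the eigenvector basis A is diagonal: P^{-1} A P = D, i.e. it just stretches the axes.
print(np.allclose(np.linalg.inv(P) @ A @ P, D))  # True
```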

2) When is a matrix diagonalizable?

A matrix, $A$, is diagonalizable when there exist $n$ linearly independent eigenvectors. How do we know if there are $n$ linearly independent eigenvectors? This question is somewhat hard!

For any matrix you can compute the eigenvalues and see what all the possible eigenvectors are. If there are not enough then the matrix is not diagonalizable.

For example, for the matrix I referred to in Misconception 1), you show it is not diagonalizable by trying to find enough eigenvectors and realizing it is impossible to have enough, because the dimension of the null space of $A-I$ is $1$ in that particular example.
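
A minimal sketch of that computation (my own NumPy check, not part of the original example):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# Geometric multiplicity of lambda = 1: dim null(A - I) = 2 - rank(A - I).
print(2 - np.linalg.matrix_rank(A - np.eye(2)))  # 1 -- only one independent eigenvector,
                                                 # so this 2-by-2 matrix is not diagonalizable
```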

However there are important special cases where it is easy to tell if a matrix is diagonalizable!

  1. The $n$-by-$n$ matrix $A$ has $n$ distinct eigenvalues, that is, $n$ eigenvalues that are all different from each other.

  2. When a given $n$-by-$n$ matrix of real values satisfies $A^TA = AA^T$ (such a matrix is called normal; see the numerical check after this list).

  3. When a given $n$-by-$n$ matrix of real values is symmetric, that is, $A=A^T$.
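
Here is a quick numerical check of the second special case (a sketch of mine, assuming NumPy): the rotation matrix from Misconception 3 is not symmetric, but it does satisfy $A^TA = AA^T$, and it indeed has a full set of (complex) eigenvectors.

```python
import numpy as np

# The 90-degree rotation: not symmetric, but it commutes with its transpose.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])
print(np.allclose(A.T @ A, A @ A.T))  # True -- A is "normal"

# Its eigenvector matrix has full rank, so A is diagonalizable (over the complex numbers).
eigvals, P = np.linalg.eig(A)
print(eigvals)                        # i and -i
print(np.linalg.matrix_rank(P) == 2)  # True -- two independent eigenvectors
```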

3) Why are orthogonal matrices always diagonalizable?

An orthogonal matrix, $A$, satisfies $A^TA = AA^T = I$. That is the inverse of $A$ is its transpose. Thus the fact it is diagonalizable follows from the second special case listed in the previous section.

Your question seems to indicate this is the place where you are really looking for geometric intuition. Why does obeying this strange condition $A^TA = AA^T = I$ give us that $A$ is diagonalizable? I will first remark that, rigorously, this fact follows from the spectral theorem for normal operators, so for full mathematical details you can look at a proof of that theorem.

To have some geometric intuition about why $A^TA = AA^T = I$ tells us $A$ is diagonalizable, we need some intuition about what $A^T$ is geometrically. Building intuition should come from examples, so look at the following few. For each of the following examples, compute and describe what $\mathbf{B} = \mathbf{A}^T$ does. You should find that in some sense it "undoes the rotations, but does the same stretching on different axes".

Example 1)

We consider the matrix which rotates vectors by $\theta = \frac{\pi}{4}$. Explicitly, we know that we can write this matrix as, $$ \mathbf{A} = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\\\ \sin(\theta) & \cos(\theta) \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\\\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} $$

Pictures of example 1

Example 2)

We consider the matrix which rotates vectors by $\theta = \frac{\pi}{4}$ and scales uniformly by a factor of $2$. Explicitly, we know that we can write this matrix as,

$$ \mathbf{A} = \begin{bmatrix} 2 & 0\\\\ 0 & 2 \end{bmatrix} \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\\\ \sin(\theta) & \cos(\theta) \end{bmatrix} = \begin{bmatrix} 2 & 0\\\\ 0 & 2 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\\\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} $$

Image of example 2

Example 3)

We consider the matrix which scales the $x$-axis by $2$, collapses the $y$-axis to $0$, and then rotates vectors by $\theta = \frac{\pi}{4}$. Explicitly, we know that we can write this matrix as,

$$ \mathbf{A} = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\\\ \sin(\theta) & \cos(\theta) \end{bmatrix}\begin{bmatrix} 2 & 0\\\\ 0 & 0 \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\\\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}\begin{bmatrix} 2 & 0\\\\ 0 & 0 \end{bmatrix} $$

Image of example 3
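
If you want to check your answers numerically, here is a sketch of mine (assuming NumPy) that compares $A^TA$ for the three examples; the rotation cancels and only the stretching, applied twice, remains.

```python
import numpy as np

c = 1 / np.sqrt(2)
R = np.array([[c, -c],   # rotation by pi/4
              [c,  c]])

A1 = R                           # Example 1: pure rotation
A2 = np.diag([2.0, 2.0]) @ R     # Example 2: rotate, then scale everything by 2
A3 = R @ np.diag([2.0, 0.0])     # Example 3: stretch x by 2, collapse y, then rotate

# In A^T A the rotation cancels and only the (squared) stretching survives.
for A in (A1, A2, A3):
    print(np.round(A.T @ A, 10))
# Example 1 -> identity, Example 2 -> 4 * identity, Example 3 -> diag(4, 0)
```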

Now if you think about what $A^TA = AA^T = I$ means, some intuition could be that since $A^T$ "undoes the rotations of $A$ but does the stretching again", there is in fact "no stretching or squishing". That is, the eigenvalues should have magnitude $1$. Therefore, in some intuitive sense, $A$ is only "rotating things", and we then know it commutes with $A^T$, which "rotates backwards". This notion vaguely places some constraints on how axes should move, but says that since going forward and back look similar, the transformation looks like a rotation, which should be diagonalizable using complex numbers. (Look at examples where they don't commute, like the matrix in Misconception 1.) Some rigor, or intuition, can be brought to these ideas with something called the polar decomposition.

Overall this was very hand-wavy, but hopefully it gives some vague inkling of why this might be true. I highly, highly recommend drawing and playing around with more examples on paper by yourself. It is essential.
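
That polar decomposition can also be seen numerically. The following is my own sketch (computed via the SVD, assuming NumPy), splitting $A = QP$ with $Q$ orthogonal (the "rotation" part) and $P$ symmetric positive semi-definite (the "stretching" part); for an orthogonal $A$ the stretching part comes out as the identity.

```python
import numpy as np

def polar(A):
    """Split A = Q @ P (via the SVD) with Q orthogonal and P symmetric positive semi-definite."""
    U, s, Vt = np.linalg.svd(A)
    Q = U @ Vt                   # the "rotation/reflection" part
    P = Vt.T @ np.diag(s) @ Vt   # the "stretching" part
    return Q, P

c = 1 / np.sqrt(2)
R = np.array([[c, -c], [c, c]])          # orthogonal: rotation by pi/4

_, P = polar(R)
print(np.round(P, 10))                   # identity -- an orthogonal matrix does no stretching

_, P = polar(np.diag([2.0, 2.0]) @ R)    # Example 2: rotation plus uniform scaling by 2
print(np.round(P, 10))                   # 2 * identity -- here the stretching part shows up
```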

4) Why do the eigenvalues of orthogonal matrices have complex magnitude $1$?

Suppose $A$ is an orthogonal matrix and $Av = \lambda v$ with $v \neq 0$. On one hand, $(Av)^*(Av) = v^*A^TAv = v^*v$ (using that $A$ is real, so $A^* = A^T$). On the other hand, $(Av)^*(Av) = (\lambda v)^*(\lambda v) = |\lambda|^2 v^*v$. Comparing the two gives $v^*v = |\lambda|^2 v^*v \implies |\lambda|^2 = 1$. Here $v^*$ means the conjugate transpose of $v$: take $v^T$ and complex-conjugate all the values. Using the conjugate transpose matters, because it guarantees $v^*v \neq 0$, whereas $v^Tv$ can vanish; indeed $v^Tv = 0$ for the eigenvectors of the 90-degree counterclockwise rotation.
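
A quick numerical illustration of both points (my own NumPy sketch):

```python
import numpy as np

# The 90-degree rotation is orthogonal; its eigenvalues are +-i, both of magnitude 1.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])
eigvals, V = np.linalg.eig(A)
print(np.abs(eigvals))              # [1. 1.]

v = V[:, 0]                         # a (complex) eigenvector, proportional to (1, -i) or (1, i)
print(np.round(v @ v, 10))          # 0 -- v^T v vanishes for this complex vector...
print(np.round(v.conj() @ v, 10))   # 1 -- ...but v^* v does not
```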

5) When might the eigenvalues be real and not complex?

It turns out that in the third special case listed in section 2) we have a guarantee that the eigenvalues are real. That is, if a matrix is symmetric, then its eigenvalues are real.
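
For instance, here is a small check of mine (assuming NumPy): symmetrizing a random real matrix always gives real eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
S = (M + M.T) / 2                    # symmetrize: S = S^T

eigvals = np.linalg.eigvals(S)
print(np.allclose(eigvals.imag, 0))  # True -- a real symmetric matrix has real eigenvalues
```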