[Math] Understanding part of the proof of Spectral Theorem for symmetric matrices

diagonalization, linear algebra, orthogonality

I'm reading a textbook where the Spectral Theorem for symmetric matrices is proven. I understand almost everything about the proof except for one thing. The theorem is stated as follows:

Theorem: Let $A \in \mathbb{R}^{n \times n}$. Then $A$ is orthogonally diagonalizable if and only if $A$ is symmetric.

The first implication is easy. The converse is proven by induction by the author. Here is part of the proof:

We want to prove that for any symmetric matrix $A$, there is an
orthogonal matrix $P$ and a diagonal matrix $D$ such that $P^T AP =
D$. We prove this by induction. Any $1 \times 1$ symmetric matrix is
already diagonal, so we can take $P = I$ and the basic step is proven.

Now assume the theorem holds for $(n -1) \times (n-1)$ symmetric
matrices, with $n \geq 2$. Then we now prove it also holds for $n$. So
let $A$ be an $ n \times n$ symmetric matrix. We know that $A$ has
only real eigenvalues (he concludes this on the basis of a preceding
theorem). Let $\lambda_1$ be any eigenvalue of $A$, and let $v_1$ be a
corresponding eigenvector with $\| v_1 \| = 1$. Then we can extend the
set $\left\{ v_1 \right\}$ to a basis $\left\{ v_1, x_2, \ldots, x_n
\right\}$ of $\mathbb{R}^n$, and use the Gram-Schmidt process to
transform it into an orthonormal basis $B = \left\{ v_1, v_2, \ldots,
v_n \right\}$ of $\mathbb{R}^n$.

Let $P$ be the matrix whose columns are the vectors in $B$, with the
first column being $v_1$. Then $P$ is orthogonal because its column
vectors are all orthonormal. Now $P^T A P = P^{-1} AP$ represents the
linear transformation $T: x \mapsto Ax $ in the basis $B$. But we know
that the first column of $P^T AP$ will be the coordinate vector of
$T(v_1)$ with respect to the basis $B$. Now, $T(v_1) = Av_1 =
\lambda_1 v_1$, so this coordinate vector is \begin{align*}
\begin{pmatrix} \lambda_1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.
\end{align*} It follows that…

He then shows that $P^T A P$ is diagonal by making use of the induction hypothesis on a smaller block matrix.

But here is what I don't understand. He says $P^T A P$ represents the linear transformation $T: x \mapsto Ax$. What does he mean here? Does he mean the linear transformation $L_A : \mathbb{R}^n \rightarrow \mathbb{R}^n$? This doesn't seem right to me, since the matrix representation of $L_A$ is just $A$. Also, what he says after that doesn't really make sense to me, i.e. that the first column of $P^T A P$ is the coordinate vector of $T(v_1)$ with respect to $B$. Maybe someone can clarify this, or provide an example?

Best Answer

But here is what I don't understand. He says $P^TAP$ represents the linear transformation $T: x \mapsto Ax$. What does he mean here?

The author actually states that "$P^TAP$ represents the linear transformation $T: x \mapsto Ax$ in the basis $B$". So it is a matter of understanding what "in the basis $B$" means.

What does "in the basis $B$" mean?

The standard basis (of $\mathbb{R}^n$) is the set $e := \{e_1, e_2, \dotsc, e_n\}$, where $e_i$ is the vector whose $i$-th element is $1$, and all others are $0$, that is,

$e_1 := (1, 0, 0, \dotsc, 0)^T, e_2 := (0, 1, 0, \dotsc, 0)^T, \dotsc, e_n := (0, 0, 0, \dotsc, 1)^T$.

We usually identify a given vector $x := (x_1, x_2, \dotsc, x_n)^T$ with the point $x_e = x_1 e_1 + x_2 e_2 + \dotsb + x_n e_n$ of $\mathbb{R}^n$. What we are actually doing is regarding $x$ as a vector in the standard basis.

Now what if we were considering $x$ in an arbitrary orthonormal basis of $\mathbb{R}^n$, say, $B := \{v_1, v_2, \dotsc, v_n\}$?

In this case, $x$ would correspond to the point $x_B = x_1 v_1 + x_2 v_2 + \dotsb + x_n v_n$ of $\mathbb{R}^n$. That is, each coordinate $x_i$ of $x$ tells us how much of the corresponding basis vector $v_i$ should be accounted for. We can regard "the vector $x$ in the basis $B$" as a shorthand for that, and we should assume that when no basis is specified, we are working in the standard basis.
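To make this concrete, here is a minimal numerical sketch (the rotated basis and the use of NumPy are my own illustration, not part of the author's argument):

```python
import numpy as np

# An orthonormal basis B of R^2: the standard basis rotated by 45 degrees.
theta = np.pi / 4
v1 = np.array([np.cos(theta), np.sin(theta)])
v2 = np.array([-np.sin(theta), np.cos(theta)])

# The coordinate vector x = (3, 1) "in the basis B" names the point
# x_B = 3*v1 + 1*v2, which is not the same point as (3, 1) in the standard basis.
x = np.array([3.0, 1.0])
x_B = x[0] * v1 + x[1] * v2
print(x_B)  # approximately [1.414, 2.828]
```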

What the author is saying is that the matrix $A' = P^TAP$ in the basis $B$ is equivalent to the matrix $A$ in the standard basis. If $v$ is a vector in the standard basis, let $B(v)$ be its equivalent in the basis $B$; then the claim is that $B(Av) = A'B(v)$. To see why this is true, we have to see how to perform changes of basis, which I will refer to as translating.

Translating vectors between bases

If $P$ is the matrix whose columns are the vectors in $B$, then $Px = x_1 v_1 + x_2 v_2 + \dotsb + x_n v_n$ (check this). Note that this means that $x$ in the basis $B$ is $Px$ in the basis $e$. So the matrix $P$ translates a vector in the basis $B$ to the standard basis.

But what if we want to translate a vector $v$ from the standard basis to the basis $B$? This is the same as asking how to express $v$ as a linear combination of vectors in $B$, that is, finding a vector $v' := (a_1, a_2, \dotsc, a_n)^T$ such that $v'_B = a_1 v_1 + a_2 v_2 + \dotsb + a_n v_n = v_e$. Since $B$ is an orthonormal basis of $\mathbb{R}^n$, it is simply a matter of considering the orthogonal projections of $v$ on each vector of $B$, that is, we can take $a_i = v_i^T v$.

If we ponder for a while, we will see that $v' = P^Tv$ (check this too). In other words, we can use the matrix $P^T$ to translate a vector in the standard basis to a vector in the basis $B$.
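Both "check this" claims can be verified numerically. Here is a small sketch; the orthonormal basis is taken from a QR factorization of a random matrix, which is purely my own illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Columns of P form a random orthonormal basis B = {v_1, ..., v_n}.
P, _ = np.linalg.qr(rng.standard_normal((n, n)))

# P translates B-coordinates to standard coordinates: Px = x_1 v_1 + ... + x_n v_n.
x = rng.standard_normal(n)
assert np.allclose(P @ x, sum(x[i] * P[:, i] for i in range(n)))

# P^T translates standard coordinates to B-coordinates: the i-th entry is v_i^T v.
v = rng.standard_normal(n)
v_prime = P.T @ v
assert np.allclose(v_prime, np.array([P[:, i] @ v for i in range(n)]))
assert np.allclose(P @ v_prime, v)  # translating back recovers v
```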

Translating transformations to other bases

Now $A$ is a transformation that receives vectors in the standard basis and gives results expressed in the standard basis as well. If we want to find an equivalent operator that works in terms of the basis $B$, we can do the following:

  1. Translate the given vector $v$ in the basis $B$ to the standard basis, that is, get the vector $Pv$;
  2. Perform the transformation $A$ on this translation, that is, compute $A(Pv)$;
  3. Translate the result back to basis $B$, that is, do $P^T(A(Pv))$.

From this, we can see that the matrix $P^TAP$ is the transformation $A$ working under the basis $B$.
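A short sketch of these three steps, using a random symmetric matrix (again my own illustrative setup, not anything specific from the textbook):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

M = rng.standard_normal((n, n))
A = (M + M.T) / 2                                  # a symmetric matrix, in the standard basis
P, _ = np.linalg.qr(rng.standard_normal((n, n)))   # columns form an orthonormal basis B

A_prime = P.T @ A @ P                              # "A working under the basis B"

# Steps 1-3 applied to a B-coordinate vector v agree with applying A' directly.
v = rng.standard_normal(n)
assert np.allclose(P.T @ (A @ (P @ v)), A_prime @ v)

# Equivalently, with B(.) = P^T(.): B(Aw) == A' B(w) for a standard-basis vector w.
w = rng.standard_normal(n)
assert np.allclose(P.T @ (A @ w), A_prime @ (P.T @ w))
```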

The columns of a transformation

As for

Also, what he says after that doesn't really make sense to me, i.e. that the first column of $P^TAP$ is the coordinate vector of $T(v_1)$ with respect to $B$

in the proof, the author is simply stating that the first column of $P^TAP$ is the vector $T(v_1)$ expressed in terms of the basis $B$.

"The translation of an eigenvector of a transformation" is "an eigenvector of the translation of the transformation". We are only renaming things, so the transform still does the same thing to $\mathbb{R}^n$, the only difference is how we are addressing the given and resulting points.

Since the eigenvector $v_1$ in the standard basis is the vector $e_1$ in the basis $B$, this means that $e_1$ is an eigenvector of the translated transformation $A' = P^TAP$. Since both $v_1$ and $e_1$ are unit vectors, they are both scaled by the same factor $\lambda_1$. So $A'e_1 = \lambda_1 e_1 = (\lambda_1, 0, \dotsc, 0)^T$.

Note that for an arbitrary transformation $M$, $Me_i$ is simply the $i$-th column of $M$. We can also see this as what the transformation does to each axis of our basis: it takes the vector $e_i$ to the $i$-th column vector of $M$. In the basis $B$, the transformation $A'$ takes $e_1$ to $\lambda_1 e_1$, so the first column of $A'$ is simply $\lambda_1 e_1$.
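Finally, here is a sketch that reproduces this step of the proof numerically: take a random symmetric $A$, pick one unit eigenvector $v_1$ (obtained here with numpy.linalg.eigh, my choice of tool), extend it to an orthonormal basis, and look at the first column of $P^TAP$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

M = rng.standard_normal((n, n))
A = (M + M.T) / 2                     # random symmetric matrix

# One eigenpair (lambda_1, v_1), with ||v_1|| = 1.
eigvals, eigvecs = np.linalg.eigh(A)
lam1, v1 = eigvals[0], eigvecs[:, 0]

# Extend {v_1} to an orthonormal basis: QR of a matrix whose first column is v_1.
X = np.column_stack([v1, rng.standard_normal((n, n - 1))])
P, _ = np.linalg.qr(X)
P[:, 0] = v1                          # QR may flip the sign of the first column; restore v_1

A_prime = P.T @ A @ P
print(lam1)
print(np.round(A_prime[:, 0], 10))    # first column ~ [lam1, 0, 0, 0]
print(np.round(A_prime[0, :], 10))    # and, since A' is symmetric, so is the first row
```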
