But here is what I don't understand. He says $P^TAP$ represents the linear transformation $T: x \mapsto Ax$. What does he mean here?
The author actually states that "$P^TAP$ represents the linear transformation $T: x \mapsto Ax$ in the basis $B$".
So it is a matter of understanding what "in the basis $B$" means.
What does "in the basis $B$" mean?
The standard basis (of $\mathbb{R}^n$) is the set $e := \{e_1, e_2, \dotsc, e_n\}$,
where $e_i$ is the vector whose $i$-th element is $1$, and all others are $0$, that is,
$e_1 := (1, 0, 0, \dotsc, 0)^T,
e_2 := (0, 1, 0, \dotsc, 0)^T, \dotsc,
e_n := (0, 0, 0, \dotsc, 1)^T$.
We usually identify a given vector
$x := (x_1, x_2, \dotsc, x_n)^T$
with the point
$x_e = x_1 e_1 + x_2 e_2 + \dotsb + x_n e_n$
of $\mathbb{R}^n$.
What we are actually doing is regarding $x$ as a vector in the standard basis.
Now what if we were considering $x$ in an arbitrary orthonormal basis of $\mathbb{R}^n$, say,
$B := \{v_1, v_2, \dotsc, v_n\}$?
In this case, $x$ would correspond to the point
$x_B = x_1 v_1 + x_2 v_2 + \dotsb + x_n v_n$
of $\mathbb{R}^n$. That is, each coordinate $x_i$ of $x$ tells us the weight with which
the corresponding basis vector $v_i$ of $B$ contributes.
We can regard "vector $x$ in the basis $B$" as a shorthand for that,
and we should assume that, when no basis is specified,
we are working in the standard basis.
What the author is saying is that the matrix $A' = P^TAP$ in basis $B$
is equivalent to the matrix $A$ in the standard basis. If $v$ is a vector
in standard basis, let $B(v)$ be its equivalent in the basis $B$. In this case,
what the author is saying is that $B(Av) = A'B(v)$. To see why this is true,
we have to see how to perform changes of bases, which I will refer to as translating.
Translating vectors between bases
If $P$ is the matrix whose columns are the vectors in $B$,
then $Px = x_1 v_1 + x_2 v_2 + \dotsb + x_n v_n$ (check this).
Note that this means that the vector $x$ in the basis $B$ is $Px$ in the basis $e$.
So the matrix $P$ translates a vector in the basis $B$ to the standard basis.
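As a quick numerical sanity check of this (a NumPy sketch; the 45-degree rotated basis here is an illustrative choice, not from the text):

```python
import numpy as np

# An illustrative orthonormal basis B = {v1, v2} of R^2:
# the standard basis rotated by 45 degrees.
v1 = np.array([1.0, 1.0]) / np.sqrt(2)
v2 = np.array([-1.0, 1.0]) / np.sqrt(2)
P = np.column_stack([v1, v2])  # the columns of P are the vectors of B

x = np.array([3.0, 1.0])       # coordinates of a vector in the basis B

# P x equals x1*v1 + x2*v2: the same point in standard coordinates.
assert np.allclose(P @ x, x[0] * v1 + x[1] * v2)
```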
But what if we want to translate a vector $v$ from the standard basis to the basis $B$?
This is the same as asking how to express $v$ as a linear combination of vectors in $B$,
that is, finding a vector $v' := (a_1, a_2, \dotsc, a_n)^T$ such that
$v'_B = a_1 v_1 + a_2 v_2 + \dotsb + a_n v_n = v_e$.
Since $B$ is an orthonormal basis of $\mathbb{R}^n$,
it is simply a matter of considering the orthogonal projections of $v$ on each vector of $B$,
that is, we can take $a_i = v_i^T v$.
If we ponder for a while, we will see that $v' = P^Tv$ (check this too). In other words,
we can use the matrix $P^T$ to translate a vector in the standard basis to a vector in the basis $B$.
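Continuing the same kind of sketch (again with an illustrative $2 \times 2$ orthonormal basis), one can check that $P^T$ undoes what $P$ does:

```python
import numpy as np

# Illustrative orthonormal basis, as columns of P.
v1 = np.array([1.0, 1.0]) / np.sqrt(2)
v2 = np.array([-1.0, 1.0]) / np.sqrt(2)
P = np.column_stack([v1, v2])

v = np.array([2.0, -1.0])      # a vector in the standard basis

# The projection coefficients a_i = v_i^T v ...
a = np.array([v1 @ v, v2 @ v])
# ... are exactly the entries of P^T v.
assert np.allclose(a, P.T @ v)

# For an orthonormal basis, P^T P = I, so P and P^T are inverse translations.
assert np.allclose(P.T @ P, np.eye(2))
assert np.allclose(P @ (P.T @ v), v)
```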
Translating transformations to other bases
Now $A$ is a transformation that receives vectors in the standard basis and gives results expressed in the standard basis as well.
If we want to find an equivalent operator that works in terms of a basis $B$,
we can do the following:
- Translate the given vector $v$ in the basis $B$ to the standard basis, that is, get the vector $Pv$;
- Perform the transformation $A$ on this translation, that is, compute $A(Pv)$;
- Translate the result back to basis $B$, that is, do $P^T(A(Pv))$.
From this, we can see that the matrix $P^TAP$ is the transformation $A$ working under the basis $B$.
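The three steps above can be sketched numerically, using a random matrix and a random orthonormal basis (both illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

A = rng.standard_normal((4, 4))                   # a transformation in the standard basis
P, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # random orthonormal basis as columns
A_prime = P.T @ A @ P                             # the same transformation in the basis B

v = rng.standard_normal(4)                        # a vector in the basis B

# Translate to the standard basis, apply A, translate back ...
result = P.T @ (A @ (P @ v))
# ... which is exactly what A' does in one step.
assert np.allclose(result, A_prime @ v)
```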
The columns of a transformation
As for the other remark,
"Also, what he says after that doesn't really make sense to me, i.e. the first column of $P^TAP$ is the coordinate vector of $T(v_1)$ with respect to $B$":
in the proof, the author is simply stating that the first column of $P^TAP$ is the vector $T(v_1)$ expressed in terms of the basis $B$.
"The translation of an eigenvector of a transformation" is
"an eigenvector of the translation of the transformation".
We are only renaming things, so the transform still does the same thing to
$\mathbb{R}^n$, the only difference is how we are addressing the given and
resulting points.
Since the eigenvector $v_1$ in the standard basis is the vector $e_1$ in the basis $B$,
this means that $e_1$ is an eigenvector of the translated transformation $A' = P^TAP$.
Since both $v_1$ and $e_1$ are unit vectors, they are both scaled by the same factor $\lambda_1$. So $A'e_1 = \lambda_1 e_1 = (\lambda_1, 0, \dotsc, 0)^T$.
Note that for an arbitrary transformation $M$,
$Me_i$ is simply the $i$-th column of $M$.
We can also see this as what the transformation does to each axis of our basis:
it takes the vector $e_i$ to the $i$-th column vector of $M$.
In the basis $B$, the transformation $A'$ takes $e_1$ to
$\lambda_1 e_1$, so the first column of $A'$ is simply $\lambda_1 e_1$.
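This can also be checked numerically; a sketch with a small illustrative symmetric matrix, using the orthonormal eigenvectors returned by `numpy.linalg.eigh` as the columns of $P$:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])         # symmetric, eigenvalues 1 and 3
eigvals, P = np.linalg.eigh(A)     # columns of P: orthonormal eigenvectors
lam1 = eigvals[0]

A_prime = P.T @ A @ P              # the transformation in the eigenvector basis

# e1 is an eigenvector of A', so the first column of A' is lambda_1 * e_1.
assert np.allclose(A_prime[:, 0], lam1 * np.array([1.0, 0.0]))
```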
Best Answer
Here is a proof avoiding the spectral theorem.
Let $\lambda _1, \lambda _2, \ldots , \lambda _k$ be the distinct eigenvalues of $A$ and let $E_1, E_2, \ldots , E_k$ be the projections onto the corresponding eigenspaces.
By seeing things from the point of view of a (not necessarily orthonormal) basis of eigenvectors, it is quite easy to prove that (don't let the long expression scare you) $$ E_i = \frac{(A-\lambda _1 I)\ldots \widehat{(A-\lambda _i I)}\ldots (A-\lambda _k I)}{(\lambda _i-\lambda _1)\ldots \widehat{(\lambda _i-\lambda _i)}\ldots (\lambda _i-\lambda _k)}, $$ where the hat means omission.
From this it is clear that each $E_i$ is also normal.
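The product formula can be verified numerically; a sketch with an illustrative symmetric matrix with distinct eigenvalues $1, 2, 3$ (so normality is automatic):

```python
import numpy as np

# Build a symmetric (hence normal) matrix with eigenvalues 1, 2, 3.
Q, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((3, 3)))
lams = np.array([1.0, 2.0, 3.0])
A = Q @ np.diag(lams) @ Q.T
I = np.eye(3)

def E(i):
    """Projection onto the eigenspace of lams[i], via the product formula."""
    num, den = I, 1.0
    for j, lam in enumerate(lams):
        if j != i:
            num = num @ (A - lam * I)
            den *= lams[i] - lam
    return num / den

for i in range(3):
    Ei = E(i)
    assert np.allclose(Ei @ Ei, Ei)           # idempotent
    assert np.allclose(A @ Ei, lams[i] * Ei)  # its range is the eigenspace

# The projections sum to the identity and reconstruct A.
assert np.allclose(sum(E(i) for i in range(3)), I)
assert np.allclose(sum(l * E(i) for i, l in enumerate(lams)), A)
```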
Lemma. Any normal, idempotent, real matrix is symmetric.
Proof. Let $E$ be such a matrix. We first claim that $\text{Ker}(E)=\text{Ker}(E^TE)$. To see this, observe that the inclusion $\text{Ker}(E)\subseteq \text{Ker}(E^TE)$ is evident. On the other hand, if $x\in \text{Ker}(E^TE)$, then $$ \|Ex\|^2 = \langle Ex, Ex\rangle = \langle E^TEx, x\rangle =0, $$ so $x\in \text{Ker}(E)$.
We then have that $$ \text{Ker}(E)=\text{Ker}(E^TE) =\text{Ker}(EE^T) =\text{Ker}(E^T). $$
Recalling that the range $R(A^T)$, of the transpose of a matrix $A$, coincides with $\text{Ker}(A)^\perp$, we then have that $$ R(E^T) = \text{Ker}(E)^\perp = \text{Ker}(E^T)^\perp = R(E). $$ We then see that $E$ and $E^T$ are projections sharing range and kernel, so necessarily $E=E^T$. QED
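A small numerical illustration of why normality matters in the lemma (the matrices are illustrative): an oblique projection is idempotent but neither normal nor symmetric, while an orthogonal projection is both.

```python
import numpy as np

# Oblique projection onto the x-axis along the direction (1, -1):
# idempotent, but not normal, and indeed not symmetric.
E = np.array([[1.0, 1.0],
              [0.0, 0.0]])
assert np.allclose(E @ E, E)              # idempotent
assert not np.allclose(E @ E.T, E.T @ E)  # not normal
assert not np.allclose(E, E.T)            # not symmetric

# Orthogonal projection onto the line y = x: normal, idempotent,
# and, as the lemma predicts, symmetric.
F = 0.5 * np.array([[1.0, 1.0],
                    [1.0, 1.0]])
assert np.allclose(F @ F, F)
assert np.allclose(F @ F.T, F.T @ F)
assert np.allclose(F, F.T)
```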
Back to the question, we then have that $$ A=\sum_{i=1}^k \lambda _i E_i, $$ and, since each $E_i$ is normal and idempotent, hence symmetric by the lemma, we conclude that $A$ is symmetric.