Proof of Spectral Theorem – Linear Algebra

linear-algebra spectral-theory

There are a number of results called Spectral Theorems. This question deals with the linear algebra result on normal operators, which includes the self-adjoint case as a special case.

In class, we saw the spectral theorem for self-adjoint operators, and the teacher attempted a sketch of a proof which nobody in the class understood. While writing my thesis, I ran into the normal operator case, so I was looking for a complete proof of it.

How is it proven?

Best Answer

The main reason for posting this question was to answer it myself, thus collecting all this material in a single place for future reference -- and present reference, too.

The first ingredient of this proof is that every linear operator on a finite-dimensional complex vector space admits an upper triangular matrix representation. This is proved by induction on $n:=\dim V$, $V$ being the vector space. If $\dim V=1$, the statement is trivial. So suppose $\dim V=n>1$ and the theorem holds for all dimensions up to $n-1$.

First, our operator $T$ has an eigenvalue. Indeed, pick $v\neq0$ and consider $v,Tv,T^2v,T^3v,\dotsc,T^nv$. These $n+1$ vectors cannot be linearly independent, since $\dim V=n$. So there exist $a_0,\dotsc,a_n\in\mathbb{C}$, not all zero, such that: $$\sum_{i=0}^na_iT^iv=0.$$ Let $m$ be the largest index such that $a_m\neq0$. Then $m>0$: if all of $a_1,\dotsc,a_n$ were zero, the relation would read $a_0v=0$, forcing $a_0=0$ as well since $v\neq0$. Factor the polynomial: $$a_0+a_1z+\dotsb+a_mz^m=c(z-\lambda_1)\dotsm(z-\lambda_m).$$ Substituting $T$ for $z$, and applying to $v$, we find: $$0=\left(\sum_{i=0}^ma_iT^i\right)v=c(T-\lambda_1I)\dotsm(T-\lambda_mI)v,$$ so $T-\lambda_iI$ is not injective for some $i$. But this says precisely that $\lambda_i$ is an eigenvalue, since not injective iff nontrivial kernel iff $(T-\lambda_iI)w=0$ for some $w\neq0$ iff $Tw=\lambda_iw$ for some $w\neq0$.

Now take any eigenvalue $\lambda$ of $T$. Then $T-\lambda I$ is not injective, and by rank-nullity it is not surjective either. So if $U:=\mathrm{Im}(T-\lambda I)$ is the range of that operator, then $\dim U<\dim V$. Also, $U$ is invariant under $T$ since: $$Tu=(T-\lambda I)u+\lambda u,$$ and if $u\in U$ then both summands are in $U$: the first because it lies in the range of $T-\lambda I$, the second because $u\in U$. So $T|_U$ is an operator on $U$, and by the induction hypothesis there exists a basis of $U$ such that $T|_U$ is represented by an upper triangular matrix w.r.t. that basis. So if $k:=\dim U$ and that basis is $\{u_1,\dotsc,u_k\}$, then $Tu_j$ is in the span of $u_1,\dotsc,u_j$ for all $j\leq k$. Extend that basis to a basis of $V$ by adding extra vectors $v_1,\dotsc,v_{n-k}$. For each $i\leq n-k$ we have $Tv_i=(T-\lambda I)v_i+\lambda v_i$ with $(T-\lambda I)v_i\in U$, so $Tv_i$ is in the span of $u_1,\dotsc,u_k,v_i$, thus in that of $u_1,\dotsc,u_k,v_1,\dotsc,v_i$. And this gives us upper triangularity of the matrix representing $T$ w.r.t. $u_1,\dotsc,u_k,v_1,\dotsc,v_{n-k}$, QED.
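As a sanity check on this first ingredient, here is a minimal numerical sketch (assuming `numpy` and `scipy` are available): the complex Schur decomposition computes exactly such an upper triangular representation, even w.r.t. an orthonormal basis.

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

# A = Z @ T @ Z^H with Z unitary and T upper triangular.
T, Z = schur(A, output="complex")

print(np.allclose(A, Z @ T @ Z.conj().T))      # True: valid factorization
print(np.allclose(T, np.triu(T)))              # True: T is upper triangular
print(np.allclose(Z.conj().T @ Z, np.eye(4)))  # True: Z is unitary
```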

The rest of this answer is essentially copied off this pdf. First of all, notice that a linear operator $T$ is uniquely determined by the values of $\langle Tu,v\rangle$ for $u,v\in V$. That is because the inner product is positive definite: if $S$ satisfies $\langle Tu,v\rangle=\langle Su,v\rangle$ for all $u,v\in V$, we first conclude $\langle(T-S)u,v\rangle=0$ for all $u,v\in V$; fixing $u$ and choosing $v=(T-S)u$ gives $\|(T-S)u\|^2=0$, i.e. $(T-S)u=0$, and since that holds for all $u$, we get $T-S=0$, or $T=S$. This makes it sensible to define an operator $T^\ast$ via: $$\langle Tu,v\rangle=\langle u,T^\ast v\rangle,$$ for all $u,v\in V$. $T^\ast$ is uniquely determined as seen above, and is called the adjoint of $T$ w.r.t. this inner product.

Elementary properties of taking adjoints are that $(S+T)^\ast=S^\ast+T^\ast$, that $(aS)^\ast=\bar aS^\ast$ in the complex case, that the identity is self-adjoint (i.e. coincides with its adjoint), that adjoining is an involution (i.e. $(T^\ast)^\ast=T$), that $M(T^\ast)=M(T)^\ast$ w.r.t. any orthonormal basis, denoting by $^\ast$ the conjugate transpose of a matrix, and that $(ST)^\ast=T^\ast S^\ast$. The linked pdf also proves that the eigenvalues of a self-adjoint operator are all real, but this is irrelevant here, so I will leave the proof to that pdf. We define normal operators as those for which $TT^\ast=T^\ast T$, i.e. those commuting with their adjoints. The polarization identity is another interesting result I leave to the pdf.

One result we will use is that, with $\|v\|:=\sqrt{\langle v,v\rangle}$, an operator $T$ is normal if and only if $\|Tv\|=\|T^\ast v\|$ for every $v$. The proof is immediate (the second equivalence below uses the lemma in the update at the end, applied to the self-adjoint operator $TT^\ast-T^\ast T$): \begin{align*} T\text{ is normal}\iff{}&TT^\ast-T^\ast T=0\iff\langle(TT^\ast-T^\ast T)v,v\rangle=0\quad\forall v\in V\iff{} \\ {}\iff{}&\langle T^\ast Tv,v\rangle=\langle TT^\ast v,v\rangle\quad\forall v\in V\iff{} \\ {}\iff{}&\|Tv\|^2=\langle Tv,Tv\rangle=\langle T^\ast v,T^\ast v\rangle=\|T^\ast v\|^2\quad\forall v\in V. \end{align*}
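A minimal numerical sketch of the norm identity (assuming `numpy`; the matrix `N` below is a hypothetical example, built to be normal by construction as $QDQ^\ast$ with $Q$ unitary and $D$ diagonal, so normal but in general not self-adjoint):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
# A unitary Q from the QR factorization of a random complex matrix.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
D = np.diag(rng.standard_normal(n) + 1j * rng.standard_normal(n))
N = Q @ D @ Q.conj().T

Nstar = N.conj().T                        # adjoint w.r.t. the standard inner product
print(np.allclose(N @ Nstar, Nstar @ N))  # True: N is normal

v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
# The norm identity ||Nv|| = ||N* v|| for a normal N:
print(np.isclose(np.linalg.norm(N @ v), np.linalg.norm(Nstar @ v)))  # True
```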

As is subsequently proved, if $T$ is normal then the kernels of $T$ and $T^\ast$ coincide, every eigenvector of $T$ with eigenvalue $\lambda$ is an eigenvector of $T^\ast$ with eigenvalue $\bar\lambda$ (so the eigenvalues of the two operators are mutually conjugate), and eigenvectors associated to distinct eigenvalues are orthogonal; for a general operator they are merely linearly independent.

Now the big result: unitary diagonalizability is equivalent to normality. This statement is in turn equivalent to proving that an operator $T$ is normal iff it admits an orthonormal eigenbasis, since the change of basis between two orthonormal bases is unitary.

So let us assume $T$ is normal. We know $T$ can be represented by an upper triangular matrix w.r.t. some basis $e_1,\dotsc,e_n$; applying Gram-Schmidt to that basis, which preserves the spans of the initial segments and hence upper triangularity, we may assume the basis is orthonormal. We show the corresponding matrix representation of $T$, $M(T)=(a_{ij})_{i,j=1}^n$, is in fact diagonal. This makes use of the Pythagorean theorem, proved here, and of the norm identity we proved a while ago relating the norm of an image via $T$ to that via $T^\ast$. We proceed row by row. Since $M(T)$ is upper triangular, $Te_1=a_{11}e_1$, and since $M(T^\ast)=M(T)^\ast$ we also know $T^\ast e_1=\sum_{k=1}^n\bar a_{1k}e_k$. So by the Pythagorean theorem and the norm identity: $$|a_{11}|^2=\|Te_1\|^2=\|T^\ast e_1\|^2=\sum_{k=1}^n|a_{1k}|^2,$$ implying $a_{1k}=0$ for all $k\neq1$. Now suppose rows $1,\dotsc,i-1$ are already known to be diagonal. The entries of column $i$ above the diagonal sit in those rows, so they vanish and $Te_i=a_{ii}e_i$, while $T^\ast e_i=\sum_{k=i}^n\bar a_{ik}e_k$. As before: $$|a_{ii}|^2=\|Te_i\|^2=\|T^\ast e_i\|^2=\sum_{k=i}^n|a_{ik}|^2,$$ implying $a_{ik}=0$ for all $k\neq i$. Carrying this through all rows proves $M(T)$ is diagonal.
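A numerical sketch of this direction (same assumptions as the previous snippets, with a matrix that is normal by construction): for a normal matrix the complex Schur form, which is only guaranteed to be upper triangular, actually comes out diagonal.

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(2)
n = 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
D = np.diag(rng.standard_normal(n) + 1j * rng.standard_normal(n))
N = Q @ D @ Q.conj().T                       # normal by construction

T, Z = schur(N, output="complex")
print(np.allclose(T, np.diag(np.diag(T))))   # True: the triangular form is diagonal
```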

Conversely, suppose $T$ admits an orthonormal eigenbasis, so that $M(T)$ is diagonal w.r.t. it. Since $M(T^\ast)=M(T)^\ast$ and the conjugate transpose of a diagonal matrix is diagonal, $T^\ast$ is diagonal w.r.t. the same basis, with mutually conjugate diagonal entries. But we know $M(TT^\ast)=M(T)M(T^\ast)$, so: $$M(TT^\ast)=M(T)M(T^\ast)=M(T)M(T)^\ast=M(T)^\ast M(T)=M(T^\ast)M(T)=M(T^\ast T),$$ since diagonal matrices always commute. Thus $TT^\ast=T^\ast T$: if the matrix representations of two operators w.r.t. the same basis coincide, the operators agree on every vector, and thus coincide. So if $T$ is unitarily diagonalizable, $T$ is normal.
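And a sketch consistent with the contrapositive of this converse (assuming `numpy` and `scipy`): a Jordan block is upper triangular but not normal, and indeed its Schur form does not collapse to a diagonal, matching the theorem's claim that it admits no orthonormal eigenbasis.

```python
import numpy as np
from scipy.linalg import schur

J = np.array([[1.0, 1.0],
              [0.0, 1.0]])
print(np.allclose(J @ J.T, J.T @ J))         # False: J is not normal

T, Z = schur(J.astype(complex), output="complex")
print(np.allclose(T, np.diag(np.diag(T))))   # False: Schur form stays triangular
```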

Update

I just realised the proof implicitly uses the fact that if the quadratic form associated to an operator vanishes then the operator is zero, i.e. $\langle Tv,v\rangle=0\ \forall v\in V\implies T=0$. Over $\mathbb{C}$ this holds for every operator; over $\mathbb{R}$ it requires $T$ to be self-adjoint (a rotation of the plane by $90^\circ$ has vanishing quadratic form but is nonzero), which is enough here since $TT^\ast-T^\ast T$ is self-adjoint. This is proved here on p. 147:

$\quad$ (ii) Since $(T(x+y),x+y)=(Tx,x)+(Tx,y)+(Ty,x)+(Ty,y)$, $x,y\in V$, and $(Tv,v)=0$ for all $v\in V$, we have $$\tag{$*$} 0=(Tx,y)+(Ty,x).$$ If $V$ is an inner product space over $\mathbb{R}$, then from $(*)$: $$0=(Tx,y)+(Ty,x)=(Tx,y)+(y,Tx)=2(Tx,y).$$ Hence, $(Tx,y)=0$ for all $x,y\in V$, and $T\equiv 0$.

$\quad$ If $V$ is an inner product space over $\Bbb C$, then replacing $y$ by $iy$ in $(*)$, we have $(Tx,iy)+(iTy,x)=0$. Thus for all $x,y\in V$: $$(Tx,y)-(Ty,x)=0.$$ Hence, $(Tx,y)=0$ for all $x,y\in V$, and $T\equiv 0$.
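For what it's worth, here is a numerical sketch of the complex-case polarization trick behind this lemma (assuming `numpy`, and the convention that the inner product is linear in the first argument, as in the text): the form $(Tx,y)$ is recovered from the quadratic form $q(v)=(Tv,v)$ alone, so $q\equiv0$ forces $T=0$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
T = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

def ip(u, v):
    return np.vdot(v, u)      # (u, v) = sum_j u_j * conj(v_j), linear in u

def q(v):
    return ip(T @ v, v)       # the quadratic form of T

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Complex polarization: (Tx, y) = (1/4) * sum_k i^k q(x + i^k y), k = 0..3.
recovered = sum(1j**k * q(x + 1j**k * y) for k in range(4)) / 4
print(np.isclose(recovered, ip(T @ x, y)))   # True
```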