In the last month I have studied the spectral theorems, and I understand them formally. But I would like some intuition about them. If you didn't know the spectral theorems, how would you come up with the idea that symmetric/normal endomorphisms are exactly the orthogonally/unitarily diagonalizable endomorphisms in the real/complex case? How would you even come up with the idea of studying the adjoint?
[Math] Intuition on spectral theorem
diagonalization, intuition, linear-algebra, spectral-theory
Related Solutions
The answer to this question is yes: orthogonal (resp. unitary) diagonalizability forces symmetry (resp. normality).
If $A$ is real and orthogonally diagonalizable, then $A = UDU^T$ for some orthogonal matrix $U$ and real diagonal matrix $D$. We find that $$ A^T = (UDU^T)^T = UDU^T = A $$ so that $A$ is symmetric.
Similarly, if $A$ is complex and unitarily diagonalizable, then $A = UDU^*$ for some unitary matrix $U$ and (complex) diagonal matrix $D$. We find that $$ \begin{align} A^*A &= (UDU^*)^*(UDU^*) = UD^*(U^*U)DU^* = UD^*DU^* =U|D|^2 U^* \\ & = UDD^*U^* = UD(U^*U)D^*U^* = (UDU^*)(UDU^*)^* = AA^* \end{align} $$ so that $A$ is normal.
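As a quick sanity check on the complex case, here is a small numpy sketch (the unitary matrix and the diagonal are random toys, nothing canonical) confirming that a matrix built as $UDU^*$ commutes with its conjugate transpose:

```python
import numpy as np

rng = np.random.default_rng(0)

# a random unitary U (QR factor of a random complex matrix) and a complex diagonal D
U, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
D = np.diag(rng.standard_normal(4) + 1j * rng.standard_normal(4))

A = U @ D @ U.conj().T                               # unitarily diagonalizable by construction
print(np.allclose(A @ A.conj().T, A.conj().T @ A))   # True: A is normal
```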
The main reason for posting this was to answer it myself, thus collecting all this material in a single place for future (and present) reference.
The first step in this proof is that a linear operator on a finite-dimensional complex vector space admits an upper triangular matrix representation. This is proved by induction on $n:=\dim V$, $V$ being the vector space. If $\dim V=1$, the claim is trivial. Suppose $\dim V=n>1$ and the theorem holds for dimensions up to $n-1$.

First, our operator $T$ has an eigenvalue. Indeed, pick $v\neq0$ and consider $v,Tv,T^2v,T^3v,\dotsc,T^nv$. Those cannot be linearly independent, since they are $n+1$ vectors and $\dim V=n$. So there exist $a_i\in\mathbb{C}$, not all zero, such that: $$\sum_{i=0}^na_iT^iv=0.$$ Let $m$ be the largest index such that $a_m\neq0$. Then $m\geq1$: if only $a_0$ were nonzero the relation would read $a_0v=0$, impossible since $v\neq0$. Factor the polynomial: $$a_0+a_1z+\dotsb+a_mz^m=c(z-\lambda_1)\dotsm(z-\lambda_m).$$ Substituting $T$ for $z$, and applying to $v$, we find: $$0=\left(\sum_{i=0}^ma_iT^i\right)v=c(T-\lambda_1I)\dotsm(T-\lambda_mI)v,$$ so $T-\lambda_iI$ is not injective for some $i$. But this equates to $\lambda_i$ being an eigenvalue, since not injective iff nontrivial kernel iff $(T-\lambda_iI)w=0$ for some $w\neq0$ iff $Tw=\lambda_iw$, i.e. $\lambda_i$ is an eigenvalue.

So, going back to our original $T$, consider any eigenvalue $\lambda$. $T-\lambda I$ is not injective, so by rank-nullity $T-\lambda I$ is not surjective. If $U=\mathrm{Im}(T-\lambda I)$ is the range of that operator, then $\dim U<\dim V$. Also, $U$ is invariant under $T$ since: $$Tu=(T-\lambda I)u+\lambda u,$$ and if $u\in U$ then both summands are in $U$. So $T|_U$ is an operator on $U$, and by induction there exists a basis of $U$ w.r.t. which $T|_U$ is represented by an upper triangular matrix. So if $k:=\dim U$ and that basis is $\{u_1,\dotsc,u_k\}$, then $Tu_j$ is in the span of $u_1,\dotsc,u_j$ for all $j\leq k$. Extend that basis to a basis of $V$ by adding extra vectors $v_1,\dotsc,v_{n-k}$. For each $i\leq n-k$, $(T-\lambda I)v_i$ lies in $U$, so $Tv_i=(T-\lambda I)v_i+\lambda v_i$ is in the span of $u_1,\dotsc,u_k,v_i$, thus in that of $u_1,\dotsc,u_k,v_1,\dotsc,v_i$. And this gives us upper triangularity of the matrix representing $T$ w.r.t. $u_1,\dotsc,u_k,v_1,\dotsc,v_{n-k}$, QED.
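Incidentally, this upper triangular representation is exactly what the complex Schur decomposition computes, and there one even gets an orthonormal basis. A quick numpy/scipy illustration, with a random (generically non-normal) matrix as a toy example:

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))  # generic operator

T, Z = schur(A, output='complex')          # A = Z T Z^H, Z unitary, T upper triangular
print(np.allclose(A, Z @ T @ Z.conj().T))  # True
print(np.allclose(T, np.triu(T)))          # True: T is upper triangular
```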
The rest of this answer is practically copied off this pdf.

First of all, notice how $T$, a linear operator, is uniquely determined by the values of $\langle Tu,v\rangle$ for $u,v\in V$. That is because the inner product is positive definite: if $S$ satisfies $\langle Tu,v\rangle=\langle Su,v\rangle$ for all $u,v\in V$, we first conclude $\langle(T-S)u,v\rangle=0$ for all $u,v\in V$; fixing $u$ and choosing $v=(T-S)u$, this means $(T-S)u=0$, and that holds for all $u$, hence $T-S=0$, i.e. $T=S$. This makes it sensible to define an operator via: $$\langle Tu,v\rangle=\langle u,T^\ast v\rangle,$$ for all $u,v\in V$. $T^\ast$ is uniquely determined as seen above, and is called the adjoint of $T$ w.r.t. this inner product.

Elementary properties of the operation of taking the adjoint are that $(S+T)^\ast=S^\ast+T^\ast$; that $(aS)^\ast=\bar aS^\ast$ in the complex case; that the identity is self-adjoint (i.e. coincides with its adjoint); that adjoining is an involution (i.e. $(T^\ast)^\ast=T$); that $M(T^\ast)=M(T)^\ast$ w.r.t. any orthonormal basis, denoting by $^\ast$ the conjugate transpose of a matrix; and that $(ST)^\ast=T^\ast S^\ast$. The linked pdf also proves that the eigenvalues of a self-adjoint operator are all real, but this is irrelevant here, so I will leave the proof to that pdf. We define normal operators as those for which $TT^\ast=T^\ast T$, i.e. those commuting with their adjoints. The polarization identity is another interesting result I leave to the pdf.

One result we will use is that, with $\|v\|=\sqrt{\langle v,v\rangle}$, if $T$ is normal then $\|Tv\|=\|T^\ast v\|$ for every $v$. The proof is immediate, using $\langle T^\ast Tv,v\rangle=\langle Tv,Tv\rangle$ and $\langle TT^\ast v,v\rangle=\langle T^\ast v,T^\ast v\rangle$: \begin{align*} T\text{ is normal}\iff{}&TT^\ast-T^\ast T=0\iff\langle(TT^\ast-T^\ast T)v,v\rangle=0\quad\forall v\in V\iff{} \\ {}\iff{}&\langle T^\ast Tv,v\rangle=\langle TT^\ast v,v\rangle\quad\forall v\in V\iff{} \\ {}\iff{}&\|Tv\|^2=\|T^\ast v\|^2\quad\forall v\in V. \end{align*} (The backward implication in the second step is precisely the quadratic-form fact addressed in the Update below.)
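Here is a small numerical sketch of the norm identity, using a randomly built normal (but non-self-adjoint) matrix, plus a nilpotent Jordan block to show the identity fails without normality:

```python
import numpy as np

rng = np.random.default_rng(0)

# normal but not self-adjoint: unitary change of basis times a complex diagonal
U, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
N = U @ np.diag(rng.standard_normal(4) + 1j * rng.standard_normal(4)) @ U.conj().T
v = rng.standard_normal(4) + 1j * rng.standard_normal(4)

print(np.isclose(np.linalg.norm(N @ v),
                 np.linalg.norm(N.conj().T @ v)))      # True: ||Nv|| == ||N*v||

J = np.array([[0., 1.],
              [0., 0.]])                               # nilpotent Jordan block: not normal
w = np.array([1., 0.])
print(np.linalg.norm(J @ w), np.linalg.norm(J.T @ w))  # 0.0 vs 1.0: identity fails
```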
As is subsequently proved, the norm identity implies that if $T$ is normal then the kernels of $T$ and $T^\ast$ coincide, the eigenvalues of $T^\ast$ are the conjugates of those of $T$ (with the same eigenvectors), and eigenvectors associated to distinct eigenvalues are orthogonal. (For a general operator, eigenvectors for distinct eigenvalues are merely linearly independent; their orthogonality is special to normal operators.)
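A quick numpy illustration of the last two points, with a toy normal matrix whose eigenvalues are distinct (numpy's `eig` returns unit eigenvectors, which here come out pairwise orthogonal):

```python
import numpy as np

rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
N = U @ np.diag([1 + 1j, 2.0, 3 - 2j, 4.0]) @ U.conj().T   # normal, distinct eigenvalues

w, V = np.linalg.eig(N)
print(np.allclose(V.conj().T @ V, np.eye(4)))     # True: eigenvectors pairwise orthogonal

w_adj = np.linalg.eigvals(N.conj().T)
print(np.allclose(np.sort_complex(w_adj),
                  np.sort_complex(w.conj())))     # True: eigenvalues of N* are conjugates
```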
Now the big result: unitary diagonalizability equates to normality. This statement is equivalent to proving that an operator $T$ is normal iff it admits an orthonormal eigenbasis, since the change of basis between two orthonormal bases is unitary. So let us assume $T$ is normal. We know any operator can be represented by an upper triangular matrix w.r.t. some basis; running Gram-Schmidt on that basis preserves the spans of initial segments, hence preserves triangularity, so we may take the basis $e_1,\dotsc,e_n$ orthonormal. We show the corresponding matrix representation of $T$, $M(T)=(a_{ij})_{i,j=1}^n$, is in fact diagonal. This makes use of the Pythagorean theorem, proved here, and of the norm identity we proved a while ago relating the norm of an image via $T$ to that via $T^\ast$. Since $M(T^\ast)=M(T)^\ast$, we have $T^\ast e_i=\sum_{k=i}^n\bar a_{ik}e_k$, the terms with $k<i$ vanishing by upper triangularity. Work through the rows in order. Upper triangularity gives $Te_1=a_{11}e_1$, so by the Pythagorean theorem and the norm identity: $$|a_{11}|^2=\|Te_1\|^2=\|T^\ast e_1\|^2=\sum_{k=1}^n|a_{1k}|^2,$$ implying $a_{1k}=0$ for all $k>1$: the first row is diagonal. But then $Te_2=a_{12}e_1+a_{22}e_2=a_{22}e_2$, and the same computation, $$|a_{ii}|^2=\|Te_i\|^2=\|T^\ast e_i\|^2=\sum_{k=i}^n|a_{ik}|^2,$$ kills the off-diagonal entries of each successive row $i$. Hence $M(T)$ is diagonal.
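Numerically, this is the statement that the complex Schur form of a normal matrix is diagonal; a quick sketch with a randomly built normal matrix:

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
N = U @ np.diag(rng.standard_normal(4) + 1j * rng.standard_normal(4)) @ U.conj().T

T, _ = schur(N, output='complex')           # upper triangular in general...
print(np.allclose(T, np.diag(np.diag(T))))  # ...True: diagonal here, N being normal
```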
Now suppose, conversely, that $T$ admits an orthonormal eigenbasis, so that $M(T)$, the matrix of $T$ w.r.t. that basis, is diagonal. Since $M(T^\ast)=M(T)^\ast$, the matrix of $T^\ast$ w.r.t. the same basis is diagonal as well (its diagonal entries are the conjugate eigenvalues, and the eigenvectors coincide). But $M(TT^\ast)=M(T)M(T^\ast)$, so: $$M(TT^\ast)=M(T)M(T^\ast)=M(T)M(T)^\ast=M(T)^\ast M(T)=M(T^\ast)M(T)=M(T^\ast T),$$ since diagonal matrices always commute. Thus $TT^\ast=T^\ast T$: if the matrix representations of two operators w.r.t. some basis coincide, they have the same image on every vector, and thus coincide. So if $T$ is unitarily diagonalizable, $T$ is normal.
Update
I just realised the proof implicitly uses the fact that if the quadratic form associated to an operator is zero then the operator is zero, i.e. $\langle Tv,v\rangle=0\ \forall v\in V\implies T=0$. Note that over $\mathbb{R}$ this requires $T$ to be self-adjoint (a rotation of the plane by $\frac\pi2$ satisfies $\langle Tv,v\rangle=0$ for every $v$ yet is nonzero), which is fine here since $TT^\ast-T^\ast T$ is self-adjoint; indeed, self-adjointness is exactly what the step $(Ty,x)=(y,Tx)$ in the real case below uses. This is proved here on p. 147:
$\quad$ (ii) Since $(T(x+y),x+y)=(Tx,x)+(Tx,y)+(Ty,x)+(Ty,y)$, $x,y\in V$, and $(Tv,v)=0$ for all $v\in V$, we have $$\tag{$*$} 0=(Tx,y)+(Ty,x).$$ If $V$ is an inner product space over $\mathbb{R}$, then from $(*)$: $$0=(Tx,y)+(Ty,x)=(Tx,y)+(y,Tx)=2(Tx,y).$$ Hence, $(Tx,y)=0$ for all $x,y\in V$, and $T\equiv 0$.
$\quad$ If $V$ is an inner product space over $\Bbb C$, then replacing $y$ by $iy$ in $(*)$, we have $(Tx,iy)+(iTy,x)=0$, i.e. $-i(Tx,y)+i(Ty,x)=0$. Thus for all $x,y\in V$: $$(Tx,y)-(Ty,x)=0.$$ Adding this to $(*)$ gives $2(Tx,y)=0$. Hence, $(Tx,y)=0$ for all $x,y\in V$, and $T\equiv 0$.
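The self-adjointness hypothesis in the real case is not removable: the standard counterexample is the rotation of $\mathbb{R}^2$ by $\frac\pi2$, which is nonzero yet has vanishing quadratic form. A two-line numerical confirmation:

```python
import numpy as np

R = np.array([[0., -1.],
              [1.,  0.]])       # rotation by pi/2: R != 0, and R is not self-adjoint

rng = np.random.default_rng(0)
for _ in range(5):
    v = rng.standard_normal(2)
    print(np.isclose(v @ (R @ v), 0.0))   # True every time: (Rv, v) = 0 for all v
```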
Best Answer
Regarding the adjoint, suppose you have vector spaces $X$ and $Y$ (over the same field), and a linear map $$ T:X\to Y $$ Write $X^*$ and $Y^*$ for the dual spaces. Then $T$ naturally induces a map $$ T^*:Y^* \to X^* $$ defined by $$ T^*(\phi):=\phi\circ T $$ This makes sense, because if $\phi$ is a linear functional on $Y$, then $\phi\circ T$ is a linear functional on $X$. Moreover, the function $T^*$ is itself a linear transformation. This $T^*$ is called the adjoint of $T$ (there is a slight abuse of notation/terminology here; I'll elaborate on this in a moment). This is an example of functorial behaviour: taking adjoints is a contravariant functor.
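In coordinates this is very concrete: if a functional on $F^n$ is written as a row vector, then $T^*\phi=\phi\circ T$ is just the row vector $\phi M$, where $M$ is the matrix of $T$. A small numpy sketch (the particular matrices are arbitrary):

```python
import numpy as np

M = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])                # matrix of T : R^2 -> R^3

phi = np.array([1., -1., 2.])           # a functional on R^3, as a row vector
pullback = phi @ M                      # T*(phi) = phi o T, a functional on R^2

x = np.array([0.5, -2.0])
print(np.isclose(phi @ (M @ x), pullback @ x))   # True: (phi o T)(x) == (T* phi)(x)
```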
Now, suppose that $X$ and $Y$ are finite-dimensional inner product spaces. Then you know that $X$ and $X^*$ can be canonically identified with each other. On the one hand, any $x\in X$ gives rise to a linear functional $\phi_x\in X^*$ defined by $$ \phi_x(v):=\langle v,x\rangle $$ Write $S_X:X\to X^*$ for the map that sends $x$ to $\phi_x$. It is easy to verify that $S_X$ is conjugate linear, i.e. $S_X(x+x')=S_X(x)+S_X(x')$ and $S_X(\alpha x)=\bar \alpha S_X(x)$.
On the other hand, given any $\phi\in X^*$, one can show that there exists (a unique) vector $x_\phi\in X$ such that, for every $v\in X$, $$ \phi(v)=\langle v, x_\phi\rangle $$ This shows that the function $S_X$ above is invertible, so it is "almost" an isomorphism, except for the fact that it is not strictly linear, but conjugate linear.
Now, the same thing can be done with $Y$, and we obtain a conjugate isomorphism $S_Y:Y\to Y^*$.
Consider now the composition $$ Y\overset{S_Y}{\longrightarrow} Y^*\overset{T^*}{\longrightarrow} X^* \overset{S^{-1}_X}{\longrightarrow} X $$ Call this composition $\hat T$, i.e. $\hat T(y)=(S^{-1}_X\circ T^*\circ S_Y)(y)$. You can check that $\hat T$ is linear.
Fix $x\in X$ and $y\in Y$. Put $\phi=(T^*\circ S_Y) y\in X^*$. Now, $S_X^{-1}\phi$ is, by definition, the unique vector $z\in X$ such that $\langle v,z\rangle =\phi (v)$ for every $v\in X$. Therefore, $$ \langle x,\hat Ty\rangle =\langle x,S^{-1}_X\phi\rangle=\phi(x) $$ Now, $\phi=T^*(S_Yy)=(S_Yy)\circ T$. So, $$ \phi(x)=(S_Yy)(Tx) $$ Now, $S_Yy\in Y^*$ is the linear functional $v\mapsto\langle v,y\rangle$, i.e. pairing against $y$ in the second slot. This means that $$ (S_Yy)(Tx)=\langle Tx,y\rangle $$ Putting everything together, we get that $$ \langle x,\hat Ty\rangle =\langle Tx,y\rangle $$ So, $\hat T$ has the property that "the adjoint" has in every linear algebra text. In practice, we use $T^*$ to refer to the above $\hat T$, and the original $T^*$ is left behind. I will follow this convention from now on, i.e. every $T^*$ in what follows really means $\hat T$. I should mention that having an inner product is key for all of this: for general vector spaces there is no canonical isomorphism between $X$ and $X^*$ (and in the infinite-dimensional case they need not be isomorphic at all).
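Here is a small numpy check of this identity, taking $\hat T$ to be the conjugate transpose of (the matrix of) $T$, which is what $\hat T$ works out to be w.r.t. orthonormal bases. Note that `np.vdot` conjugates its *first* argument, so the helper below flips the arguments to match the convention "linear in the first slot":

```python
import numpy as np

def inner(u, v):
    # <u, v>: linear in the first slot, conjugate-linear in the second;
    # np.vdot conjugates its first argument, hence the flip
    return np.vdot(v, u)

rng = np.random.default_rng(0)
T = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
T_hat = T.conj().T        # candidate adjoint: the conjugate transpose of T's matrix

x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
y = rng.standard_normal(3) + 1j * rng.standard_normal(3)

print(np.isclose(inner(x, T_hat @ y), inner(T @ x, y)))   # True: <x, T^y> == <Tx, y>
```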
Regarding your question about looking at normality, recall that, given a linear operator $T:X\to X$, a subspace $W\subset X$ is said to be $T$-invariant if $$ x\in W\implies Tx\in W $$ Define the orthogonal complement $$ W^\perp:=\{x\in X: \langle x,w\rangle =0\ \forall w\in W\} $$ Note that, if $W$ is $T$-invariant, then $W^\perp$ is $T^*$-invariant. Indeed, fix $x\in W^\perp$. We need to see that $T^*x\in W^\perp$. Let $w\in W$; then $$ \langle T^*x,w\rangle=\langle x,Tw\rangle=0 $$ because $x\in W^\perp$ and $Tw\in W$ (because $W$ is $T$-invariant). Since $w\in W$ was arbitrary, $T^*x\in W^\perp$.
If $T$ is, for example, self-adjoint, then we immediately get that $W^\perp$ is $T$-invariant as well. This leads to the following question: is there a simple property of an operator $T$ guaranteeing that every $T$-invariant subspace has a $T$-invariant orthogonal complement? The answer is yes, and the property is normality, see here.
How does this relate to being diagonalizable? Well, since the matrix of $T^*$ in an orthonormal basis $B$ is the conjugate transpose of the matrix of $T$ in $B$, and diagonal matrices commute with their conjugate transposes, it follows that any operator that is diagonalizable with respect to an orthonormal basis is necessarily normal.
Suppose now that $T$ is normal. Pick an eigenvalue $\lambda$ of $T$. Let $E$ be the associated eigenspace. Clearly, $E$ is $T$-invariant. Write $$ X=E\oplus E^\perp $$ By normality, $E^\perp$ is also $T$-invariant. This means that we can consider the restricted operator $T|_{E^\perp}:E^\perp \to E^\perp$. This new operator is also normal. But $\dim (E^\perp)<\dim X$, so we can carry out an inductive argument: the orthonormal eigenbases produced at each stage assemble into an orthonormal eigenbasis of $X$.
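To make the induction tangible, here is a toy recursive implementation of exactly this argument (not a production eigensolver; it leans on `np.linalg.eig` for a single eigenvector at each stage and on a random completion to an orthonormal basis):

```python
import numpy as np

def unitary_eigenbasis(A, rng):
    """Sketch of the inductive argument for a normal matrix A: peel off one
    unit eigenvector, pass to its orthogonal complement, recurse."""
    n = A.shape[0]
    if n == 1:
        return np.eye(1, dtype=complex)
    _, V = np.linalg.eig(A)
    v = V[:, 0]                                   # a unit eigenvector of A
    # complete v to an orthonormal basis: QR of a matrix whose first column is v
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    M[:, 0] = v
    Q, _ = np.linalg.qr(M)                        # first column of Q is v (up to phase)
    B = Q.conj().T @ A @ Q                        # normality makes B block diagonal
    U = np.eye(n, dtype=complex)
    U[1:, 1:] = unitary_eigenbasis(B[1:, 1:], rng)  # induct on the complement
    return Q @ U

rng = np.random.default_rng(0)
W, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
A = W @ np.diag(rng.standard_normal(4) + 1j * rng.standard_normal(4)) @ W.conj().T

U = unitary_eigenbasis(A, rng)
D = U.conj().T @ A @ U
print(np.allclose(U.conj().T @ U, np.eye(4)))    # True: U is unitary
print(np.allclose(D, np.diag(np.diag(D))))       # True: A is unitarily diagonalized
```

The only place normality enters is the claim that $B=Q^*AQ$ is block diagonal: the first column is $\lambda e_1$ because $v$ is an eigenvector, and the first row vanishes off the diagonal because, $A$ being normal, $v$ is also an eigenvector of $A^*$ with eigenvalue $\bar\lambda$.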