Write down a square matrix, $A$. Now, raise it to the power 100. Not so easy, is it? Well, it is if the matrix is diagonal. It's also easy if the matrix is diagonalizable; if $P^{-1}AP=D$ is diagonal, then $A^{100}=PD^{100}P^{-1}$. So, computing high powers of matrices is made easy by diagonalization.
And why would you want to compute high powers of a matrix? Well, many things are modelled by discrete linear dynamical systems, which is a fancy way of saying you have a sequence of vectors $v_0,v_1,v_2,\dots$ where you get each vector (after the first) by multiplying the previous vector by $A$. But then $v_k=A^kv_0$, and voila! there's your high power of a matrix.
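To make this concrete, here is a minimal NumPy sketch; the Fibonacci step matrix is just a convenient illustration of such a system, chosen for this example rather than taken from anything above.

```python
import numpy as np

# Fibonacci recurrence as a discrete linear dynamical system:
# (F_{k+1}, F_k) = A @ (F_k, F_{k-1}).
A = np.array([[1.0, 1.0],
              [1.0, 0.0]])

# Diagonalize A: columns of P are eigenvectors, and the eigenvalues
# are the diagonal entries of D, so A = P @ D @ P^{-1}.
eigvals, P = np.linalg.eig(A)

# A^100 = P @ D^100 @ P^{-1}, and D^100 just powers each eigenvalue.
A_100 = P @ np.diag(eigvals**100) @ np.linalg.inv(P)

# v_0 = (F_1, F_0) = (1, 0), so v_100 = A^100 @ v_0 = (F_101, F_100).
v0 = np.array([1.0, 0.0])
print(A_100 @ v0)

# Sanity check against repeated multiplication; the two agree up to
# floating-point error.
print(np.linalg.matrix_power(A, 100) @ v0)
```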
The proof that every self-adjoint operator $T$ on a finite-dimensional inner product space $V$ admits an orthonormal basis of eigenvectors goes by induction on the dimension of $V$; as with all proofs by induction, that means we need to show explicitly that the statement is true in some base case(s) (here, when the dimension of $V$ is $1$), and that if the statement is true up to some dimension $n$, then it remains true in dimension $n+1$.
The approach is to take a vector space $V$ of dimension $n+1$ and break it up into two pieces, namely the subspace $U$ spanned by an eigenvector of $T$ and the subspace $U^\perp$ that is orthogonal to $U$. If $\alpha$ is an orthonormal basis of $U$ and $\beta$ is an orthonormal basis for $U^\perp$, then $\alpha \cup \beta$ is an orthonormal basis for $V$, so all we need to do is find $\alpha$ and $\beta$, each consisting of eigenvectors of $T$.
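As a reminder (this is the standard orthogonal decomposition, used here without proof), the dimension count behind this is
$$ V = U \oplus U^\perp, \qquad \dim U^\perp = \dim V - \dim U = (n+1) - 1 = n, $$
and every vector in $\alpha$ is orthogonal to every vector in $\beta$ by the very definition of $U^\perp$, which is why $\alpha \cup \beta$ is again orthonormal.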
Finding $\alpha$ is easy, because $U$ is 1-dimensional and spanned by an eigenvector of $T$; just take any vector in $U$ and scale it to have norm $1$.
To find $\beta$ we'd like to apply the induction hypothesis. We do have $\dim(U^\perp) = n < \dim(V) = n+1$, which is good: If we have a self-adjoint operator from $U^\perp$ to $U^\perp$ then the induction hypothesis will give us the basis for $U^\perp$ that we're looking for. The operator we'd like to use is $T$, but $T$ is an operator from $V$ to $V$, not from $U^\perp$ to $U^\perp$. It would be nice, though, if we could think of $T$ as an operator from $U^\perp$ to $U^\perp$. For that reason we define $S : U^\perp \to U^\perp$ to do the same thing as $T$: for all $v \in U^\perp$, $S(v) = T(v)$. There's a little checking to do to make sure this makes sense (specifically, that if $v \in U^\perp$ then $T(v) \in U^\perp$ too), and that $S$ is self-adjoint.
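Here is one way those checks might go (a sketch, writing $u$ for the eigenvector spanning $U$, with $T(u) = \lambda u$). For $v \in U^\perp$,
$$ \langle T(v), u \rangle = \langle v, T(u) \rangle = \langle v, \lambda u \rangle = 0, \quad \text{because } \langle v, u \rangle = 0, $$
so $T(v) \in U^\perp$. And for $v, w \in U^\perp$, $\langle S(v), w \rangle = \langle T(v), w \rangle = \langle v, T(w) \rangle = \langle v, S(w) \rangle$, so $S$ is self-adjoint.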
Once those steps are done, we have a space ($U^\perp$) of dimension strictly less than the dimension of $V$, and a self-adjoint operator on that space. By the induction hypothesis there is an orthonormal basis, call it $\beta$, for $U^\perp$ consisting of eigenvectors of $S$. But $S$ does the same thing as $T$, so the vectors in $\beta$ are also eigenvectors of $T$, which is what we wanted.
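As a quick numerical illustration of the theorem just proved (a sketch using NumPy's built-in routine for symmetric matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2  # symmetric, i.e. self-adjoint w.r.t. the dot product

# eigh handles symmetric/Hermitian matrices; it returns the eigenvalues
# and an orthonormal basis of eigenvectors (as the columns of Q).
w, Q = np.linalg.eigh(A)
print(np.allclose(Q.T @ Q, np.eye(4)))  # columns are orthonormal
print(np.allclose(A @ Q, Q * w))        # each column is an eigenvector
```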
Best Answer
Let me outline how to prove that $T$ is normal if and only if you can find an orthonormal basis of eigenvectors for $T$.
For the other direction, assume that $v_1,\dots,v_n$ is an orthonormal basis of $V$ consisting of eigenvectors of $T$ and write $Tv_i = \lambda_i v_i$. Show using the defining property of $T^{*}$ that $T^{*}v_i = \overline{\lambda_i} v_i$, and then that $$ (T^{*}T)(v_i) = |\lambda_i|^2 v_i = (TT^{*})(v_i) $$ for all $1 \leq i \leq n$. Since two linear operators that agree on a basis are equal, this shows $TT^{*} = T^{*}T$, so $T$ is normal.
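Spelled out (again, just a sketch of the computation that display summarizes):
$$ (T^{*}T)(v_i) = T^{*}(\lambda_i v_i) = \lambda_i \overline{\lambda_i}\, v_i = |\lambda_i|^2 v_i, \qquad (TT^{*})(v_i) = T(\overline{\lambda_i} v_i) = \overline{\lambda_i} \lambda_i\, v_i = |\lambda_i|^2 v_i. $$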