Write down a square matrix, $A$. Now, raise it to the power 100. Not so easy, is it? Well, it is if the matrix is diagonal. It's also easy if the matrix is diagonalizable; if $P^{-1}AP=D$ is diagonal, then $A^{100}=PD^{100}P^{-1}$. So, computing high powers of matrices is made easy by diagonalization.
And why would you want to compute high powers of a matrix? Well, many things are modelled by discrete linear dynamical systems, which is a fancy way of saying you have a sequence of vectors $v_0,v_1,v_2,\dots$ where you get each vector (after the first) by multiplying the previous vector by a fixed matrix $A$, so that $v_{k+1}=Av_k$. But then $v_k=A^kv_0$, and voilà! there's your high power of a matrix.
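For concreteness, here is a small numpy sketch of the idea (the matrix and the exponent are my own illustrative choices, not anything from the question): diagonalize once, raise the diagonal factor to the 100th power entrywise, and change basis back.

```python
# Minimal sketch: A^100 via diagonalization, assuming A is diagonalizable.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # illustrative matrix (eigenvalues 1 and 3)

eigvals, P = np.linalg.eig(A)       # columns of P are eigenvectors, so P^{-1} A P = D
D100 = np.diag(eigvals ** 100)      # a diagonal matrix is raised to a power entrywise

A100 = P @ D100 @ np.linalg.inv(P)  # A^100 = P D^100 P^{-1}

# Sanity check against repeated multiplication.
print(np.allclose(A100, np.linalg.matrix_power(A, 100)))
```

If you need exact integer entries (as in Fibonacci-type recurrences), you would carry out the same algebra symbolically rather than in floating point, but the idea is identical.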
We are free to define what is meant by the adjoint of an operator and the adjoint of a matrix without any mention of a basis, orthonormal or otherwise. Indeed, we usually don't mention bases in either definition. Taking $\mathbb{F}$ to be either $\mathbb{R}$ or $\mathbb{C}$, the definitions may be stated as follows:
If $V$ and $W$ are finite-dimensional inner product spaces over $\mathbb{F}$, and $T:V\to W$ is linear, then the adjoint operator $T^{*}:W\to V$ is the unique operator with the property that$$\left<Tv,w\right>=\left<v,T^{*}w\right>$$ for all $v\in V$ and for all $w\in W$.
If $\mathbf{A}$ is a matrix with entries in $\mathbb{F}$, then the adjoint of $\mathbf{A}$ is$$\mathbf{A}^{*}=\overline{\mathbf{A}^{\top}}\mbox{.}$$
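As a quick sanity check of the defining property (the snippet and its random data are purely illustrative and not part of the original answer), one can verify numerically that the conjugate transpose satisfies $\left<Av,w\right>=\left<v,\mathbf{A}^{*}w\right>$ for the standard inner product on $\mathbb{C}^n$, taken to be linear in the first slot as above.

```python
# Minimal numerical check that the conjugate transpose satisfies the adjoint property
# <Av, w> = <v, A*w> for the standard inner product (linear in the first slot).
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
v = rng.standard_normal(2) + 1j * rng.standard_normal(2)
w = rng.standard_normal(3) + 1j * rng.standard_normal(3)

def inner(x, y):
    # <x, y> = sum_i x_i * conj(y_i): linear in the first slot, as in the text.
    return np.vdot(y, x)

A_star = A.conj().T  # A* = conjugate transpose
print(np.isclose(inner(A @ v, w), inner(v, A_star @ w)))
```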
But when we define two meanings for the same word, we'd like the two meanings to be somehow related. In the case of the word adjoint, if the matrix $\mathbf{A}$ represents the operator $T$ with respect to bases $\alpha$ and $\beta$ of $V$ and $W$, respectively, then we'd like the adjoint of $\mathbf{A}$ to coincide with the matrix of the adjoint of $T$ with respect to $\beta$ and $\alpha$. That is, we want$$\left(\left[T\right]_{\alpha}^{\beta}\right)^{*}=\left[T^{*}\right]_{\beta}^{\alpha}\mbox{.}$$
The last equation is NOT true in general, but it is true when both $\alpha$ and $\beta$ are orthonormal. So that's where orthonormality becomes “necessary” in a sense. This is a result, however, not a definition. And even with the same definitions above, we can still write $\left[T^{*}\right]_{\beta}^{\alpha}$ in terms of $\left[T\right]_{\alpha}^{\beta}$ without assuming $\alpha$ and $\beta$ are orthonormal. Letting $\alpha=\left\{ \alpha_{1},\ldots,\alpha_{m}\right\}$ and $\beta=\left\{\beta_{1},\ldots,\beta_{n}\right\}$, the formula in general is$$\left[T^{*}\right]_{\beta}^{\alpha}=\mathbf{C}^{-1}\left(\left[T\right]_{\alpha}^{\beta}\right)^{*}\mathbf{B}$$
where$$\mathbf{C}=\left(\begin{array}{cccc}
\left<\alpha_{1},\alpha_{1}\right> & \left<\alpha_{2},\alpha_{1}\right> & \cdots & \left<\alpha_{m},\alpha_{1}\right>\\
\left<\alpha_{1},\alpha_{2}\right> & \left<\alpha_{2},\alpha_{2}\right> & \cdots & \left<\alpha_{m},\alpha_{2}\right>\\
\vdots & & & \vdots\\
\left<\alpha_{1},\alpha_{m}\right> & \left<\alpha_{2},\alpha_{m}\right> & \cdots & \left<\alpha_{m},\alpha_{m}\right>
\end{array}\right)$$
and$$\mathbf{B}=\left(\begin{array}{cccc}
\left<\beta_{1},\beta_{1}\right> & \left<\beta_{2},\beta_{1}\right> & \cdots & \left<\beta_{n},\beta_{1}\right>\\
\left<\beta_{1},\beta_{2}\right> & \left<\beta_{2},\beta_{2}\right> & \cdots & \left<\beta_{n},\beta_{2}\right>\\
\vdots & & & \vdots\\
\left<\beta_{1},\beta_{n}\right> & \left<\beta_{2},\beta_{n}\right> & \cdots & \left<\beta_{n},\beta_{n}\right>
\end{array}\right)\mbox{.}$$When orthonormal bases of $V$ and $W$ are not readily available, the formula above for $\left[T^{*}\right]_{\beta}^{\alpha}$ is usually more computationally efficient than applying Gram-Schmidt and change of basis matrices. However, the formula above assumes that inner products are linear in the first slot. If one prefers the definition of inner product which requires linearity in the second slot, then replace $\mathbf{C}$ and $\mathbf{B}$ above by their transposes.
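For those who like to see the formula in action, here is a small numerical check (an illustration of my own, with randomly chosen data, not something from the original answer): take $V=\mathbb{C}^m$ and $W=\mathbb{C}^n$ with the standard inner product (linear in the first slot), let $\alpha$ and $\beta$ be the columns of two invertible matrices, and compare $\left[T^{*}\right]_{\beta}^{\alpha}$ computed by change of basis with $\mathbf{C}^{-1}\left(\left[T\right]_{\alpha}^{\beta}\right)^{*}\mathbf{B}$.

```python
# Numerical check of [T*]_beta^alpha = C^{-1} ([T]_alpha^beta)* B with
# non-orthonormal bases of C^m and C^n (standard inner product, linear in first slot).
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 4
M = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))        # T in the standard bases
P_alpha = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))  # columns = basis alpha of V
P_beta = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))   # columns = basis beta of W

# Matrices of T and T* relative to alpha and beta, via change of basis.
T_ab = np.linalg.inv(P_beta) @ M @ P_alpha                 # [T]_alpha^beta
Tstar_ba = np.linalg.inv(P_alpha) @ M.conj().T @ P_beta    # [T*]_beta^alpha

# Gram matrices as in the text: C_ij = <alpha_j, alpha_i>, B_ij = <beta_j, beta_i>.
C = P_alpha.conj().T @ P_alpha
B = P_beta.conj().T @ P_beta

print(np.allclose(Tstar_ba, np.linalg.inv(C) @ T_ab.conj().T @ B))
```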
Best Answer
Sorry, I'm a bit confused. Is the question whether all diagonalizable linear operators $T$ on a finite-dimensional complex vector space $V$ are normal? (i.e. whether you have given a proof of this?)
If so, unfortunately the answer is no. Such a linear operator $T$ is normal if and only if $T$ is unitarily diagonalizable (i.e. is diagonalized by conjugation with a unitary matrix), and this is just not the case in general.
To disprove the claim it suffices to give a counterexample, e.g. the matrix $$ \begin{bmatrix} 1&1\\ 0&2 \end{bmatrix} $$ is diagonalizable with eigenvalues $1$ and $2$ (check this), but is not normal.
You can try to apply your argument to this specific matrix to find out what's wrong. The issue is that $[T]_{\beta'}$ need not be a diagonal matrix, even though $[T]_\beta$ was. In my example matrix there is an eigenvector $ \begin{bmatrix} 1\\0 \end{bmatrix}$ with eigenvalue $1$ and an eigenvector $ \begin{bmatrix} 1\\1 \end{bmatrix}$ with eigenvalue $2$, but these vectors are not orthogonal. You might be relying on a "fact" that eigenvectors for distinct eigenvalues are orthogonal, but unfortunately this is not true in general (it does hold for, e.g., self-adjoint or normal matrices).
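If you want to see this concretely, here is a short numpy check (illustrative only, not part of the original argument) confirming that the example matrix has eigenvalues $1$ and $2$, is not normal, and has non-orthogonal eigenvectors.

```python
# Quick check of the counterexample: diagonalizable (eigenvalues 1 and 2),
# but not normal, and its eigenvectors are not orthogonal.
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 2.0]])

eigvals, eigvecs = np.linalg.eig(A)
print(np.sort(eigvals))                   # eigenvalues 1 and 2: distinct, so A is diagonalizable
print(np.allclose(A @ A.T, A.T @ A))      # False: A is real, so A* = A^T, and A A* != A* A
v1, v2 = eigvecs[:, 0], eigvecs[:, 1]
print(np.isclose(v1 @ v2, 0.0))           # False: the two eigenvectors are not orthogonal
```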
On the other hand, if we additionally require $T$ to be a self-adjoint linear operator, then your argument does work, and you have given a valid proof that all self-adjoint linear operators are normal.