Let us start with another basis-independent yet more tractable (as it does not require the characteristic polynomial to split) definition of the trace. We will check in the end that it coincides with your definition, and with the sum of the diagonal coefficients with respect to any basis.
Let $V$ be an $n$-dimensional vector space over a field $F$, and let $L(V)$ be the algebra of $F$-linear maps from $V$ to $V$.
Note that we have a canonical isomorphism
$$
L(V)\simeq V\otimes V^*
$$
via $v\otimes w^* \longmapsto w^*(\cdot)v$. In other words, $L(V)$ is a natural incarnation of the tensor product of $V$ with its dual $V^*$, with rank-one operators as elementary tensors.
Observe that the bilinear map $(v,w^*)\longmapsto w^*(v)$ from $V\times V^*$ to $F$ factors uniquely through the tensor product.
The resulting linear map is the trace, which is therefore characterized by
$$
\mathrm{tr}:V\otimes V^*\longrightarrow F\qquad \mathrm{tr}(v\otimes w^*)=w^*(v).
$$
Now choose any basis $\{e_i\}$ for $V$ and denote its dual basis by $\{e_i^*\}$. We have $\mathrm{tr}(e_i\otimes e_j^*)=\delta_{ij}$. Therefore, for every $x=\sum x_{ij}e_i\otimes e_j^*\in L(V)$, we have
$$
\mathrm{tr} (x)=\sum_{i=1}^n x_{ii}.
$$
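As a small sanity check of the characterization $\mathrm{tr}(v\otimes w^*)=w^*(v)$, here is a minimal Python sketch. It assumes $V=\mathbb Q^3$ with the canonical basis and represents the functional $w^*$ by its coordinate list `w` in the dual basis; the helper names `rank_one`, `trace` and `pairing` are made up for this illustration.

```python
def rank_one(v, w):
    """Matrix of the operator v (x) w^* : u -> w^*(u) v, i.e. entry (i, j) = v[i] * w[j]."""
    return [[vi * wj for wj in w] for vi in v]

def trace(m):
    """Sum of the diagonal coefficients of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))

def pairing(w, v):
    """Value w^*(v), with w^* given by its coordinates in the dual basis."""
    return sum(wi * vi for wi, vi in zip(w, v))

# Hypothetical sample data, chosen only for illustration.
v = [1, 2, 3]
w = [4, -1, 2]

rank_one_trace = trace(rank_one(v, w))  # diagonal sum of the rank-one matrix
pairing_value = pairing(w, v)           # w^*(v)
print(rank_one_trace, pairing_value)
```

Both numbers agree, which is the $i=j$ part of $\mathrm{tr}(e_i\otimes e_j^*)=\delta_{ij}$ extended by bilinearity.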
Conclusion. Given a matrix $x$ in $M_n(F)$, think of it as an operator in $L(F^n)$ via the canonical basis of $F^n$. Its trace is then defined canonically as above. Whatever basis you choose for $F^n$, the sum of the diagonal coefficients of the matrix of $x$ in that basis is equal to $\mathrm{tr}(x)$. In particular, when the characteristic polynomial of $x$ splits, $\mathrm{tr}(x)$ is also equal to the sum of the eigenvalues counted with multiplicities (take a basis in which $x$ is upper triangular).
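To illustrate the basis independence concretely, the following sketch conjugates a $2\times 2$ matrix by a change of basis and compares diagonal sums. The matrices `x` and `P` are arbitrary sample choices, and `matmul`, `trace` and `inverse_2x2` are hypothetical helpers written for this example only; exact arithmetic is done with `fractions.Fraction`.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(m):
    """Sum of the diagonal coefficients of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))

def inverse_2x2(p):
    """Exact inverse of an invertible 2x2 matrix."""
    det = Fraction(p[0][0] * p[1][1] - p[0][1] * p[1][0])
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[p[1][1] / det, -p[0][1] / det],
            [-p[1][0] / det, p[0][0] / det]]

x = [[1, 2], [3, 4]]   # operator written in the canonical basis
P = [[2, 1], [1, 1]]   # columns: the vectors of another basis

# Matrix of the same operator in the new basis.
x_new = matmul(matmul(inverse_2x2(P), x), P)
print(x_new, trace(x), trace(x_new))
```

The individual diagonal entries change under the change of basis, but their sum does not.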
Note. This also helps explain why $\mathrm{tr} (ab)=\mathrm{tr}(ba)$, beyond the calculation you mentioned. Indeed
$$
\mathrm{tr}((v_1\otimes w_1^*)(v_2\otimes w_2^*))=w_1^*(v_2)\mathrm{tr}(v_1\otimes w_2^*)=w_1^*(v_2)w_2^*(v_1)
$$
$$
=w_2^*(v_1)w_1^*(v_2)=w_2^*(v_1)\mathrm{tr}(v_2\otimes w_1^*)=\mathrm{tr}((v_2\otimes w_2^*)(v_1\otimes w_1^*))
$$
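The identity above can be checked numerically on rank-one operators. This is only a sketch under the assumption $V=\mathbb Q^2$ with the canonical basis; the vectors and the helper names are illustrative, not part of the argument.

```python
def rank_one(v, w):
    """Matrix of v (x) w^*: entry (i, j) = v[i] * w[j]."""
    return [[vi * wj for wj in w] for vi in v]

def matmul(a, b):
    """Product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(m):
    """Sum of the diagonal coefficients of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))

def pairing(w, v):
    """Value w^*(v), with w^* given by its coordinates in the dual basis."""
    return sum(wi * vi for wi, vi in zip(w, v))

# Hypothetical sample vectors and functionals.
v1, w1 = [1, 2], [3, 1]
v2, w2 = [0, 1], [2, 1]

a = rank_one(v1, w1)
b = rank_one(v2, w2)

tr_ab = trace(matmul(a, b))
tr_ba = trace(matmul(b, a))
scalar = pairing(w1, v2) * pairing(w2, v1)   # w_1^*(v_2) * w_2^*(v_1)
print(tr_ab, tr_ba, scalar)
```

Here $ab\neq ba$ as matrices, yet both traces equal $w_1^*(v_2)\,w_2^*(v_1)$. Since rank-one operators span $L(V)$, bilinearity extends the identity to all $a,b$.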
This answer assumes the matrices are taken over $\mathbb C$.
Yes, the statement is still true even if the matrix isn't diagonalizable.
For the proof you saw, it is sufficient that $D$ can be taken to be upper triangular rather than diagonal, and this is always possible over $\mathbb C$ by Schur's Decomposition Theorem. This is enough because the diagonal entries of a triangular $D$ are still the eigenvalues of the starting matrix.
The Jordan Canonical Form would also suffice, but Schur's Decomposition is a weaker result and is all the proof needs.
For completeness I'll add the proofs here.
Let $n\in \mathbb N$ and $A\in \mathcal M_n(\mathbb C)$. Let $\lambda _1, \ldots ,\lambda _n$ be the eigenvalues of $A$. The characteristic polynomial $p_A(z)$ of $A$ is $\color{grey}{p_A(z)=}(z-\lambda _1)\ldots (z-\lambda _n)$.
Schur's Decomposition guarantees the existence of an invertible matrix $P$ (which can even be chosen unitary) and an upper triangular matrix $U$ such that $A=PUP^{-1}$ and $U$'s diagonal entries are exactly $\lambda _1, \ldots ,\lambda _n$.
Since similarity preserves the characteristic polynomial, it follows that the characteristic polynomial $p_U(z)$ of $U$ is $\color{grey}{p_U(z)=}(z-\lambda _1)\ldots (z-\lambda _n)$, therefore $U$ and $A$ have the same eigenvalues with the same algebraic multiplicity.
From the fact that $U$'s diagonal entries are $\lambda _1, \ldots ,\lambda _n$ it follows that the trace of $U$ is the sum of the eigenvalues of $A$ and the determinant of $U$ is the product of the eigenvalues of $A$.
The cyclic property of the trace, $\text{tr}(XY)=\text{tr}(YX)$, yields $$\text{tr}(A)=\text{tr}\left(PUP^{-1}\right)=\text{tr}\left(UP^{-1}P\right)=\text{tr}(U),$$ thus proving that the sum of the eigenvalues of $A$ equals $\text{tr}(A)$.
Similarly for the determinant it holds that $$\det(A)=\det\left(PUP^{-1}\right)=\det\left(P\right)\det\left(U\right)\det\left(P^{-1}\right)=\det(U),$$
hence the product of the eigenvalues of $A$ equals the determinant of $A$.
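A concrete non-diagonalizable example may help. The sketch below takes $A=\begin{pmatrix}3&1\\-1&1\end{pmatrix}$, whose characteristic polynomial is $(z-2)^2$ while $A\neq 2I$, and triangularizes it with a hand-picked invertible $P$ whose first column is the eigenvector $(1,-1)$. This $P$ is an assumption of the example and is not unitary, so it is a triangularization in the sense used above rather than the output of an actual Schur algorithm; the helper names are likewise hypothetical.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(m):
    """Sum of the diagonal coefficients of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))

def det_2x2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inverse_2x2(p):
    """Exact inverse of an invertible 2x2 matrix."""
    det = Fraction(det_2x2(p))
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[p[1][1] / det, -p[0][1] / det],
            [-p[1][0] / det, p[0][0] / det]]

A = [[3, 1], [-1, 1]]   # eigenvalue 2 with algebraic multiplicity 2
P = [[1, 0], [-1, 1]]   # first column is an eigenvector of A

# U = P^{-1} A P, so that A = P U P^{-1}.
U = matmul(matmul(inverse_2x2(P), A), P)
eigenvalues = [U[0][0], U[1][1]]   # diagonal of the triangular matrix

print(U)
print(trace(A), sum(eigenvalues))
print(det_2x2(A), eigenvalues[0] * eigenvalues[1])
```

The matrix $U$ comes out upper triangular with the repeated eigenvalue on its diagonal, and the trace and determinant of $A$ match the sum and product of that diagonal even though $A$ is not diagonalizable.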
Best Answer
Hints:
If $A$ is normal, it is unitarily diagonalizable, that is, there exists a unitary $Q$ ($Q^*Q = QQ^* = I$) such that
$$ D = Q^*AQ $$
is a diagonal matrix, with the eigenvalues of $A$ along the diagonal.
Then, trace properties (like $\textrm{tr}(A^TB) = \textrm{tr}(AB^T)$) will finish the job.