Edit: The statement is false. Here is a counterexample:
$$
A=\pmatrix{1&0\\ 1&2},\ B=\pmatrix{1&0\\ 0&2},\ AB-BA=\pmatrix{0&0\\ -1&0}=B-A\ne0.
$$
Both $A$ and $B$ here are diagonalisable. However, since $A$ and $B$ do not commute, they are not simultaneously diagonalisable.
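For anyone who wants to double-check the arithmetic, here is a minimal plain-Python sketch. The helper names `matmul` and `matsub` are made up for this illustration and are not from any library.

```python
# Hypothetical helpers for 2x2 integer matrices stored as nested lists.
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matsub(X, Y):
    return [[X[i][j] - Y[i][j] for j in range(2)] for i in range(2)]

A = [[1, 0], [1, 2]]
B = [[1, 0], [0, 2]]

# Commutator AB - BA of the counterexample.
commutator = matsub(matmul(A, B), matmul(B, A))

print(commutator)                   # [[0, 0], [-1, 0]]
print(commutator == matsub(B, A))   # True: AB - BA = B - A, which is nonzero
```

Diagonalisability is not checked by the code: $B$ is already diagonal, and $A$ is triangular with the distinct diagonal entries $1$ and $2$, hence has two distinct eigenvalues.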
Under the additional assumption that $A$, $B$ and $AB-BA$ are all diagonalisable, however, the statement is true. In this case, your $C$ is diagonalisable and it suffices to show that $C=0$. Without loss of generality, suppose $B$ is a diagonal matrix of the form $(\lambda_1I_{n_1})\oplus\cdots\oplus(\lambda_k I_{n_k})$, where $\lambda_1,\ldots,\lambda_k$ are distinct and ordered so that their real parts are non-decreasing. Then $CB-BC=C$ implies that, when partitioned conformingly with $B$, $C$ is block strictly upper triangular and hence nilpotent. Therefore $C$ is a diagonalisable nilpotent matrix, so it must be zero.
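To spell out the block computation (a sketch, writing $C_{ij}$ for the $(i,j)$ block of $C$ in the partition conforming to $B$; the notation $C_{ij}$ is mine): comparing the $(i,j)$ blocks on both sides of $CB-BC=C$ gives
$$
\lambda_j C_{ij}-\lambda_i C_{ij}=C_{ij}
\quad\Longrightarrow\quad
(\lambda_j-\lambda_i-1)\,C_{ij}=0 .
$$
So $C_{ij}\ne0$ forces $\lambda_j=\lambda_i+1$, hence $\operatorname{Re}\lambda_j>\operatorname{Re}\lambda_i$. If the eigenvalues are listed so that their real parts never decrease, this can only happen for $j>i$, which is exactly block strict upper triangularity.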
As proven in this post, the idea goes as follows: let $W$ be a $B$-invariant subspace. Since $B$ is diagonalizable with distinct eigenvalues $\mu_1, \dots, \mu_k$,
$$
\mathbb{k}^n = E_{\mu_1} \oplus \cdots \oplus E_{\mu_k}
$$
It suffices to see that $W = (W\cap E_{\mu_1}) \oplus \cdots \oplus ( W\cap E_{\mu_k})$, in which case one can form a basis of $W$ from bases of the subspaces $W \cap E_{\mu_i}$; such a basis consists of eigenvectors of $B$ because each $W \cap E_{\mu_i}$ is contained in $E_{\mu_i}$. Let's check both inclusions. The immediate one is $(W\cap E_{\mu_1}) \oplus \cdots \oplus ( W\cap E_{\mu_k})\subseteq W$, since each summand is contained in $W$ and $W$ is a subspace.
As for the other, since $W = W \cap \mathbb{k}^n = W \cap \bigoplus_{i=1}^k E_{\mu_i}$, any element $w$ of $W$ is a sum of eigenvectors,
$$w = e_1 + \dots + e_l$$
with $e_i$ an eigenvector for the eigenvalue $\mu_{j_i}$, where the $\mu_{j_i}$ are pairwise distinct. Therefore, it is sufficient to show that if $\sum_{i=1}^l e_i \in W$, then $e_1, \dots, e_l \in W$. We proceed by induction on $l$. If $l = 1$, then $e_1 = w \in W$. If $l >1$, since
$$
Bw - \mu_{j_1}w = (\mu_{j_1} - \mu_{j_1})e_1 + \dots + (\mu_{j_l} - \mu_{j_1})e_l \in W
$$
and $\mu_{j_i} - \mu_{j_1} \neq 0$ for $i>1$, the inductive hypothesis (applied to this sum of $l-1$ eigenvectors) gives $(\mu_{j_i} - \mu_{j_1})e_i \in W$, hence $e_i \in W$, for $i >1$. Finally $e_1 = w - e_2 - \dots - e_l \in W$, completing the proof.
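Here is a small sketch of the inductive step with $l=2$, in plain Python. The matrix $B$, the vectors and the helper `apply` are my own illustrative choices, not taken from the linked post.

```python
from fractions import Fraction

def apply(M, v):
    """Matrix-vector product for nested-list matrices (hypothetical helper)."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]

# Illustrative diagonalisable matrix with eigenvalues 2 and 3.
B = [[2, 1], [0, 3]]
mu1, mu2 = 2, 3
e1 = [1, 0]   # eigenvector for mu1
e2 = [1, 1]   # eigenvector for mu2

# w = e1 + e2 plays the role of an element of a B-invariant subspace W.
w = [a + b for a, b in zip(e1, e2)]

# Bw - mu1*w kills the e1 component and leaves (mu2 - mu1) * e2.
Bw = apply(B, w)
step = [x - mu1 * y for x, y in zip(Bw, w)]

# Dividing by the nonzero scalar mu2 - mu1 recovers e2, then e1 = w - e2.
recovered_e2 = [Fraction(x, mu2 - mu1) for x in step]
recovered_e1 = [a - b for a, b in zip(w, recovered_e2)]

print(recovered_e2 == e2, recovered_e1 == e1)   # True True
```

Only $B$ and linear combinations were applied to $w$, so every intermediate vector stays in $W$, which is the point of the argument.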
Best Answer
That follows from the book Simultaneous Triangularization by Radjavi and Rosenthal (page 8). The original proof is due to Thomas Laffey.
Let $\{y\}$ be a basis of $\mathrm{Im}(AB-BA)$, which is therefore one-dimensional. Let $\lambda\in\mathrm{Spec}(B)$. If $B=\lambda I$, then there is almost nothing to do. Otherwise $F=\ker(B-\lambda I)$, $G=\mathrm{Im}(B-\lambda I)$ are non-trivial $B$-invariant subspaces. If we show that $F$ or $G$ is $A$-invariant, then we are the kings of oil.
Assume that $F$ is not $A$-invariant. Then there is $x$ such that $(B-\lambda I)x=0$ and $(B-\lambda I)Ax\not= 0$. We have $$A(B-\lambda I)x-(B-\lambda I)Ax=ABx-BAx=-(B-\lambda I)Ax\in\mathrm{Im}(AB-BA)\cap\mathrm{Im}(B-\lambda I)\setminus\{0\}.$$ Since $\mathrm{Im}(AB-BA)$ is spanned by $y$, this nonzero vector is a scalar multiple of $y$. Thus $y\in G$.
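Let $z\in \mathbb{C}^n$. Then $A(B-\lambda I)z$ is of the form $(B-\lambda I)Az+\alpha y$ for some scalar $\alpha$, because $A(B-\lambda I)z-(B-\lambda I)Az=(AB-BA)z\in\mathrm{Im}(AB-BA)$. Since $y\in G$, it follows that $G$ is $A$-invariant and we are done. $\square$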
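To see the dichotomy in action, here is a toy $2\times2$ instance in plain Python. The matrices are my own choice (not from the book), and the helpers `matmul`, `matsub` and `apply` are hypothetical names for this sketch. With $\lambda=0$ we have $B-\lambda I=B$, so $F=\ker B$ and $G=\mathrm{Im}\,B$.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matsub(X, Y):
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def apply(M, v):
    return [sum(m * c for m, c in zip(row, v)) for row in M]

# Illustrative pair whose commutator has rank one.
A = [[0, 1], [0, 0]]
B = [[1, 0], [0, 0]]

K = matsub(matmul(A, B), matmul(B, A))   # AB - BA = [[0, -1], [0, 0]]
y = [1, 0]                               # spans Im(AB - BA)

# lambda = 0 is an eigenvalue of B, so F = ker(B) = span{(0, 1)}.
x = [0, 1]
Bx = apply(B, x)                # [0, 0]: x lies in F
BAx = apply(B, apply(A, x))     # [1, 0]: Ax is not in F, so F is not A-invariant
Kx = apply(K, x)                # [-1, 0], equal to -BAx, a multiple of y

# G = Im(B) = span{(1, 0)} contains y, and A maps its spanning vector into G.
g = [1, 0]
Ag = apply(A, g)                # [0, 0], which lies in G

print(K, Bx, BAx, Kx, Ag)
```

Here $F$ fails to be $A$-invariant, and, as the argument predicts, $G$ is $A$-invariant instead.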