The comment you mention does not seem to be correct: in reality, over any field there exist commuting matrices which cannot be simultaneously Jordanized. Here is an example.
Let $n\geq 3$ and let $J_n$ be the $n$-th Jordan block, the $n\times n$ matrix whose entries are all $0$ except just above the diagonal where the $n-1$ entries equal 1.
I claim that although $J_n$ obviously commutes with its square $J_n^2$, these matrices cannot be simultaneously Jordanized.
Indeed any matrix $P$ Jordanizing $J_n$ will satisfy $$P^{-1}J_nP=J_n$$ because of the uniqueness of Jordan forms.
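But this will force (by squaring that equality) $$P^{-1}J_n^2P=J_n^2$$ which is not in Jordan form: since $n\geq 3$, it is nonzero and its entries equal to 1 sit two places above the diagonal rather than just above it.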
Hence no matrix $P$ can simultaneously Jordanize both $J_n$ and $J_n^2$.
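The two facts the argument relies on can be checked directly for a small case. The following is only a minimal sketch in plain Python; the helper names `jordan_block`, `matmul` and `is_jordan_form` are made up for this illustration, and the choice $n=4$ is arbitrary. It confirms that $J_n$ commutes with $J_n^2$ and that $J_n^2$ itself is not in Jordan form; it does not search over matrices $P$, since that part is handled by the uniqueness argument above.

```python
def jordan_block(n):
    """Nilpotent n x n Jordan block: ones just above the diagonal, zeros elsewhere."""
    return [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n)]


def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_jordan_form(M):
    """True if M is zero outside the diagonal and first superdiagonal,
    and every superdiagonal entry is 0, or is 1 joining equal diagonal entries."""
    n = len(M)
    for i in range(n):
        for j in range(n):
            if j not in (i, i + 1) and M[i][j] != 0:
                return False
    return all(
        M[i][i + 1] == 0 or (M[i][i + 1] == 1 and M[i][i] == M[i + 1][i + 1])
        for i in range(n - 1)
    )


n = 4  # any n >= 3 works; 4 is an arbitrary choice for the illustration
J = jordan_block(n)
J2 = matmul(J, J)

commute = matmul(J, J2) == matmul(J2, J)
print(commute)             # J_n commutes with its square
print(is_jordan_form(J))   # J_n is already in Jordan form
print(is_jordan_form(J2))  # J_n^2 has its ones two places above the diagonal
```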
This has undoubtedly been answered (likely multiple times) here before, so I post this at the risk of beating a dead (and decaying) horse.
Let me first link you to this page, which contains two excellent answers (I particularly recommend Keith Conrad's expository paper linked in Pierre-Yves Gaillard's answer). However, let me provide a perhaps more elementary viewpoint since, from experience, many people beginning this topic are not quite comfortable with minimal polynomial based arguments yet.
You seem to have covered part a quite adequately so let me focus on part b. I apologize in advance for the length, but I feel that this is a topic which requires thorough understanding.
The main thing to remember about commuting matrices is the fact that commuting matrices respect each other's eigenspaces. What does this mean? To talk about that, we first have to introduce the topic of an invariant subspace.
Consider a matrix mapping $A:\ V \rightarrow V$ for a vector space $V$. If there is some subspace $U$ of $V$ such that the restriction of $A$ to $U$ remains an operator in the sense that $A:\ U\rightarrow U$, then we say that $U$ is an invariant subspace of $A$. The term stable is also sometimes used. The significance of this is that $A(U) \subseteq U$, the image of $U$ is entirely contained within $U$. This way, it makes sense to talk about a restriction of the mapping to the smaller vector space $U$.
This is desirable for several reasons, the main one being that linear mappings on smaller vector spaces are easier to analyze. We can look at the action of the mapping on each invariant subspace and then piece them together to get an overall picture. This is what diagonalization does; we break down the vector space into smaller invariant subspaces, the eigenspaces, and then piece together the facts to get a simpler picture of how the mapping works. Many of the simpler, canonical representations are dependent on this fact (for example, the Jordan canonical form looks at the invariant generalized eigenspaces).
Now, if we have two commuting, diagonalizable matrices $A$ and $B$, then each eigenspace of $B$ is not only invariant under $B$ itself, but also under $A$. This is what we mean by preserving each other's eigenspaces. To see this, let $\mathbf{v}$ be an eigenvector of $B$ with eigenvalue $\lambda$. Then
$$B(A\mathbf{v}) = A(B\mathbf{v}) = \lambda A\mathbf{v}$$
so that $A\mathbf{v}$ is either zero or again an eigenvector of $B$ with eigenvalue $\lambda$; in both cases it lies in the eigenspace of $B$ for $\lambda$. In our new language, this means that the eigenspace $E_\lambda$ of $B$ is invariant under $A$, so it makes sense to look at the restriction of $A$ to $E_\lambda$.
Now consider the restriction of $A$ to $E_\lambda$. If all the eigenvalues of $B$ are simple (multiplicity one) then each eigenspace of $B$ is one-dimensional. We have therefore restricted $A:\ E_\lambda \rightarrow E_\lambda$ to a mapping on a one-dimensional vector space. But this means that $A$ must take each vector of $E_\lambda$ to a scalar multiple of itself. You can check that this necessarily implies that every nonzero vector of $E_\lambda$ is also an eigenvector of $A$. Therefore, for any eigenbasis of $B$ that we take, the same vectors also form an eigenbasis of $A$. This means that the two matrices are simultaneously diagonalizable; they share a common eigenbasis.
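To make the simple-eigenvalue case concrete, here is a small sketch in plain Python. The matrices `A` and `B` and the helper names are chosen purely for illustration and are not taken from the question: `B` has the simple eigenvalues $3$ and $1$ with eigenvectors $(1,1)$ and $(1,-1)$, and `A` commutes with `B`. The code checks that each eigenvector of `B` is sent by `A` to a scalar multiple of itself.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def scalar_multiple(w, v):
    """Return c with w == c * v (v nonzero), or None if no such scalar exists."""
    idx = next(i for i, x in enumerate(v) if x != 0)
    c = Fraction(w[idx], v[idx])
    return c if all(w[i] == c * v[i] for i in range(len(v))) else None


# Illustrative matrices (assumed for this example only).
B = [[2, 1], [1, 2]]   # eigenvalues 3 and 1, both simple
A = [[0, 1], [1, 0]]   # commutes with B

commute = matmul(A, B) == matmul(B, A)

# Eigenvectors of B, keyed by the corresponding eigenvalue of B.
eigvecs_B = {3: [1, 1], 1: [1, -1]}

# First confirm they really are eigenvectors of B ...
b_checks = {lam: scalar_multiple(apply(B, v), v) for lam, v in eigvecs_B.items()}

# ... then find the scalar by which A acts on each of them.
a_eigenvalues = {lam: scalar_multiple(apply(A, v), v) for lam, v in eigvecs_B.items()}

print(commute)
print(b_checks)
print(a_eigenvalues)
```

In this example $A$ acts on $(1,1)$ by the scalar $1$ and on $(1,-1)$ by the scalar $-1$, so the eigenbasis of $B$ diagonalizes $A$ as well.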
The general case is a bit more involved in that the restrictions to the invariant subspaces are more complex (they're no longer one-dimensional), but the ideas are identical.
P.S. Since you seem to be interested in physics, let me mention a crucial application of commuting operators. In quantum mechanics, you have quantities called observables, each of which is roughly speaking represented by a Hermitian matrix. Unlike in classical physics, different observables need not be simultaneously measurable (by measuring position for example, you cannot simultaneously measure momentum and vice versa) which is ultimately due to the fact that the position operator and the momentum operator do not commute (this is the underlying reason behind the uncertainty principle). They do not have a shared basis which can represent the states of a system. Commuting operators therefore form a key element of quantum physics in that they define quantities which are compatible, i.e. simultaneously defined.
Best Answer
The effect of the (additive) commutator $T_A$ with a diagonal matrix$~A$ on an elementary matrix $E_{k,l}=(\delta_{i,k}\delta_{j,l})_{i,j=1}^n$ is scalar multiplication by $a_k-a_l$, where $a_i$ denotes the diagonal entry of$~A$ at position $i,i$. Therefore all such operators $T_A$ diagonalise on the basis $\{E_{k,l}\mid k=1,\ldots,n; l=1,\ldots,n\,\}$ of$~V$.
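This eigenvector relation is easy to verify by brute force. The sketch below is plain Python and assumes the convention $T_A(X)=AX-XA$ (with the opposite sign convention every scalar simply changes sign); the diagonal entries in `a` and the helper names are arbitrary choices made for the illustration.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def commutator(A, X):
    """T_A(X) = AX - XA (assumed sign convention)."""
    AX, XA = matmul(A, X), matmul(X, A)
    n = len(A)
    return [[AX[i][j] - XA[i][j] for j in range(n)] for i in range(n)]


def elementary(n, k, l):
    """Elementary matrix E_{k,l}: a single 1 at row k, column l (0-based)."""
    return [[1 if (i, j) == (k, l) else 0 for j in range(n)] for i in range(n)]


a = [5, 2, -1]   # arbitrary diagonal entries for the illustration
n = len(a)
A = [[a[i] if i == j else 0 for j in range(n)] for i in range(n)]

# Check T_A(E_{k,l}) == (a_k - a_l) * E_{k,l} for every pair (k, l).
all_ok = True
for k in range(n):
    for l in range(n):
        E = elementary(n, k, l)
        expected = [[(a[k] - a[l]) * E[i][j] for j in range(n)] for i in range(n)]
        if commutator(A, E) != expected:
            all_ok = False

print(all_ok)
```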