If the ground field $\mathbb{F}$ is $\mathbb{R}$ or $\mathbb{C}$, the following gives an elementary proof. Clearly every commutator has zero trace, so it suffices to show that every real or complex matrix with zero trace is a commutator. First, every traceless matrix is $\mathbb{F}$-similar to a matrix with zero diagonal (we will prove this claim later). So, WLOG we may assume that $C$ has a zero diagonal. Take $A$ to be any diagonal matrix $\mathrm{diag}(a_1,\ldots,a_n)$ with distinct real diagonal entries. The equation $C=AB-BA$ then boils down to $(a_i-a_j)b_{ij}=c_{ij}$ for $i\neq j$ (the diagonal entries of $AB-BA$ are automatically zero, matching the zero diagonal of $C$), which is solved by $b_{ij}=c_{ij}/(a_i-a_j)$. QED
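To see the construction concretely, here is a minimal numerical sketch in NumPy (the matrix $C$ is a made-up random example with zero diagonal):

```python
import numpy as np

# Sketch of the construction above, assuming C already has a zero diagonal.
rng = np.random.default_rng(0)
n = 4
C = rng.standard_normal((n, n))
np.fill_diagonal(C, 0.0)          # zero diagonal, hence trace zero

a = np.arange(1.0, n + 1.0)       # distinct real diagonal entries
A = np.diag(a)

# b_ij = c_ij / (a_i - a_j) for i != j; the diagonal of B is arbitrary.
diff = a[:, None] - a[None, :]
np.fill_diagonal(diff, 1.0)       # dummy value to avoid division by zero
B = C / diff
np.fill_diagonal(B, 0.0)

assert np.allclose(A @ B - B @ A, C)
```

Any other choice of distinct diagonal entries for $A$ (or of the diagonal of $B$) works just as well.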
We now prove our claim that any traceless matrix $C$ is $\mathbb{F}$-similar to a matrix with zero diagonal when $\mathbb{F}=\mathbb{R}$ or $\mathbb{C}$. In the real case, as $C$ is traceless, if it has some nonzero diagonal entries, some two of them must have different signs. WLOG (after a permutation similarity) assume that they are $c_{11}$ and $c_{22}$. Note that
$$
\begin{pmatrix}\cos\theta&-\sin\theta\\ \sin\theta&\cos\theta\end{pmatrix}
\begin{pmatrix}c_{11}&c_{12}\\c_{21}&c_{22}\end{pmatrix}
\begin{pmatrix}\cos\theta&\sin\theta\\-\sin\theta&\cos\theta\end{pmatrix}
=\begin{pmatrix}c_{11}\cos^2\theta-(c_{12}+c_{21})\sin\theta\cos\theta+c_{22}\sin^2\theta&\ \ast\\ \ast&\ \ast\end{pmatrix}.
$$
Since $c_{11}$ and $c_{22}$ have different signs, the expression $c_{11}\cos^2\theta-(c_{12}+c_{21})\sin\theta\cos\theta+c_{22}\sin^2\theta$ equals $c_{11}$ at $\theta=0$ and $c_{22}$ at $\theta=\pi/2$, so by the intermediate value theorem it vanishes for some $\theta\in(0,\pi/2)$. We now turn to the complex case. By unitary triangularization (Schur decomposition), we may assume that $C$ is upper triangular. As $C$ is traceless, if it has some nonzero diagonal entries, some two of them must be distinct. Again, assume that they are $c_{11}$ and $c_{22}$. Diagonalizing the leading 2-by-2 principal block of $C$ (possible since $c_{11}\neq c_{22}$), we may further reduce it to a diagonal 2-by-2 block. Now we have
$$
\frac{1}{1+z^2}\begin{pmatrix}1&-z\\ z&1\end{pmatrix}
\begin{pmatrix}c_{11}&0\\0&c_{22}\end{pmatrix}
\begin{pmatrix}1&z\\-z&1\end{pmatrix}
=\frac{1}{1+z^2}\begin{pmatrix}c_{11}+c_{22}z^2&\ \ast\\ \ast&\ \ast\end{pmatrix}.
$$
As $c_{11}$ and $c_{22}$ are nonzero and distinct, $c_{11}+c_{22}z^2=0$ is solvable by some $z\in\mathbb{C}$ with $z^2=-c_{11}/c_{22}\neq-1$. Therefore, in both the real and complex cases, the $(1,1)$ entry of $C$ can be made zero by a suitable similarity transform. Continuing in this manner recursively on the trailing principal submatrices of $C$, we obtain a matrix with zero diagonal.
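The real 2-by-2 step can also be checked numerically. Below is a sketch (with a made-up traceless $C$ whose diagonal entries have opposite signs) that finds the angle $\theta$ by bisection, exactly as the intermediate value theorem argument suggests:

```python
import numpy as np

# Sketch of the real 2x2 step: c11 > 0 > c22, so the (1,1) entry of
# R(theta) C R(theta)^T changes sign on [0, pi/2] and has a root there.
C = np.array([[3.0, 1.0],
              [2.0, -3.0]])       # trace zero, c11 > 0 > c22

def entry11(theta):
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return (R @ C @ R.T)[0, 0]

# Bisection: entry11(0) = c11 > 0 and entry11(pi/2) = c22 < 0.
lo, hi = 0.0, np.pi / 2
for _ in range(60):
    mid = (lo + hi) / 2
    if entry11(mid) > 0:
        lo = mid
    else:
        hi = mid
theta = (lo + hi) / 2

assert abs(entry11(theta)) < 1e-9
```

The conjugating rotation is orthogonal, so this is indeed a (real) similarity transform.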
Afternote: The above proof has made use of many properties of matrices over $\mathbb{R}$ and $\mathbb{C}$, so I am not sure whether the idea of the proof is applicable when the ground field is different. If not, clearly some people will find the proof dissatisfactory because it does not reveal the true reason why the set of commutators coincides with the subspace of traceless matrices. This is reminiscent of the Cayley-Hamilton theorem for real matrices, for which we can embed the ground field into $\mathbb{C}$, prove the theorem easily for diagonalizable matrices first, and then use a continuity argument (diagonalizable matrices are dense over $\mathbb{C}$) to finish the proof. Algebraists usually regard this proof as dissatisfactory, but those who mostly work over $\mathbb{R}$ and $\mathbb{C}$ may take a different view. At any rate, the following reference contains a relatively short proof (which fills two and a half pages) of the statement that every traceless matrix over a general field is a commutator:
A. A. Albert and Benjamin Muckenhoupt (1957), "On matrices of trace zero", Michigan Math. J., 4(1):1–3.
The equation $\lambda^TA= \mathbf{0}^T$ says that a certain linear combination of the rows of $A$ is $\mathbf{0}^T$. Let the rows of $A$ be $A_1,A_2,\dots,A_n$, and suppose that $\lambda_1A_1+\dots+\lambda_nA_n=\mathbf{0}^T$. Then if $\lambda^T=(\lambda_1,\dots,\lambda_n)$, we have $\lambda^TA= \mathbf{0}^T$.
Under the hypothesis that $\lambda^TA= \mathbf{0}^T \Rightarrow \lambda^T\mathbf{b}=0$, the rows of the augmented matrix $(A|\mathbf{b})$ satisfy exactly the same linear dependencies as the rows of $A$.
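Here is a small numerical sketch of this (with made-up matrices): a row dependency of $A$ also kills $\mathbf{b}$ whenever $\mathbf{b}$ lies in the column space of $A$, and appending $\mathbf{b}$ then leaves the rank unchanged.

```python
import numpy as np

# If every lambda with lambda^T A = 0^T also satisfies lambda^T b = 0,
# then appending b as a column preserves the row dependencies,
# so rank(A|b) = rank(A).  The matrices below are made-up examples.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 1.0]])
lam = np.array([2.0, -1.0, 0.0])      # a row dependency: 2*A1 - A2 = 0
assert np.allclose(lam @ A, 0.0)

b = A @ np.array([1.0, 3.0])          # b in the column space of A
assert np.isclose(lam @ b, 0.0)       # the same dependency kills b

Ab = np.column_stack([A, b])
assert np.linalg.matrix_rank(Ab) == np.linalg.matrix_rank(A)
```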
In fact, this isn't about matrices per se, but about inverses in general, and perhaps more specifically about inverses of functions. The same argument works for any function that has a left and a right inverse (and for elements of a monoid or ring, though these can also be interpreted as "functions" via an appropriate setting).
If you really want to try to understand the proof in terms of "meaning", then you should not think of matrices as a bunch of columns or a bunch of numbers, but rather as functions, i.e., as linear transformations.
Say $A$ is an $m\times n$ matrix; then $A$ is "really" a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$: the columns of $A$ are the images of the standard basis vectors of $\mathbb{R}^n$ under the transformation. If $B$ is a right inverse of $A$, then $B$ is $n\times m$, and $AB$ acts like the identity transformation on $\mathbb{R}^m$. In particular, $AB$ has to be onto, so the rank of $AB$ is $m$; since the rank of $AB$ is at most the rank of $A$, the rank of $A$ has to be $m$; since the rank of $A$ is at most $\min(m,n)$, we get $m\leq n$. This tells us that $A$ is onto (full rank), and that it has at least as many columns as it has rows.
If $C$ is a left inverse of $A$, then $C$ must be an $n\times m$ matrix, and $CA$ acts like the identity on $\mathbb{R}^n$. Because $CA$ is one-to-one, $A$ has to be one-to-one. In particular, its nullspace is trivial. That means that it cannot have more columns than rows (that would require a nontrivial nullspace, by the Rank-Nullity Theorem); since it has at least as many columns as it has rows, $A$ has exactly the same number of columns as rows, so $m=n$.
Moreover, $A$ is now one-to-one and onto. So it is in fact bijective. So it is in fact invertible. Invertible matrices have unique inverses, so $B$ and $C$ have to be equal: they have no choice in the matter. It isn't about the details of the "recipe", it's about the properties of functions: once you have a function that is one-to-one and onto, it has an inverse and the inverse is unique.
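The two halves of the argument can be illustrated numerically (the matrices below are made-up examples):

```python
import numpy as np

# A wide matrix can have a right inverse but never a left inverse,
# while for a square invertible matrix left and right inverses coincide.
A_wide = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])          # 2x3, rank 2: onto, not 1-1
B_right = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [0.0, 0.0]])              # one of many right inverses
assert np.allclose(A_wide @ B_right, np.eye(2))
# Rank-nullity: nullity = 3 - 2 = 1 > 0, so A_wide is not one-to-one
# and therefore cannot have a left inverse.
assert np.linalg.matrix_rank(A_wide) < A_wide.shape[1]

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])                    # square, invertible
B = np.linalg.inv(A)
assert np.allclose(A @ B, np.eye(2)) and np.allclose(B @ A, np.eye(2))
```

Note that `B_right` is not unique: adding any column of the nullspace of `A_wide` to its rows in the right way gives another right inverse, which is exactly why non-square matrices escape the uniqueness argument.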
I honestly think that trying to puzzle out the details of the "recipe" is not insightful here: it is staring at the bark of a single tree instead of trying to appreciate the forest.
But if you must (and I really think you shouldn't), then you want to realize that $AC$ and $CA$ are talking in different languages: the columns of $A$ specify a basis, $\gamma$, and tell you how to express the elements of $\gamma$ in terms of the standard basis $\beta$; it provides a "translation" from $\gamma$ to $\beta$. That is, $A=[\mathrm{Id}]_{\gamma}^{\beta}$. The inverse, $C$, explains how to express the elements of the standard basis $\beta$ in terms of the vectors in $\gamma$: $C=[\mathrm{Id}]_{\beta}^{\gamma}$. $AC$ talks in the language of the standard basis $\beta$, $CA$ talks in the language of $\gamma$. Then it becomes clear why "the same recipe" (not really) should work. It's not really the same recipe, because in $CA$ you "hear" vectors in $\gamma$, translate them into $\beta$, and then translate them back into $\gamma$. But in $AC$ you "hear" vectors in $\beta$, translate them into $\gamma$, and then back into $\beta$. The "translation recipes" are the same, whether you do $\beta\to\gamma$ first or you do it second (translating English to Russian is the same, whether you are translating something written originally in English into Russian, or something that was translated into English first).
$A$ establishes a bijection between the vectors expressed in terms of $\gamma$, and the vectors expressed in terms of $\beta$. $C$ is the bijection going "the other way". Both $AC$ and $CA$ are the identity, but they are the identity of slightly different structures: $AC$ is the identity of "$\mathbb{R}^n$-with-basis-$\beta$", and $CA$ is the identity of "$\mathbb{R}^n$-with-basis-$\gamma$." Only when you forget that $AC$ and $CA$ are being interpreted as matrices on vector-spaces-with-basis do you realize that in fact you have the same "formula" for the expressions (i.e., that both matrices are "the" identity matrix).
If you want to stare at the bark so intently that you must think of matrices in terms of "linear combinations of rows" or "linear combinations of columns", you are going to miss a lot of important properties of matrices. Matrices are really functions; multiplication of matrices is really composition of functions. They aren't a bunch of vectors or a bunch of numbers thrown into a box that you multiply based on some ad hoc rule. They are functions.
Compare how easy, and, yes, intuitive, it is to realize that matrix multiplication is associative because it is just "composition of functions", vs. figuring it out by expanding the double summations of an entry-by-entry expression of $(AB)C$ and $A(BC)$: you are going in the wrong direction for 'intuitive' understanding. Staring at those summations will not tell you why multiplication is associative, it will only tell you it is. Trying to puzzle out why a right inverse is also a left inverse in terms of "adding columns" and "adding rows" is not going to help either: you need to think of it in terms of functions.