You can apply the following powerful idea: think of Cayley-Hamilton as a statement about the "universal matrix," the matrix whose entries are indeterminates $x_{ij}$ living in the polynomial ring $\mathbb{Z}[x_{ij}]$. The statement is that $P(X) = 0$, where $P$ is a polynomial whose coefficients are themselves polynomials in the $x_{ij}$. For $n \times n$ matrices this is a collection of $n^2$ polynomial identities in $n^2$ variables over $\mathbb{Z}$ (one for each entry of $P(X)$), or equivalently a collection of $n^2$ polynomials that you would like to vanish. Now:
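To make this concrete, here is a quick symbolic check of these $n^2$ identities in the $n = 2$ case (a sketch using sympy; the variable names are just illustrative):

```python
# Cayley-Hamilton for the universal 2x2 matrix: the entries are
# independent indeterminates, so P(X) = 0 amounts to 4 polynomial
# identities in Z[x11, x12, x21, x22].
import sympy as sp

t = sp.symbols('t')
X = sp.Matrix(2, 2, sp.symbols('x11 x12 x21 x22'))

# Characteristic polynomial p(t) = det(t*I - X); its coefficients
# are polynomials in the entries of X.
p = sp.Poly((t * sp.eye(2) - X).det().expand(), t)

# Evaluate p at the matrix X itself: sum of coeff * X^power.
P_of_X = sp.zeros(2, 2)
for power, coeff in enumerate(reversed(p.all_coeffs())):
    P_of_X += coeff * X**power

# Every entry expands to the zero polynomial.
print(P_of_X.applyfunc(sp.expand))
```

Running this prints the zero matrix, which is exactly the statement that all four polynomial identities hold in $\mathbb{Z}[x_{ij}]$.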
Claim: Let $f(y_1, \dots, y_k)$ be a polynomial in any number of variables over $\mathbb{Z}$. The following are equivalent:
- $f$ is identically zero (in the sense that all of its coefficients are zero).
- $f(y_1, \dots, y_k) = 0$ whenever the $y_i$ are elements of any commutative ring.
- $f(y_1, \dots, y_k) = 0$ whenever the $y_i$ are elements of a fixed infinite field $K$ of characteristic zero.
Proof. The implications $1 \Rightarrow 2 \Rightarrow 3$ are immediate from the definitions, so it remains to prove $3 \Rightarrow 1$. This can be done by induction on $k$: for $k = 1$ it reduces to the observation that a nonzero polynomial over a field has only finitely many roots, and for the inductive step we write $f$ as a polynomial in $y_k$ whose coefficients are polynomials in $y_1, \dots, y_{k-1}$, fix values of those variables, and vary $y_k$. Alternatively, we can appeal to the combinatorial Nullstellensatz. $\Box$
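The $k = 1$ base case can be seen in action with a small sympy sketch (the degree bound and sample points here are just illustrative):

```python
# Base case of the claim: a polynomial of degree <= d over a field that
# vanishes at d + 1 distinct points is identically zero. Lagrange
# interpolation recovers the polynomial from its values, and all-zero
# values force the zero polynomial.
import sympy as sp

y = sp.symbols('y')
d = 4
points = range(d + 1)               # d + 1 distinct rational points
data = [(p, 0) for p in points]     # suppose f vanishes at every one

# The unique interpolating polynomial of degree <= d through these data:
f = sp.interpolate(data, y)
print(f)  # 0
```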
Now we can prove Cayley-Hamilton over every commutative ring by proving it over any infinite field of characteristic zero (note that we crucially needed to use the fact that the polynomials involved have integer coefficients to get this freedom). In particular we can work over an algebraically closed field, where the proof can be organized as follows:
- As you already observed, Cayley-Hamilton is easy to prove for diagonalizable matrices.
- Now your second observation says, in geometric terms, that the diagonalizable matrices are Zariski dense in the space of all matrices, meaning that any polynomial vanishing on the diagonalizable matrices must vanish identically. This follows from the fact that the matrices with distinct eigenvalues (which are in particular diagonalizable) form a nonempty Zariski open set: their complement consists of the matrices for which the discriminant of the characteristic polynomial (itself a polynomial in the entries) vanishes. And in any irreducible variety (meaning the ring of polynomial functions is an integral domain; this property is used crucially), nonempty Zariski open sets are Zariski dense (this is essentially what you prove).
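The discriminant in question can be computed explicitly in the $2 \times 2$ case (a sympy sketch; symbol names are illustrative):

```python
# The complement of the matrices with distinct eigenvalues is cut out by
# the discriminant of the characteristic polynomial, which is itself a
# polynomial in the matrix entries.
import sympy as sp

t, a, b, c, d = sp.symbols('t a b c d')
X = sp.Matrix([[a, b], [c, d]])

charpoly = (t * sp.eye(2) - X).det().expand()
disc = sp.discriminant(charpoly, t)
print(sp.expand(disc))  # equals (a - d)**2 + 4*b*c
```

So a $2 \times 2$ matrix has distinct eigenvalues exactly when $(a - d)^2 + 4bc \neq 0$, a Zariski open condition.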
Lots of other results about matrices can be proven this way. For example:
Exercise: Let $A, B$ be $n \times n$ matrices. Then $AB$ and $BA$ have the same characteristic polynomial.
Proof. The statement that $\det(tI - AB) = \det(tI - BA)$ amounts to $n$ polynomial identities in the $2n^2$ variables $a_{ij}, b_{ij}$ (the entries of the "universal pair of matrices"), or equivalently a single polynomial identity in $2n^2 + 1$ variables, so, as above, to prove it over every commutative ring it suffices to prove it over a fixed infinite field. The statement is clearly true if, say, $A$ is invertible, since then $BA = A^{-1}(AB)A$ is conjugate to $AB$; now use the fact that the invertible matrices are Zariski open (defined by the nonvanishing of the determinant), hence Zariski dense, in all matrices. (It's also possible to avoid the Zariski topology by working over $\mathbb{R}$ or $\mathbb{C}$ with the Euclidean topology and showing that the invertible matrices are dense in the usual sense.)
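The identity for the universal pair can be checked symbolically in the $2 \times 2$ case (a sympy sketch):

```python
# det(tI - AB) = det(tI - BA) as an identity in Z[a_ij, b_ij, t],
# verified for the universal pair of 2x2 matrices.
import sympy as sp

t = sp.symbols('t')
A = sp.Matrix(2, 2, sp.symbols('a11 a12 a21 a22'))
B = sp.Matrix(2, 2, sp.symbols('b11 b12 b21 b22'))

lhs = (t * sp.eye(2) - A * B).det()
rhs = (t * sp.eye(2) - B * A).det()
print(sp.expand(lhs - rhs))  # 0
```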
Here is a cleaner algebraic reformulation of the proof, working just over the universal ring $\mathbb{Z}[a_{ij}, b_{ij}]$. Observe that
$$\det(A) \det(tI - BA) = \det(tA - ABA) = \det(tI - AB) \det(A)$$
and now use the fact that $\mathbb{Z}[a_{ij}, b_{ij}]$ is an integral domain (so, geometrically, its spectrum is an irreducible affine scheme): since $\det(A)$ is a nonzero element of this domain, we can cancel it from both sides, even though the determinant of any particular matrix may well vanish. $\Box$
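The displayed chain of equalities can itself be verified in the universal ring, e.g. for $n = 2$ (a sympy sketch):

```python
# det(A) det(tI - BA) = det(tA - ABA) = det(tI - AB) det(A) in
# Z[a_ij, b_ij, t], checked for universal 2x2 matrices. det(A) is a
# nonzero element of this integral domain, so it can be cancelled.
import sympy as sp

t = sp.symbols('t')
A = sp.Matrix(2, 2, sp.symbols('a11 a12 a21 a22'))
B = sp.Matrix(2, 2, sp.symbols('b11 b12 b21 b22'))

lhs = A.det() * (t * sp.eye(2) - B * A).det()
mid = (t * A - A * B * A).det()
rhs = (t * sp.eye(2) - A * B).det() * A.det()

# Both differences expand to the zero polynomial.
print(sp.expand(lhs - mid), sp.expand(rhs - mid))
```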
Best Answer
The answer is yes. In the most general setting, extending the field of scalars is done via the (appropriately named) extension of scalars; of particular note is the notion of complexification. Note also the closely related notion of a linear complex structure on a real vector space.
Whatever your field $F$, it is not too difficult to establish that there is a sensible notion of scalar multiplication by the extended field $F'$. The "tricky" parts usually concern how the original vector space "fits" inside the extended structure. For example, the complex eigenvalues and eigenvectors of a real matrix come in complex-conjugate pairs; coming up with (and proving) an analogous statement for endomorphisms of a complexified vector space requires saying what exactly "having real entries" should mean in the abstract setting.
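The conjugate-pair phenomenon can be seen concretely (a numpy sketch on an illustrative real matrix):

```python
# A real matrix has a characteristic polynomial with real coefficients,
# so its nonreal eigenvalues come in complex-conjugate pairs.
import numpy as np

A = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 2.0]])   # eigenvalues: i, -i, 2

eigvals = np.linalg.eigvals(A)

# The multiset of eigenvalues is closed under complex conjugation:
closed = np.allclose(np.sort_complex(eigvals),
                     np.sort_complex(np.conj(eigvals)))
print(closed)  # True
```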