No, the converse of Cayley-Hamilton is not true for $n\times n$ matrices with $n\gt 1$; in particular, it fails for $2\times 2$ matrices.
For a simple counterexample, notice that if $p(A)=0$, then for every multiple $q(x)$ of $p(x)$ you also have $q(A)=0$; so you would want to amend the converse to say "if $p(A)=0$, then $p(x)$ is a multiple of the characteristic polynomial of $A$". But even that amended version is false.
However, the only failures in the $2\times 2$ case are the scalar multiples of the identity: if $A=\lambda I$, then $p(x)=x-\lambda$ satisfies $p(A)=0$, but the characteristic polynomial is $(x-\lambda)^2$, not $p(x)$.
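For concreteness, here is a quick SymPy sketch of this counterexample (the choice $\lambda=3$ is arbitrary):

```python
# Check: p(x) = x - lambda annihilates A = lambda*I, but the
# characteristic polynomial is (x - lambda)^2. (Arbitrary lambda = 3.)
from sympy import Matrix, eye, symbols

x = symbols('x')
lam = 3
A = lam * eye(2)

print(A - lam * eye(2))             # zero matrix, so p(A) = 0
print(A.charpoly(x).as_expr())      # x**2 - 6*x + 9, i.e. (x - 3)**2
```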
For bigger matrices, there are other situations where even this weakened converse fails. For example, $A=\operatorname{diag}(1,1,2)$ has characteristic polynomial $(x-1)^2(x-2)$, yet it is already annihilated by $(x-1)(x-2)$.
The concept that captures the "converse" of Cayley-Hamilton is the minimal polynomial of the matrix, which is the monic polynomial $p(x)$ of smallest degree such that $p(A)=0$. It is then easy to show (using the division algorithm) that if $q(x)$ is any polynomial for which $q(A)=0$, then $p(x)|q(x)$. (Be careful to justify that if $m(x)=r(x)s(x)$, then $m(A)=r(A)s(A)$; this is not immediate because matrix multiplication is not in general commutative!) So we have:
Theorem. Let $A$ be an $n\times n$ matrix over $\mathbf{F}$, and let $\mu(x)$ be the minimal polynomial of $A$. If $p(x)\in \mathbf{F}[x]$ is any polynomial such that $p(A)=0$, then $\mu(x)$ divides $p(x)$.
The Cayley-Hamilton Theorem shows that the characteristic polynomial is always a multiple of the minimal polynomial. In fact, one can prove that every irreducible factor of the characteristic polynomial must divide the minimal polynomial. Thus, for a $2\times 2$ matrix, if the characteristic polynomial splits and has distinct roots, then the characteristic and minimal polynomials are equal. If the characteristic polynomial is an irreducible quadratic and we are working over $\mathbb{R}$, then again the minimal and characteristic polynomials are equal. But if the characteristic polynomial is of the form $(x-a)^2$, then the minimal polynomial is either $(x-a)$ (when the matrix equals $aI$), or $(x-a)^2$ (when the matrix is not diagonalizable).
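Here is a minimal SymPy sketch of the two possibilities in the $(x-a)^2$ case (the scalar $a=2$ and the particular Jordan block are arbitrary illustrative choices):

```python
# Two matrices with characteristic polynomial (x - a)^2 but different
# minimal polynomials: (x - a) for aI, (x - a)^2 for the Jordan block.
from sympy import Matrix, eye

a = 2
scalar = a * eye(2)
jordan = Matrix([[a, 1], [0, a]])

print(scalar - a * eye(2))          # zero matrix: (x - a) annihilates aI
print(jordan - a * eye(2))          # nonzero: (x - a) is not enough here...
print((jordan - a * eye(2))**2)     # ...but (x - a)^2 is: zero matrix
```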
As for solving this problem: if $\lambda$ is an eigenvalue of $A$, and $A$ is invertible, then $\lambda\neq 0$, and $\frac{1}{\lambda}$ is an eigenvalue of $A^{-1}$: for if $\mathbf{x}\neq\mathbf{0}$ is such that $A\mathbf{x}=\lambda\mathbf{x}$, then multiplying both sides by $A^{-1}$ we get $\mathbf{x} = A^{-1}(\lambda \mathbf{x}) = \lambda A^{-1}\mathbf{x}$. Dividing through by $\lambda$ shows $\mathbf{x}$ is an eigenvector of $A^{-1}$ corresponding to $\frac{1}{\lambda}$.
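A quick numerical sanity check of this reciprocal-eigenvalue fact, sketched with NumPy on an arbitrary invertible matrix:

```python
# Eigenvalues of A^{-1} are the reciprocals of the eigenvalues of A.
import numpy as np

A = np.array([[2.0, 1.0], [0.0, 5.0]])

print(np.sort(np.linalg.eigvals(A)))                  # [2. 5.]
print(np.sort(1 / np.linalg.eigvals(A)))              # [0.2 0.5]
print(np.sort(np.linalg.eigvals(np.linalg.inv(A))))   # [0.2 0.5]
```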
Since $A=A^{-1}$, that means that if $\lambda_1,\lambda_2$ are the eigenvalues of $A$, then $\lambda_1 = \frac{1}{\lambda_1}$ and $\lambda_2=\frac{1}{\lambda_2}$; thus, each eigenvalue is either $1$ or $-1$.
If the matrix is diagonalizable, then we cannot have both eigenvalues equal to $1$ (since then $A=I$), and they cannot both be equal to $-1$ (since $A\neq -I$), so one eigenvalue is $1$ and the other is $-1$. Since the trace of a square matrix equals the sum of its eigenvalues, the trace of $A$ is $1+(-1)=0$.
Why is $A$ diagonalizable? If it has two distinct eigenvalues, $1$ and $-1$, then there is nothing to do; we know it is diagonalizable. If it has a repeated eigenvalue, say $1$, but $A-I$ is not the zero matrix, pick $\mathbf{x}\in \mathbb{R}^2$ such that $A\mathbf{x}\neq \mathbf{x}$; then $$\mathbf{0}=(A-I)^2\mathbf{x} = (A^2-2A + I)\mathbf{x} = (2I-2A)\mathbf{x},$$ where the first equality holds by the Cayley-Hamilton Theorem (the characteristic polynomial is $(x-1)^2$) and the last uses $A^2=I$. But that means that $2(I-A)\mathbf{x}=\mathbf{0}$, contradicting our choice of $\mathbf{x}$. Thus, $A-I=0$, so $A=I$ and $A$ is diagonalizable. A similar argument shows that if $-1$ is the only eigenvalue, then $A+I=0$.
(Hiding behind that paragraph is the fact that if the minimal polynomial is squarefree and splits, then the matrix is diagonalizable; since $p(x)=x^2-1=(x-1)(x+1)$ is a multiple of the minimal polynomial, the matrix must be diagonalizable.)
So this completes the proof that the trace must be $0$, given that $A\neq I$ and $A\neq -I$.
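To see the conclusion in action, here is a small SymPy sketch that builds an involution $A\neq\pm I$ and checks the trace (the change-of-basis matrix $P$ is an arbitrary choice):

```python
# Construct A similar to diag(1, -1): then A = A^{-1}, A != +-I, trace 0.
from sympy import Matrix, diag

P = Matrix([[1, 1], [0, 1]])
A = P * diag(1, -1) * P.inv()

print(A)            # Matrix([[1, -2], [0, -1]]), not +-I
print(A * A)        # identity matrix, so A = A^{-1}
print(A.trace())    # 0
```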
You can apply the following powerful idea: think of Cayley-Hamilton as a statement about the "universal matrix," the one whose entries are indeterminates $x_{ij}$ living in a polynomial ring $\mathbb{Z}[x_{ij}]$. The statement is that $P(X) = 0$, where $P$ is the characteristic polynomial of $X$, whose coefficients are polynomials in the $x_{ij}$; so this statement itself is, for $n \times n$ matrices, a collection of $n^2$ polynomial identities in $n^2$ variables over $\mathbb{Z}$, or equivalently a collection of $n^2$ polynomials that you would like to vanish. Now:
Claim: Let $f(y_1, \dots, y_k)$ be a polynomial in any number of variables over $\mathbb{Z}$. The following are equivalent:

1. $f$ is identically zero (in the sense that all of its coefficients are zero).
2. $f(y_1, \dots, y_k) = 0$ whenever the $y_i$ are elements of any commutative ring.
3. $f(y_1, \dots, y_k) = 0$ whenever the $y_i$ are elements of a fixed infinite field $K$ of characteristic zero.
Proof. The implications $1 \Rightarrow 2 \Rightarrow 3$ are immediate from the definitions, so it remains to prove $3 \Rightarrow 1$. This can be done by induction on $k$: for $k = 1$ this reduces to the observation that a nonzero polynomial has finitely many roots, and the inductive step proceeds by fixing some of the variables and varying the others. We can also appeal to the combinatorial Nullstellensatz. $\Box$
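As a concrete illustration of the universal-matrix viewpoint, here is a SymPy sketch for $n = 2$: the entries of $P(X)$ are literally the zero polynomial in $\mathbb{Z}[x_{11}, x_{12}, x_{21}, x_{22}]$.

```python
# For n = 2 the characteristic polynomial is t^2 - tr(X) t + det(X), so
# Cayley-Hamilton says X^2 - tr(X) X + det(X) I = 0 as a matrix of
# polynomials in Z[x11, x12, x21, x22].
from sympy import Matrix, symbols, eye

x11, x12, x21, x22 = symbols('x11 x12 x21 x22')
X = Matrix([[x11, x12], [x21, x22]])

P_of_X = X**2 - X.trace() * X + X.det() * eye(2)
print(P_of_X.expand())    # the 2x2 zero matrix: four identically-zero polynomials
```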
Now we can prove Cayley-Hamilton over every commutative ring by proving it over any infinite field of characteristic zero (note that we crucially needed to use the fact that the polynomials involved have integer coefficients to get this freedom). In particular we can work over an algebraically closed field, where the proof can be organized as follows:
- As you already observed, Cayley-Hamilton is easy to prove for diagonalizable matrices.
- Now your second observation, in geometric terms, says that the diagonalizable matrices are Zariski dense in all matrices, meaning that any polynomial vanishing on the diagonalizable matrices must vanish identically. This is a consequence of two facts: the matrices with distinct eigenvalues form a Zariski-open set, because their complement consists of the matrices at which the discriminant of the characteristic polynomial (itself a polynomial in the entries; see the sketch below) vanishes, and in any irreducible variety (meaning the ring of polynomial functions is an integral domain - you use this property crucially) nonempty Zariski opens are Zariski dense (this is essentially what you prove).
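Here is a small SymPy sketch of that discriminant for the universal $2\times 2$ matrix, the single polynomial whose vanishing cuts out the complement of the distinct-eigenvalue locus:

```python
# Discriminant of the characteristic polynomial of the universal 2x2 matrix:
# one polynomial in the entries whose nonvanishing means distinct eigenvalues.
from sympy import Matrix, symbols, discriminant

a, b, c, d, t = symbols('a b c d t')
X = Matrix([[a, b], [c, d]])

p = X.charpoly(t).as_expr()    # t**2 - (a + d)*t + (a*d - b*c)
print(discriminant(p, t))      # a**2 - 2*a*d + 4*b*c + d**2, i.e. (a - d)**2 + 4*b*c
```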
Lots of other results about matrices can be proven this way. For example:
Exercise: Let $A, B$ be $n \times n$ matrices. Then $AB$ and $BA$ have the same characteristic polynomial.
Proof. The statement that $\det(tI - AB) = \det(tI - BA)$ is a collection of $n$ polynomial identities in $2n^2$ variables $a_{ij}, b_{ij}$ (the coefficients of the "universal pair of matrices"), or equivalently a single polynomial identity in $2n^2 + 1$ variables, so as above to prove this statement over every commutative ring it suffices to prove it over a fixed infinite field. The statement is clearly true if, say, $A$ is invertible, since then $AB$ and $BA$ are conjugate, and now we use the fact that invertible matrices are Zariski open (defined by the nonvanishing of the determinant), hence Zariski dense, in all matrices. (It's also possible to avoid use of the Zariski topology by working over $\mathbb{R}$ or $\mathbb{C}$ with the Euclidean topology and showing that invertible matrices are dense in the usual sense here.)
Here is a cleaner algebraic reformulation of the proof, working just over the universal ring $\mathbb{Z}[a_{ij}, b_{ij}]$. Observe that
$$\det(A) \det(tI - BA) = \det(tA - ABA) = \det(tI - AB) \det(A)$$
and now use the fact that $\mathbb{Z}[a_{ij}, b_{ij}]$ is an integral domain (so, geometrically, its spectrum is an irreducible affine scheme): since $\det(A)$ is a nonzero element of this domain, we can cancel it from both sides, despite the fact that it vanishes at particular matrices. $\Box$
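And for completeness, a SymPy sketch verifying the exercise for the universal pair of $2\times 2$ matrices, i.e. as a polynomial identity in $\mathbb{Z}[a_{ij}, b_{ij}, t]$:

```python
# det(tI - AB) - det(tI - BA) is the zero polynomial in Z[a_ij, b_ij, t].
from sympy import Matrix, symbols, eye, expand

t = symbols('t')
A = Matrix(2, 2, list(symbols('a11 a12 a21 a22')))
B = Matrix(2, 2, list(symbols('b11 b12 b21 b22')))

lhs = (t * eye(2) - A * B).det()
rhs = (t * eye(2) - B * A).det()
print(expand(lhs - rhs))    # 0
```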
Best Answer
It's simple to see that every $n\times n$ matrix has to be a zero of some nonzero polynomial of degree at most $n^2$, simply because the space of $n\times n$ matrices has dimension $n^2$: the $n^2+1$ matrices $I, A, A^2, \dots, A^{n^2}$ must be linearly dependent. The Cayley-Hamilton Theorem says that you can find such a polynomial of much smaller degree, namely $n$.
Another way to see this is as follows: For each fixed vector $v$, the vectors $v$, $Av$, ..., $A^n v$ cannot be linearly independent and so there is a polynomial $p$ of degree at most $n$ such that $p(A)v=0$. However, this polynomial depends on $v$. The Cayley-Hamilton Theorem gives you a polynomial that works for all vectors $v$.
Finally, for applications, having a polynomial $p$ such that $p(A)=0$ allows you to compute all powers of $A$ as linear combinations of $I$, $A$, ..., $A^{n-1}$. Indeed, assuming $p$ monic, you can write $p(X)=X^n+q(X)$ with $q$ of degree less than $n$. So $A^n = -q(A)$. Then $A^{n+1}=-A q(A)$. If $A^n$ appears in $A q(A)$, replace it by $-q(A)$. Do the same for $A^{n+2} = A \cdot A^{n+1}$, etc. For a concrete example, see http://en.wikipedia.org/wiki/Cayley-Hamilton_theorem#Illustration_for_specific_dimensions_and_practical_applications.
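Equivalently, $A^k = r(A)$ where $r$ is the remainder of $x^k$ on division by $p$; here is a short SymPy sketch of this reduction (the matrix is an arbitrary example):

```python
# Compute A^k from the characteristic polynomial: A^k = r(A), where
# r = x^k mod p has degree < n. Here n = 2 and A is arbitrary.
from sympy import Matrix, symbols, div, eye

x = symbols('x')
A = Matrix([[1, 2], [3, 4]])
p = A.charpoly(x).as_expr()        # x**2 - 5*x - 2

k = 5
_, r = div(x**k, p, x)             # x**k = q(x)*p(x) + r(x), deg r < 2
r = r.expand()                     # 779*x + 290

Ak = r.coeff(x, 1) * A + r.coeff(x, 0) * eye(2)
print(Ak == A**5)                  # True
```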