No, the converse of Cayley-Hamilton is not true for $n\times n$ matrices with $n\gt 1$; in particular, it fails for $2\times 2$ matrices.
For a simple counterexample, notice that if $p(A)=0$, then for every multiple $q(x)$ of $p(x)$ you also have $q(A)=0$; so you would want to amend the converse to say "if $p(A)=0$, then $p(x)$ is a multiple of the characteristic polynomial of $A$". But even that amended version is false.
However, the only failures in the $2\times 2$ case are the scalar multiples of the identity. If $A=\lambda I$, then $p(x)=x-\lambda$ satisfies $p(A)=0$, but the characteristic polynomial is $(x-\lambda)^2$, not $p(x)$.
For bigger matrices, there are other situations where even this weakened converse fails.
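For instance, take the $3\times 3$ matrix
$$A=\begin{pmatrix}1&0&0\\0&1&0\\0&0&2\end{pmatrix},$$
whose characteristic polynomial is $(x-1)^2(x-2)$. Here $p(x)=(x-1)(x-2)$ satisfies $p(A)=0$, yet $p(x)$ is not a multiple of the characteristic polynomial, and $A$ is not a scalar multiple of the identity.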
The concept that captures the "converse" of Cayley-Hamilton is the minimal polynomial of the matrix, which is the monic polynomial $p(x)$ of smallest degree such that $p(A)=0$. It is then easy to show (using the division algorithm) that if $q(x)$ is any polynomial for which $q(A)=0$, then $p(x)|q(x)$. (Be careful to justify that if $m(x)=r(x)s(x)$, then $m(A)=r(A)s(A)$; this is not immediate because matrix multiplication is not in general commutative!) So we have:
Theorem. Let $A$ be an $n\times n$ matrix over $\mathbf{F}$, and let $\mu(x)$ be the minimal polynomial of $A$. If $p(x)\in \mathbf{F}[x]$ is any polynomial such that $p(A)=0$, then $\mu(x)$ divides $p(x)$.
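For completeness, here is a sketch of that division-algorithm argument. Write
$$p(x) = s(x)\mu(x) + r(x),\qquad r=0\ \text{ or }\ \deg r<\deg\mu;$$
then
$$0 = p(A) = s(A)\mu(A) + r(A) = r(A).$$
If $r\neq 0$, we could scale it to be monic, producing a monic polynomial of degree less than $\deg\mu$ that annihilates $A$, contradicting the minimality of $\mu$; hence $r=0$ and $\mu(x)\mid p(x)$.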
The Cayley-Hamilton Theorem shows that the characteristic polynomial is always a multiple of the minimal polynomial. In fact, one can prove that every irreducible factor of the characteristic polynomial must divide the minimal polynomial. Thus, for a $2\times 2$ matrix, if the characteristic polynomial splits and has distinct roots, then the characteristic and minimal polynomials are equal. If the characteristic polynomial is an irreducible quadratic and we are working over $\mathbb{R}$, then again the minimal and characteristic polynomials are equal. But if the characteristic polynomial is of the form $(x-a)^2$, then the minimal polynomial is either $(x-a)$ (when the matrix equals $aI$), or $(x-a)^2$ (when the matrix is not diagonalizable).
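For example, in the last case, consider
$$A=\begin{pmatrix}a&1\\0&a\end{pmatrix}.$$
Its characteristic polynomial is $(x-a)^2$, and since $A-aI\neq 0$, the minimal polynomial cannot be $x-a$; it must be $(x-a)^2$, and indeed $A$ is not diagonalizable.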
As for solving this problem: if $\lambda$ is an eigenvalue of $A$, and $A$ is invertible, then $\lambda\neq 0$, and $\frac{1}{\lambda}$ is an eigenvalue of $A^{-1}$: for if $\mathbf{x}\neq\mathbf{0}$ is such that $A\mathbf{x}=\lambda\mathbf{x}$, then multiplying both sides by $A^{-1}$ we get $\mathbf{x} = A^{-1}(\lambda \mathbf{x}) = \lambda A^{-1}\mathbf{x}$. Dividing through by $\lambda$ shows $\mathbf{x}$ is an eigenvector of $A^{-1}$ corresponding to $\frac{1}{\lambda}$.
Since $A=A^{-1}$, that means that if $\lambda_1,\lambda_2$ are the eigenvalues of $A$, then $\lambda_1 = \frac{1}{\lambda_1}$ and $\lambda_2=\frac{1}{\lambda_2}$; that is, $\lambda_1^2=\lambda_2^2=1$, so each eigenvalue is either $1$ or $-1$.
If the matrix is diagonalizable, then we cannot have both equal to $1$ (since then $A=I$), and they cannot both be equal to $-1$ (since $A\neq -I$), so one eigenvalue is $1$ and the other is $-1$. Since the trace of a square matrix equals the sum of its eigenvalues, the sum of the eigenvalues is $0$.
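For a $2\times 2$ matrix this is immediate from the characteristic polynomial:
$$\det(xI-A)=x^2-\operatorname{tr}(A)\,x+\det(A)=(x-\lambda_1)(x-\lambda_2),$$
so comparing coefficients gives $\operatorname{tr}(A)=\lambda_1+\lambda_2$.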
Why is $A$ diagonalizable? If it has two distinct eigenvalues, $1$ and $-1$, then there is nothing to do; we know it is diagonalizable. If it has a repeated eigenvalue, say $1$, but $A-I$ is not the zero matrix, pick $\mathbf{x}\in \mathbb{R}^2$ such that $A\mathbf{x}\neq \mathbf{x}$; then $$\mathbf{0}=(A-I)^2\mathbf{x} = (A^2-2A + I)\mathbf{x} = (2I-2A)\mathbf{x},$$
where the first equality holds by the Cayley-Hamilton Theorem (the characteristic polynomial is $(x-1)^2$) and the last uses $A^2=I$. But that means that $2(I-A)\mathbf{x}=\mathbf{0}$, contradicting our choice of $\mathbf{x}$. Thus, $A-I=0$, so $A=I$ and $A$ is diagonalizable. A similar argument shows that if $-1$ is the only eigenvalue, then $A+I=0$.
(Hiding behind that paragraph is the fact that if the minimal polynomial is squarefree and splits, then the matrix is diagonalizable; since $p(x)=x^2-1=(x-1)(x+1)$ annihilates $A$, it is a multiple of the minimal polynomial, so the matrix must be diagonalizable.)
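Spelling that out: since $A^2=I$, the minimal polynomial $\mu(x)$ divides $x^2-1$, so there are only three possibilities:
$$\mu(x)=x-1\ \text{(so $A=I$)},\qquad \mu(x)=x+1\ \text{(so $A=-I$)},\qquad \mu(x)=x^2-1,$$
and in the third case $A$ is diagonalizable with eigenvalues $1$ and $-1$.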
So this completes the proof that the trace must be $0$, given that $A\neq I$ and $A\neq -I$.
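As a sanity check, consider
$$A=\begin{pmatrix}0&1\\1&0\end{pmatrix},\qquad A^2=I,\quad A\neq\pm I,\quad \operatorname{tr}(A)=0,$$
with eigenvalues $1$ and $-1$, exactly as the argument predicts.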
There is another way to see that the proof must be flawed: by examining the consequences this proof technique would have. If the proof were valid, then we would also have the following generalisation:
Faulty Lemma. Suppose that $A$ and $B$ are $n\times n$ matrices. Let $p_A$ be the characteristic polynomial for $A$. If $B - A$ is singular, then $B$ must be a zero of $p_A$.
Faulty proof: We have $p_A(B) = \det(BI - A) = \det(B - A) = 0$.$$\tag*{$\Box$}$$
This has the following amazing consequence:
Faulty Corollary. Every singular matrix is nilpotent.
Faulty proof: Let $B$ be a singular matrix and let $A$ be the zero matrix. Now we have $p_A(\lambda) = \lambda^n$. Furthermore, by the above we have $p_A(B) = 0$, because $B - A$ is singular. Thus, we have $B^n = 0$ and we see that $B$ is nilpotent.$$\tag*{$\Box$}$$
In particular, this proves that we have
$$ \pmatrix{1 & 0 \\ 0 & 0}^2 = \pmatrix{0 & 0 \\ 0 & 0}. $$
This goes to show just how wrong the proof is!
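The actual computation, of course, gives
$$\pmatrix{1 & 0 \\ 0 & 0}^2 = \pmatrix{1 & 0 \\ 0 & 0} \neq \pmatrix{0 & 0 \\ 0 & 0},$$
and this pinpoints the error in the faulty proof: $p_A(B)$ is a matrix (the polynomial $p_A$ evaluated at $B$), while $\det(B-A)$ is a scalar, so the first equality $p_A(B)=\det(BI-A)$ does not even typecheck. Substituting a matrix for the scalar variable of $\det(\lambda I - A)$ is not the same as evaluating $p_A$ at a matrix.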
"The" proof of the Cayley-Hamilton Theorem involves invariant subspaces, or subspaces that are mapped onto themselves by a linear operator. If $T$ is a linear operator on a vector space $V$, then a subspace $W\subseteq V$ is called a $T$-invariant subspace of $V$ if $T(W)\subseteq W$, i.e. if $T(v)\in W$ for every $v\in W$. Some examples of $T$-invariant subspaces you might be familiar with are $\{0\}, N(T), R(T), V$, and $E_\lambda$ for any eigenvalue $\lambda$ of $T$. For a linear operator $T$ and any nonzero $x\in V$, then the subspace $$ W=\textrm{span}(\{x,T(x),T^2(x),\dots\})$$ is called the $T$ cyclic subspace of $V$ generated by $x$, and one can show that $W$ is the smallest $T$-invariant subspace containing $x$. Cyclic subspaces can be used to establish the Cayley-Hamilton Theorem. In fact, the existence of a $T$-invariant subspace allows us to define a new linear operator whose domain is this subspace, i.e. the restriction $T_W$ of $T$ to $W$ is a linear operator from $W$ to $W$. These two operators are linked in the sense that the characteristic polynomial of $T_W$ divides the characteristic polynomial of $T$. You can show this by choosing your favorite ordered basis for $W$ and extending it to an ordered basis for $V$, then taking the matrix representations of $T$ and $T_W$, and computing the characteristic polynomial of $T$, one will see that the characteristic polynomial of $T_W$ can be recovered.
The last tool we will need is a way to gain information about the characteristic polynomial of $T$ from the characteristic polynomial of $T_W$. Cyclic subspaces are useful here because the characteristic polynomial of the restriction of a linear operator to a cyclic subspace can be computed explicitly:

Theorem. Let $T$ be a linear operator on a finite-dimensional vector space $V$, let $W$ be the $T$-cyclic subspace of $V$ generated by a nonzero $v\in V$, and let $k=\textrm{dim}(W)$. Then (a) $\{v,T(v),T^2(v),\dots,T^{k-1}(v)\}$ is a basis for $W$; and (b) if $a_0v+a_1T(v)+\cdots+a_{k-1}T^{k-1}(v)+T^k(v)=0$, then the characteristic polynomial of $T_W$ is $(-1)^k(a_0+a_1t+\cdots+a_{k-1}t^{k-1}+t^k)$.
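In the basis $\{v,T(v),\dots,T^{k-1}(v)\}$ from part (a), the matrix of $T_W$ is a companion matrix, which is one way to see where the formula comes from:
$$[T_W]=\begin{pmatrix}0&0&\cdots&0&-a_0\\1&0&\cdots&0&-a_1\\0&1&\cdots&0&-a_2\\\vdots&&\ddots&&\vdots\\0&0&\cdots&1&-a_{k-1}\end{pmatrix},$$
and a cofactor expansion (or induction on $k$) gives $\det([T_W]-tI)=(-1)^k(a_0+a_1t+\cdots+a_{k-1}t^{k-1}+t^k)$.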
I will omit the proof of the above theorem unless requested, since the main goal is the proof of the Cayley-Hamilton Theorem, which states that:

Cayley-Hamilton Theorem. Let $T$ be a linear operator on a finite-dimensional vector space $V$, and let $f(t)$ be the characteristic polynomial of $T$. Then $f(T)=T_0$, the zero transformation.
Proof: We show that $f(T)(v)=0$ for all $v\in V$. If $v=0$, we are done since $f(T)$ is linear, so suppose $v\neq 0$, and let $W$ be the $T$-cyclic subspace generated by $v$, with dimension $k$. By the theorem above, there exist scalars $a_0,\dots,a_{k-1}$ such that $$a_0v+a_1T(v)+\cdots+a_{k-1}T^{k-1}(v)+T^k(v)=0 $$ and the characteristic polynomial of $T_W$ is $$ g(t)=(-1)^k(a_0+a_1t+\cdots+a_{k-1}t^{k-1}+t^k).$$ Combining these two equations yields $$g(T)(v)=(-1)^k(a_0I+a_1T+\cdots+a_{k-1}T^{k-1}+T^k)(v)=0. $$ We know that $g(t)$ divides the characteristic polynomial $f(t)$ of $T$, so there exists a polynomial $q(t)$ such that $f(t)=q(t)g(t)$, and therefore $$ f(T)(v)=q(T)g(T)(v)=q(T)(g(T)(v))=q(T)(0)=0.$$ The Cayley-Hamilton Theorem for matrices is then a corollary of the theorem stated above, applied to the left-multiplication operator $L_A$.