What is going on in this proof of the Cayley-Hamilton theorem?

abstract-algebra, cayley-hamilton, linear-algebra

I am reading a proof of the Cayley-Hamilton theorem here. For a rough outline of the proof, let $A$ be a matrix representing the endomorphism $\phi$ of a finitely generated $R$-module $M$ with generators $m_1,\dots,m_n$. Now, we can regard $M$ as an $R[x]$-module by letting $x$ act as $\phi$.

This next part is where I am confused. They let $\mathfrak{m}$ be the column vector whose entries are the $m_j$. Then we get $(xI-A)\mathfrak{m}=0$, which I guess works by interpreting the matrix multiplication as letting the entries of the matrix, which lie in $R[x]$, act on the entries of $\mathfrak{m}$.

The next step multiplies both sides by the adjugate matrix to get $[\det(xI-A)]I\cdot\mathfrak{m}=0$, which then completes the proof, as we get $p(\phi)=0$, where $p(x):=\det(xI-A)$.
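For what it's worth, I tried sanity-checking the matrix identity itself on a concrete example. A minimal sympy sketch (with a $2\times 2$ integer matrix I made up) confirming that $(xI-A)^{\textrm{adj}}(xI-A)=\det(xI-A)I$ and that $p(A)=0$:

```python
import sympy as sp

x = sp.symbols('x')
A = sp.Matrix([[2, 3], [-3, 2]])   # an arbitrary example matrix over R = Z
C = x * sp.eye(2) - A              # the characteristic matrix xI - A

# The adjugate identity: adj(xI - A) * (xI - A) = det(xI - A) * I.
lhs = (C.adjugate() * C).applyfunc(sp.expand)
rhs = sp.expand(C.det()) * sp.eye(2)
assert lhs == rhs

# Cayley-Hamilton for this A: evaluate p(A) by Horner's scheme.
coeffs = sp.Poly(C.det(), x).all_coeffs()  # highest-degree coefficient first
pA = sp.zeros(2, 2)
for c in coeffs:
    pA = pA * A + c * sp.eye(2)
assert pA == sp.zeros(2, 2)
```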

I guess my real question is: what is actually going on? I've never seen matrices used this way; is this just a formal manipulation? But then it feels like multiplying by the adjugate matrix is "wrong". How do I know that the manipulation in $R[x]$ preserves the module action structure? I'm sorry for phrasing this poorly, but I have a feeling that something is off and I may not have articulated it well.

Best Answer

My previous answer made a false claim -- that we wanted to view $M$ as an $M_{n\times n}(R[x])$-module. In fact, this will not work in general: while given an endomorphism $\phi\in\operatorname{End}_R(M)$ and a generating set $\{m_1,\dots, m_n\}$ of $M$ we may produce a matrix $A_\phi\in M_{n\times n}(R)$ such that $$\require{AMScd} \begin{CD} R^n @>A_\phi>> R^n \\ @V\pi VV @VV\pi V\\ M @>>\phi > M \end{CD} $$ commutes, it is not the case in general that a matrix $B\in M_{n\times n}(R)$ induces a well-defined endomorphism of $M.$ However, this doesn't mean we can't use the main idea that $(xI - A)^{\textrm{adj}}(xI -A) = \det(xI - A)I\in M_{n\times n}(R[x]);$ we just need to be careful.
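To see concretely why an arbitrary matrix need not descend to $M$, here is a toy example of my own (not from the linked proof): take $R = \mathbb{Z}$ and $M = \mathbb{Z}/2\mathbb{Z}$ with the redundant generating set $\{1, 1\}$, so $n = 2$ and $\pi:\mathbb{Z}^2\to M$ sends $(a,b)\mapsto a+b \bmod 2$.

```python
# pi: Z^2 -> M = Z/2Z for the redundant generating set {1, 1}.
def pi(v):
    return (v[0] + v[1]) % 2

def apply(B, v):
    """Multiply the 2x2 integer matrix B by the column vector v."""
    return tuple(sum(B[i][j] * v[j] for j in range(2)) for i in range(2))

B = [[1, 0], [0, 0]]           # a perfectly good matrix over R = Z
v1, v2 = (1, 0), (0, 1)        # two lifts of the same element of M

assert pi(v1) == pi(v2)                        # equal downstairs...
assert pi(apply(B, v1)) != pi(apply(B, v2))    # ...but B separates them,
# so B induces no well-defined endomorphism of M.
```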

First, let's choose our generating set $\{m_1,\dots, m_n\}$ of $M$ and our matrix representation $A_\phi$ of $\phi$ with respect to this generating set. Explicitly, we have some collection of constants $a_{ij}\in R$ such that $$ \phi(m_i) = \sum_{j=1}^n a_{ij} m_j. $$ If we let $\delta_{ij} = \begin{cases} 1,\quad i = j\\ 0,\quad i\neq j\end{cases}$ and we consider $M$ as an $R[x]$-module where $x$ acts on $M$ by $xm = \phi(m),$ then the previous equation is equivalent to $$ \sum_{j}(x\delta_{ij} - a_{ij})m_j = 0. $$
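To make the bookkeeping concrete, here is a small example of my own (again not from the linked proof): take $R = \mathbb{Z}$, $M = \mathbb{Z}[i]$ with generators $m_1 = 1$, $m_2 = i$, and $\phi$ given by multiplication by $2+3i$, so that $(a_{ij})$ is the matrix $\begin{pmatrix} 2 & 3\\ -3 & 2\end{pmatrix}$.

```python
# M = Z[i] with generators m1 = 1, m2 = i; phi is multiplication by 2 + 3i.
phi = lambda z: (2 + 3j) * z

# phi(m1) = 2 + 3i = 2*m1 + 3*m2 and phi(m2) = -3 + 2i = -3*m1 + 2*m2.
a = [[2, 3], [-3, 2]]
m = [1, 1j]

for i in range(2):
    # phi(m_i) = sum_j a_ij m_j, i.e. sum_j (x*delta_ij - a_ij) m_j = 0
    # once x acts as phi.
    assert phi(m[i]) == sum(a[i][j] * m[j] for j in range(2))
```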

Observe that if we assemble the coefficients of the $m_j$ as we range over all $j$ and all $i$ into a matrix, we obtain $$(x\delta_{ij} - a_{ij})_{ij} = xI - A_\phi.$$ Now we apply the adjugate trick. Write $(xI - A_\phi)^{\textrm{adj}} = (b_{ij})_{ij}.$ Then the fact that $(xI - A_\phi)^{\textrm{adj}}(xI - A_\phi) = \det(xI - A_\phi) I$ means that $$ \sum_{k=1}^n b_{ik}(x\delta_{kj} - a_{kj}) = \det(xI - A_\phi)\delta_{ij}. $$ Taking our $k$-th equation $0 = \sum_{j}(x\delta_{kj} - a_{kj})m_j$ and multiplying by $b_{ik},$ we have $$ 0 = \sum_j b_{ik}(x\delta_{kj} - a_{kj})m_j. $$ Next we sum these equations over $k$: \begin{align*} 0 &= \sum_{k=1}^n\sum_{j=1}^n b_{ik}(x\delta_{kj} - a_{kj})m_j\\ &=\sum_{j=1}^n\sum_{k=1}^n b_{ik}(x\delta_{kj} - a_{kj})m_j\\ &= \sum_{j=1}^n\det(xI - A_\phi)\delta_{ij} m_j\\ &= \det(xI - A_\phi)m_i. \end{align*} This holds for every $i,$ so $p(x) := \det(xI - A_\phi)$ acts on $M$ as zero; i.e., $p(\phi) : M\to M$ is the zero map.
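Closing the loop on the toy example above: there $p(x) = \det(xI - A_\phi) = x^2 - 4x + 13$, and one can check directly that $p(\phi)$ kills every element of $M = \mathbb{Z}[i]$, exactly as the argument predicts.

```python
# p(x) = x^2 - 4x + 13 for A = [[2, 3], [-3, 2]]; p(phi) should be zero on M.
phi = lambda z: (2 + 3j) * z

def p_of_phi(z):
    return phi(phi(z)) - 4 * phi(z) + 13 * z

for z in [1, 1j, 5 - 2j, 7 + 11j]:   # a few elements of Z[i]
    assert p_of_phi(z) == 0
```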