Questions about the Cayley-Hamilton theorem for modules

cayley-hamilton, commutative-algebra, determinant, linear-algebra, vector-spaces

Having recently learned the proof of the Cayley-Hamilton theorem for vector spaces from Hoffman & Kunze (I've known the statement of the theorem for a while now, but had never really bothered with the proof), I am now trying to wrap my head around the proof for finitely generated modules, following the one in Eisenbud's commutative algebra book, and I have some questions about this more general version.

Let $R$ be a commutative ring, $I\subset R$ an ideal, and $M$ an $R$-module that can be generated by $n$ elements. Let $\varphi$ be an endomorphism of $M$. If $$\varphi(M)\subset IM$$
then there is a monic polynomial $$p(x)=x^n+p_1x^{n-1}+\cdots+p_n$$
with $p_j\in I^j$ for each $j$, such that $p(\varphi)=0$ as an endomorphism of $M$.
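
As a sanity check, here is a toy instance (my own example, not from Eisenbud): take $R=\mathbb{Z}$, $I=(2)$, $M=\mathbb{Z}^2$, and let $\varphi$ be multiplication by $2$, so that $\varphi(M)=2M=IM$. With respect to the standard generators, $\varphi$ has matrix $A=\begin{pmatrix}2&0\\0&2\end{pmatrix}$, and $$p(x)=\det(x\mathbf{1}-A)=x^2-4x+4,$$ with $p_1=-4\in I$, $p_2=4\in I^2=(4)$, and indeed $p(\varphi)=\varphi^2-4\varphi+4\cdot\mathrm{id}=0$ on $M$.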

Firstly, why is it necessary to require that $\varphi(M)\subset IM$? I understand why this isn't required in the version for vector spaces, as the only ideals of a field are $(0)$ and $(1)$, so the condition is automatic, but why is it required in the general case? As far as I can tell, this condition isn't used in the proof itself.

Secondly, why is $p_j\in I^j$ true? I can see intuitively that, due to the inclusion condition, the "matrix" of $\varphi$ (if we're allowed to talk about such things) will have entries in $I$ (is this why we require that $\varphi(M)\subset IM$, so that we can say where the entries of the matrix of $\varphi$ lie?), so the coefficients of the characteristic polynomial must be in $I$ as well, but how can we show formally that each $p_j$ lies in the $j$-th power $I^j$?

Next, about the proof itself. The proof for modules uses the same idea as the one for vector spaces: you view the vector space/module as a module over the ring of polynomials in $\varphi$ and then use the defining property of the adjugate matrix. I think I more or less understand it, except for the final part.

View the equations $\varphi(m_i)=\sum_j a_{ij}m_j$ (where the $m_j$ are the generators of $M$ and $a_{ij}\in I$) as the single matrix equation $(\varphi\mathbf{1}-A)\cdot m=0$, multiply on the left by the adjugate to get $[\det(\varphi\mathbf{1}-A)]\mathbf{1}\cdot m=0$, therefore $\det(\varphi\mathbf{1}-A)\cdot m_j=0$ for each generator $m_j$, from which it follows that $[\det(\varphi\mathbf{1}-A)]\cdot M=0$, i.e. $\det(\varphi\mathbf{1}-A)$ is the $0$ endomorphism.
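
(For reference, the defining property of the adjugate being used in the second step is the identity $$\operatorname{adj}(B)\,B=\det(B)\,\mathbf{1},$$ which holds for matrices over any commutative ring; here it is applied with $B=\varphi\mathbf{1}-A$ over the commutative ring $R[\varphi]$.)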

I think I understand how this works for vector spaces (it still feels somewhat tautological, as if we're cheating by just plugging in $\varphi$), but I don't understand the following bit: in the case of vector spaces, from the fact that $\det(\varphi\mathbf{1}-A)\cdot m_j=0$ for the generators $m_j$ we can definitively conclude that $\det(\varphi\mathbf{1}-A)=0$ (because $a\cdot x=0$ with $x\neq0$ implies $a=0$ in vector spaces), but how can we conclude the same for modules? Isn't there such a thing as torsion, i.e. $a\cdot x=0$ with neither $a$ nor $x$ equal to $0$?

Thank you in advance.

Best Answer

I think I'm a bit late! Either way:

  1. Yes, we require $\varphi(M) \subseteq IM$ to guarantee that $p_j \in I^j$ (this is particularly useful in commutative algebra when $I$ is prime). Note that taking $I = R$ simply throws away all information about where each $p_j$ lives; the condition $\varphi(M) \subseteq IM$ tells you that you can pick $a_{ij} \in I$, and thus products of $k$ of them belong to $I^k$.

The implicit usage of $\varphi(M) \subseteq IM$ goes as follows. Since $M$ is finitely generated (say $\{m_i\}$ is a generating set), the submodule $IM$ generated by elements of the form $rx$ with $r \in I$, $x \in M$ is simply $IM = \left\{\sum\limits_i r_i m_i\colon\ r_i \in I\right\}$. To see this, note that the right-hand side is an $R$-module and any element of the form $rx$ belongs to it (so the generated submodule, $IM$, does too); the reverse inclusion is obvious. So $\varphi(m_i) \in IM$ and the coefficients belong to $I$.
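
Explicitly, if $r\in I$ and $x=\sum_i c_i m_i$ with $c_i\in R$, then $$rx=\sum_i (r c_i)\,m_i,\qquad r c_i\in I,$$ which is exactly the membership in the right-hand side that the argument above uses.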

  2. We can conclude that $p_j \in I^j$ because the determinant is a homogeneous polynomial of degree $n$ in the $n^2$ matrix entries. If we group together the terms of degree $n-j$ in $\varphi$, we are left with sums of products of $j$ entries, each from $I$, i.e. the coefficient $p_j$ lies in $I^j$.
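
For instance, in the $2\times 2$ case, $$\det(x\mathbf{1}-A)=x^2-(a_{11}+a_{22})\,x+(a_{11}a_{22}-a_{12}a_{21}),$$ so $p_1=-(a_{11}+a_{22})$ is a sum of single elements of $I$, while $p_2=a_{11}a_{22}-a_{12}a_{21}$ is a sum of products of two elements of $I$, hence lies in $I^2$.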

  3. The last part is the important and slightly tricky one. What we do is consider the "generating vector" $m = \begin{pmatrix} m_1 \\ \vdots \\ m_n\end{pmatrix}$. By construction of the matrix $A$, we have $(\varphi\mathbf 1-A)\cdot m = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}$. But it wouldn't make any sense to say that this matrix $\varphi\mathbf 1 - A$ has coefficients in $R$. It doesn't! It has coefficients in $R[\varphi] \subseteq \mathrm{End}(M)$! So what we actually do is consider this ring acting on $M$, with $\varphi\cdot x = \varphi(x)$. Now we can use Cramer's rule (the adjugate identity) to get $\det(\varphi\mathbf 1 - A)\cdot m = \vec{\mathbf 0}$. But that means, by $R$-linearity, that the operator $\det(\varphi \mathbf 1 - A) \in \mathrm{End}(M)$ has image $(0) \subseteq M$, i.e. it is the zero module homomorphism. So $\det(\varphi \mathbf 1 - A) = 0$.
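
Spelling out that last step, which also answers the torsion worry: $\det(\varphi\mathbf{1}-A)$ kills each generator $m_j$, and for an arbitrary element $x=\sum_j r_j m_j\in M$, $$\det(\varphi\mathbf{1}-A)(x)=\sum_j r_j\,\det(\varphi\mathbf{1}-A)(m_j)=0.$$ Nothing of the form "$a\cdot x=0,\ x\neq 0\Rightarrow a=0$" is ever invoked; we show directly that the operator annihilates all of $M$.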
