Linear Algebra – Why Does Hoffman's Linear Algebra Require Long Proofs?

diagonalization, linear-algebra, matrices, minimal-polynomials

I'm reading "Linear Algebra" by Kenneth Hoffman and Ray Kunze.

I don't quite understand why there's such a long proof of Theorem 6 in $\S$6.4.

First, a triangular matrix is defined:

An $n\times n$ matrix $A$ is called triangular if $A_{ij}=0$ whenever $i>j$ or if $A_{ij}=0$ whenever $i<j$.

Then triangulable is defined:

The linear operator $T$ is called triangulable if there is an ordered basis in which $T$ is represented by a triangular matrix.

Then there's Theorem 5:

Let $V$ be a finite-dimensional vector space over the field $F$ and
let $T$ be a linear operator on $V$. Then $T$ is triangulable if and
only if the minimal polynomial for $T$ is a product of linear
polynomials over $F$.

Then comes Theorem 6:

Let $V$ be a finite-dimensional vector space over the field $F$ and
let $T$ be a linear operator on $V$. Then $T$ is diagonalizable if and
only if the minimal polynomial for $T$ has the form $p = (x - c_1)
\dots (x - c_k)$ where $c_1, \dots , c_k$ are distinct elements of
$F$.

The proof is as follows (the numbers (1), (2), … are added by me):

Proof

(1) We have noted earlier that, if $T$ is diagonalizable, its minimal
polynomial is a product of distinct linear factors (see the discussion
prior to Example 4).

(2) To prove the converse, let $W$ be the subspace spanned by all of
the characteristic vectors of $T$, and suppose $W \ne V$. ….

What I don't understand is (2): why do we need such a long proof here (details are below)?

Since Theorem 5 already proved that "minimal polynomial factors $p=(x-c_1)^{r_1}\dots(x-c_k)^{r_k}$, $c_i$ distinct $\Rightarrow$ $T$ is triangulable";

this part of Theorem 6 is "minimal polynomial factors $p=(x-c_1) \dots(x-c_k)$, $c_i$ distinct $\Rightarrow$ $T$ is triangulable",

so don't we just need to let all the $r_i$ be $1$?

Proof details excerpted from Hoffman

(1) We have noted earlier that, if $T$ is diagonalizable, its minimal
polynomial is a product of distinct linear factors (see the discussion
prior to Example 4).

(2) To prove the converse, let $W$ be the subspace spanned by all of the characteristic vectors of $T$, and suppose $W \ne V$.

(3) By the lemma used in the proof of Theorem 5, there is a vector $\alpha$ not in $W$ and a characteristic value $c_j$ of $T$ such that the vector $\beta = (T - c_jI)\alpha$ lies in $W$.

(4) Since $\beta$ is in $W$, $\beta = \beta_1+\dots+\beta_k$ where $T\beta_i = c_i\beta_i$, $1\le i\le k$, and therefore the vector $h(T)\beta = h(c_1)\beta_1+\dots+h(c_k)\beta_k$ is in $W$, for every polynomial $h$.

(5) Now $p = (x-c_j)q$, for some polynomial $q$.

(6) Also $q - q(c_j) = (x - c_j)h$, for some polynomial $h$.

(7) We have $q(T)\alpha - q(c_j)\alpha = h(T)(T - c_jI)\alpha = h(T)\beta$.

(8) But $h(T)\beta$ is in $W$ and, since $0 = p(T)\alpha = (T - c_jI)q(T)\alpha$, the vector $q(T)\alpha$ is a characteristic vector of $T$ (or zero) and hence is in $W$.

(9) Therefore, $q(c_j)\alpha$ is in $W$.

(10) Since $\alpha$ is not in $W$, we have $q(c_j) = 0$.

(11) That contradicts the fact that $p$ has distinct roots, since $q(c_j) = 0$ would make $c_j$ a repeated root of $p = (x-c_j)q$. QED.
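
To keep the objects in steps (2) and (3) straight, here is a small numpy sketch (my own illustration, not from the book) of what $W$, $\alpha$ and $\beta$ look like for a concrete operator where $W \ne V$ really happens:

```python
import numpy as np

# A triangular operator whose minimal polynomial (x - 1)^2 is a product of
# linear factors, so the lemma from Theorem 5 applies, but W != V.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
I = np.eye(2)

# W = span of all characteristic (eigen)vectors of A.  The only eigenvalue is 1
# and ker(A - I) = span{e1}, so dim W = 1 and W is a proper subspace of V = R^2.
print(2 - np.linalg.matrix_rank(A - I))   # 1

# Step (3): alpha = e2 is not in W, and beta = (A - 1*I) alpha lands in W.
alpha = np.array([0.0, 1.0])
beta = (A - I) @ alpha
print(beta)                               # [1. 0.], i.e. e1, which is in W
```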

Best Answer

The answer to your confusion is that triangulable does not imply diagonalisable.
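
For a concrete instance (my own example, not from the original answer), here is a numpy check of a triangular matrix that satisfies the hypothesis of Theorem 5 but not that of Theorem 6:

```python
import numpy as np

# A triangular matrix whose minimal polynomial is (x - 2)^2 (x - 3):
# a product of linear factors (so it is triangulable by Theorem 5),
# but NOT a product of *distinct* linear factors.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])
I = np.eye(3)
print(np.allclose((A - 2*I) @ (A - 3*I), 0))              # False
print(np.allclose((A - 2*I) @ (A - 2*I) @ (A - 3*I), 0))  # True

# Eigenspace dimensions: 1 (for 2) + 1 (for 3) = 2 < 3 = dim V,
# so there is no basis of eigenvectors and A is not diagonalisable.
nullity = lambda M: M.shape[0] - np.linalg.matrix_rank(M)
print(nullity(A - 2*I), nullity(A - 3*I))                 # 1 1
```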

But that does not mean one has to give as complicated a proof as Hoffman and Kunze do. I would personally not derive the result from Theorem 5. Also, I would generalise the harder direction as follows:

Whenever an operator $T$ is annihilated by a polynomial $P=(X-c_1)\cdots(X-c_k)$ with all $c_i$ distinct, it is diagonalisable with set of eigenvalues contained in $\{c_1,\ldots,c_k\}$

(it does not matter whether the annihilating polynomial $P$ is minimal or not). The best proof is probably to use the kernel decomposition theorem, which here gives you the decomposition of the whole space as a direct sum of the subspaces $\def\I{\mathbf I}\ker(T-c_i\I)$, which (insofar as they are nonzero) are the eigenspaces.
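
As a quick sanity check of this statement (my own numerical sketch, not part of the original answer), take an idempotent matrix, annihilated by $X(X-1)$; the two kernels exhaust the whole space, just as the kernel decomposition theorem predicts:

```python
import numpy as np

# An idempotent ("projection") matrix: A @ A == A, so P = X(X - 1) annihilates A.
# The roots 0 and 1 are distinct, even though A itself is not diagonal as written.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
I = np.eye(3)
assert np.allclose(A @ (A - I), 0)

# Kernel decomposition: ker(A - 0*I) and ker(A - 1*I) together fill up V.
nullity = lambda M: M.shape[0] - np.linalg.matrix_rank(M)
print(nullity(A - 0*I), nullity(A - 1*I))   # 1 and 2, summing to 3 = dim V
# Hence R^3 is the direct sum of the two eigenspaces and A is diagonalisable,
# with eigenvalues contained in {0, 1}.
```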


But one can even do without that theorem, using just the following

Lemma. One has $\dim\ker(T_1\circ\cdots\circ T_k)\leq\dim\ker(T_1)+\cdots+\dim\ker(T_k)$ for any composition of linear operators $T_1,\ldots,T_k$ on $V$.

This follows easily from the case of a composition $g\circ f$ of two operators; see this question, noting that $\dim\ker(g\circ f)=\dim\ker(f)+\dim(\ker(g)\cap\operatorname{im}(f))$ and $\dim(\ker(g) \cap \operatorname{im}(f))\leq\dim(\ker(g))$.
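
Here is a quick numerical spot-check of the lemma (again my own sketch, not from the answer), with random rank-deficient integer matrices as the factors:

```python
import numpy as np

rng = np.random.default_rng(0)
nullity = lambda M: M.shape[0] - np.linalg.matrix_rank(M)

# dim ker(T1 T2 T3) <= dim ker(T1) + dim ker(T2) + dim ker(T3),
# checked on random rank-deficient 5x5 matrices (each built as a thin product).
for _ in range(200):
    Ts = []
    for _ in range(3):
        r = rng.integers(1, 5)                    # intended rank, between 1 and 4
        B = rng.integers(-3, 4, size=(5, r))
        C = rng.integers(-3, 4, size=(r, 5))
        Ts.append((B @ C).astype(float))          # rank(B @ C) <= r < 5
    assert nullity(Ts[0] @ Ts[1] @ Ts[2]) <= sum(nullity(T) for T in Ts)
print("lemma inequality held on all trials")
```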

Now for our result, since $(T-c_1\I)\circ\cdots\circ(T-c_k\I)=0$, its kernel is all of $V$. The lemma says that $\dim(V)$ is at most the sum of the dimensions of the subspaces $V_i=\ker(T-c_i\I)$. But the nonzero spaces among those are the eigenspaces of $T$, and the sum of eigenspaces for different eigenvalues is always direct. Therefore $\dim(V_1)+\cdots+\dim(V_k)=\dim(V_1\oplus\cdots\oplus V_k)$. If that dimension is at least $\dim(V)$, then (since it can be at most $\dim(V)$) it is equal to $\dim(V)$, and $V_1\oplus\cdots\oplus V_k=V$; then $T$ is diagonalisable.
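
To see the whole argument run on a concrete matrix (once more my own illustration; `scipy.linalg.null_space` is used to get bases of the kernels):

```python
import numpy as np
from scipy.linalg import null_space

# A is annihilated by (X - 2)(X - 5), a product of distinct linear factors,
# but is not diagonal as written.
A = np.array([[2.0, 0.0, 0.0],
              [0.0, 2.0, 3.0],
              [0.0, 0.0, 5.0]])
I = np.eye(3)
assert np.allclose((A - 2*I) @ (A - 5*I), 0)

# Columns of V1, V2 are bases of V_1 = ker(A - 2I) and V_2 = ker(A - 5I).
V1, V2 = null_space(A - 2*I), null_space(A - 5*I)
print(V1.shape[1], V2.shape[1])                 # 2 + 1 = 3 = dim V

# Stacking the eigenspace bases gives an invertible P that diagonalises A.
P = np.hstack([V1, V2])
print(np.round(np.linalg.inv(P) @ A @ P, 10))   # diag(2, 2, 5)
```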
