I'm reading "Linear Algebra" by Kenneth Hoffman and Ray Kunze.
I don't quite understand why there's a long proof in $\S$6.4 Theorem 6.
First the triangular matrix is defined:
An $n\times n$ matrix $A$ is called triangular if $A_{ij}=0$ whenever $i>j$ or if $A_{ij}=0$ whenever $i<j$.
Then triangulable is defined:
The linear operator $T$ is called triangulable if there is an ordered basis in which $T$ is represented by a triangular matrix.
Then there's Theorem 5:
Let $V$ be a finite-dimensional vector space over the field $F$ and
let $T$ be a linear operator on $V$. Then $T$ is triangulable if and
only if the minimal polynomial for $T$ is a product of linear
polynomials over $F$.
Now comes Theorem 6:
Let $V$ be a finite-dimensional vector space over the field $F$ and
let $T$ be a linear operator on $V$. Then $T$ is diagonalizable if and
only if the minimal polynomial for $T$ has the form $p = (x - c_1)
\dots (x - c_k)$ where $c_1, \dots , c_k$ are distinct elements of
$F$.
The proof is (the (1), (2), … numbers are added by me):
Proof
(1) We have noted earlier that, if $T$ is diagonalizable, its minimal
polynomial is a product of distinct linear factors (see the discussion
prior to Example 4). (2) To prove the converse, let $W$ be the subspace spanned by all of
the characteristic vectors of $T$, and suppose $W \ne V$. ….
What I don't understand is (2) — why we need such a long proof (details are below) here?
Since Theorem 5 already proved that "minimal polynomial factors $p=(x-c_1)^{r_1}\dots(x-c_k)^{r_k}$, $c_i$ distinct $\Rightarrow$ $T$ is triangulable";
this part of Theorem 6 is "minimal polynomial factors $p=(x-c_1) \dots(x-c_k)$, $c_i$ distinct $\Rightarrow$ $T$ is triangulable",
so we just need to let all the $r_i$ be $1$, isn't it?
Proof details excerpted from Hoffman
(1) We have noted earlier that, if $T$ is diagonalizable, its minimal
polynomial is a product of distinct linear factors (see the discussion
prior to Example 4). (2) To prove the converse, let $W$ be the
subspace spanned by all of the characteristic vectors of $T$, and
suppose $W \ne V$. (3) By the lemma used in the proof of Theorem 5,
there is a vector $\alpha$ not in $W$ and a characteristic value $c_j$
of $T$ such that the vector $\beta = (T - c_jI)\alpha$ lies in $W$.
(4) Since $\beta$ is in $W$, $\beta = \beta_1 + \dots + \beta_k$ where
$T\beta_i = c_i\beta_i$, $1\le i\le k$, and therefore the vector
$h(T)\beta = h(c_1)\beta_1 + \dots + h(c_k)\beta_k$ is in $W$, for every
polynomial $h$. (5) Now $p = (x - c_j)q$, for some polynomial $q$.
(6) Also $q - q(c_j) = (x - c_j)h$, for some polynomial $h$.
(7) We have $q(T)\alpha - q(c_j)\alpha = h(T)(T - c_jI)\alpha = h(T)\beta$.
(8) But $h(T)\beta$
is in $W$ and, since $0 = p(T)\alpha = (T - c_jI)q(T)\alpha$, the
vector $q(T)\alpha$ is in $W$. (9) Therefore, $q(c_j)\alpha$ is in
$W$. (10) Since $\alpha$ is not in $W$, we have $q(c_j) = 0$.
(11) That contradicts the fact that $p$ has distinct roots, since $q(c_j) = 0$ would make $c_j$ a double root of $p = (x - c_j)q$. QED.
Best Answer
The answer to your confusion is that triangulable does not imply diagonalisable.
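As a concrete sanity check of this point (a small numpy sketch with a hypothetical $2\times 2$ example, not from the book): the Jordan block $\begin{pmatrix}1&1\\0&1\end{pmatrix}$ is upper triangular, hence trivially triangulable, but its minimal polynomial is $(x-1)^2$, and its only eigenspace is one-dimensional, so it is not diagonalizable.

```python
import numpy as np

# Upper triangular (hence trivially triangulable) Jordan block.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
I = np.eye(2)

# Minimal polynomial check: (A - I) != 0 but (A - I)^2 == 0,
# so the minimal polynomial is (x - 1)^2 -- NOT a product of
# distinct linear factors.
print(np.allclose(A - I, 0))              # False
print(np.allclose((A - I) @ (A - I), 0))  # True

# Diagonalizable iff the eigenvectors span the space: here the
# eigenspace for eigenvalue 1 is ker(A - I), of dimension 1 < 2.
eigenspace_dim = 2 - np.linalg.matrix_rank(A - I)
print(eigenspace_dim)  # 1
```

So the operator satisfies the hypothesis of Theorem 5 (the minimal polynomial factors into linear factors) but not that of Theorem 6 (the factors are not distinct), which is exactly why "set all $r_i = 1$" needs a separate argument.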
But that does not mean one has to give as complicated a proof as Hoffman and Kunze do. I would personally not derive the result from Theorem 5. Also I would generalise the harder direction as

If $P = (x - c_1)\dots(x - c_k)$, with $c_1, \dots, c_k$ distinct, annihilates $T$, then $T$ is diagonalisable

(it does not matter whether the annihilating polynomial $P$ is minimal or not). The best proof is probably to use the kernel decomposition theorem, which here gives you the decomposition of the whole space as a direct sum of the subspaces $\def\I{\mathbf I}\ker(T-c_i\I)$, which (in so far as they are nonzero) are the eigenspaces.
But one can even do without that theorem, using just the following

Lemma. For linear operators $f_1, \dots, f_k$ on a finite-dimensional space, $\dim(\ker(f_1\circ\cdots\circ f_k)) \leq \dim(\ker(f_1)) + \cdots + \dim(\ker(f_k))$.

This follows easily from the case of a composition $g\circ f$ of two operators; see this question, noting that $\dim(\ker(g) \cap \operatorname{im}(f))\leq\dim(\ker(g))$.
Now for our result, since $(T-c_1\I)\circ\cdots\circ(T-c_k\I)=0$, its kernel is all of $V$. The lemma says that $\dim(V)$ is at most the sum of the dimensions of the subspaces $V_i=\ker(T-c_i\I)$. But the nonzero spaces among those are the eigenspaces of$~T$, and the sum of eigenspaces for different eigenvalues is always direct. Therefore $\dim(V_1)+\cdots+\dim(V_k)=\dim(V_1\oplus\cdots\oplus V_k)$. If that dimension is at least $\dim(V)$, then it is equal to $\dim(V)$, and $V_1\oplus\cdots\oplus V_k=V$; then $T$ is diagonalisable.
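The dimension count above can be watched in action numerically (a numpy sketch with a hypothetical matrix chosen for illustration): $A = \begin{pmatrix}1&1\\0&2\end{pmatrix}$ is annihilated by $(x-1)(x-2)$ with distinct roots, and the eigenspace dimensions add up to $\dim V$, forcing $V = V_1 \oplus V_2$.

```python
import numpy as np

# A matrix annihilated by (x - 1)(x - 2), with distinct c_i.
A = np.array([[1.0, 1.0],
              [0.0, 2.0]])
I = np.eye(2)

# The composition (T - 1I)(T - 2I) is the zero operator,
# so its kernel is all of V (dimension 2).
print(np.allclose((A - 1 * I) @ (A - 2 * I), 0))  # True

# The lemma bounds dim V by the sum of the kernel dimensions
# of the factors; here the bound is attained:
dim_V1 = 2 - np.linalg.matrix_rank(A - 1 * I)  # dim ker(T - 1I)
dim_V2 = 2 - np.linalg.matrix_rank(A - 2 * I)  # dim ker(T - 2I)
print(dim_V1 + dim_V2)  # 2, which equals dim V

# Since the eigenspace sum is direct, V = V1 (+) V2 and T is
# diagonalizable, as the argument above concludes.
```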