I'm currently going through Sheldon Axler's Linear Algebra Done Right and am struggling to understand his proof of Theorem 8.23.
Suppose we are given a complex vector space $V$ and a linear transformation $T \in L(V)$. Let $\lambda_1, \dots, \lambda_m$ be the distinct eigenvalues of $T$, and let $U_1, \dots, U_m$ be the corresponding subspaces of the generalized eigenvectors. Theorem 8.23 claims it must be the case that $V = U_1 \oplus\dots\oplus U_m$.
The full proof can be found here, listed as Theorem 8.23 on page 174.
In his proof, Axler defines $U = U_1 + \dots + U_m$ and the operator $S = T|_U$. He then claims that $S$ has the same eigenvalues, with the same multiplicities, as $T$, because all the generalized eigenvectors of $T$ lie in $U$, the domain of $S$.
I don't understand why this is the case. Consider a generalized eigenvector $v$ of $T$, such that
$$
v \in \operatorname{null}(T - \lambda_i I)^{\dim V}.
$$
For $v$ to be a generalized eigenvector of $S$, we would need $v \in \operatorname{null}(T - \lambda_i I)^{\dim U}$. I don't understand how this follows from the previous statement.
Thanks in advance.
Best Answer
The identity $U_j = \operatorname{Null} (T - \lambda_j I)^{\dim V}$ is useful when you want to treat $U_j$ as a null space; for instance, you can then apply the proposition which says that $\operatorname{Null} p(T)$ is $T$-invariant.
But this isn't always how you want to think about generalized eigenvectors. Here are some other important facts:

- A generalized eigenvector of order $k$ for $\lambda_j$ is a vector $v$ with $(T - \lambda_j I)^k v = 0$ but $(T - \lambda_j I)^{k-1} v \neq 0$.
- If $v$ has order $k$, then $(T - \lambda_j I) v$ has order $k - 1$.
- Consequently the vectors $v, (T - \lambda_j I) v, \dots, (T - \lambda_j I)^{k-1} v$ are linearly independent generalized eigenvectors of orders $k, k-1, \dots, 1$.
This last fact implies that $\dim U_j \ge k$ for any order $k$ because a generalized eigenvector of order $k$ yields $k$ linearly independent generalized eigenvectors of orders $1$ through $k$. So actually, we have $U_j = \operatorname{Null} (T - \lambda_j I)^{\dim U_j}$ and this is independent of $V$.
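To see this fact numerically, here is a small check on a concrete matrix of my own choosing (a Jordan block of size $3$ at eigenvalue $2$, plus a $1 \times 1$ block at eigenvalue $5$; the example is illustrative, not from Axler): the vector $e_3$ is a generalized eigenvector of order $3$, and its chain under $T - 2I$ is linearly independent.

```python
import numpy as np

# Illustrative operator: Jordan block of size 3 at eigenvalue 2,
# plus a 1x1 block at eigenvalue 5 (my own example, not Axler's).
T = np.array([[2., 1., 0., 0.],
              [0., 2., 1., 0.],
              [0., 0., 2., 0.],
              [0., 0., 0., 5.]])
N = T - 2.0 * np.eye(4)          # T - lambda*I for lambda = 2

v = np.array([0., 0., 1., 0.])   # generalized eigenvector of order 3
assert np.allclose(np.linalg.matrix_power(N, 3) @ v, 0)        # N^3 v = 0
assert not np.allclose(np.linalg.matrix_power(N, 2) @ v, 0)    # N^2 v != 0

# The chain v, Nv, N^2 v is linearly independent, so dim U_j >= 3.
chain = np.column_stack([v, N @ v, N @ N @ v])
print(np.linalg.matrix_rank(chain))  # 3
```

So a single order-$3$ generalized eigenvector forces $\dim U_j \ge 3$, exactly as the fact above predicts.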
I don't think Axler was thinking quite in these terms: just that if $(T - \lambda_j I)^k v = 0$ then $(S - \lambda_j I)^k v = 0$ because $S$ is just $T$ but with a restricted domain.
And if you take the view that being a generalized eigenvector means "there is some order $k$ with $(T - \lambda_j I)^k v = 0$", then the claim makes sense, because that condition doesn't depend on $V$ a priori. We can show that $k \le \dim U_j \le \dim U \le \dim V$, but the resulting identities are consequences, not definitions: $$ U_j = \operatorname{Null}(T - \lambda_j I)^{\dim U_j} = \operatorname{Null}(T - \lambda_j I)^{\dim U} = \operatorname{Null}(T - \lambda_j I)^{\dim V}. $$
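As a sanity check of that chain of identities on a concrete matrix (again my own illustrative example, not Axler's): with one Jordan block of size $3$ at $\lambda = 2$ and a $1 \times 1$ block at $5$, we have $\dim U_j = 3$ and $\dim V = 4$, and the null spaces of $(T - 2I)^k$ stop growing at $k = 3$.

```python
import numpy as np

# Illustrative operator: Jordan block of size 3 at eigenvalue 2,
# plus a 1x1 block at eigenvalue 5, so dim U_j = 3 while dim V = 4.
T = np.array([[2., 1., 0., 0.],
              [0., 2., 1., 0.],
              [0., 0., 2., 0.],
              [0., 0., 0., 5.]])
N = T - 2.0 * np.eye(4)

# dim Null (T - 2I)^k for k = 1, 2, 3, 4 via rank-nullity
nullities = [4 - np.linalg.matrix_rank(np.linalg.matrix_power(N, k))
             for k in range(1, 5)]
print(nullities)  # [1, 2, 3, 3]
```

The nullity stabilizes at $\dim U_j = 3$, so $\operatorname{Null} N^{\dim U_j} = \operatorname{Null} N^{\dim U} = \operatorname{Null} N^{\dim V}$, matching the displayed identities.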