Introduction
In the first paragraph of the proof of Theorem 8, the authors say,
"We could prove this theorem by adapting the lemma before Theorem 7 to the diagonalizable case, just as we adapted the lemma before Theorem 5 to the diagonalizable case in order to prove Theorem 6."
To understand how to modify the lemma before Theorem 7, it will be helpful to understand how the authors modified the lemma before Theorem 5. This is not explicitly done but is instead hidden in the proof of Theorem 6. So first, we will state and prove a modification of the lemma before Theorem 5 and use that to prove Theorem 6. Then, we will state and prove a modification of the lemma before Theorem 7 and use that to prove Theorem 8.
Triangulation and Diagonalization of a Single Operator
First, let us look at the lemma before Theorem 5 (called Lemma A here):
Lemma A
Let $V$ be a finite-dimensional vector space over the field $F$. Let $T$ be a linear operator on $V$ such that the minimal polynomial for $T$ is a product of linear factors
$$
p = (x-c_1)^{r_1} \cdots (x-c_k)^{r_k}, \qquad c_i \text{ in } F.
$$
Let $W$ be a proper ($W \neq V$) subspace of $V$ which is invariant under $T$. There exists a vector $\alpha$ in $V$ such that (a) $\alpha$ is not in $W$; (b) $(T-cI)\alpha$ is in $W$, for some characteristic value $c$ of the operator $T$.
We want to adapt Lemma A to help us find necessary and sufficient conditions for an operator to be diagonalizable. We will look at the proof of Theorem 6 to see if we can extract the modification of Lemma A from it.
Theorem 6
Let $V$ be a finite-dimensional vector space over the field $F$ and let $T$ be a linear operator on $V$. Then $T$ is diagonalizable if and only if the minimal polynomial for $T$ has the form
$$
p = (x-c_1) \cdots (x-c_k)
$$
where $c_1,\dots,c_k$ are distinct elements of $F$.
Proof of Theorem 6: We have noted earlier that, if $T$ is diagonalizable, its minimal polynomial is a product of distinct linear factors (see the discussion on page 193 prior to Example 4).
To prove the converse, let $W$ be the subspace spanned by all of the characteristic vectors of $T$, and suppose $W \neq V$. By Lemma A, there is a vector $\alpha$ not in $W$ and a characteristic value $c_j$ of $T$ such that the vector
$$
\beta = (T-c_j I)\alpha
$$
lies in $W$. Since $\beta$ is in $W$,
$$
\beta = \beta_1 + \dots + \beta_k
$$
where $T\beta_i = c_i \beta_i$, $1 \leq i \leq k$, and therefore the vector
$$
h(T)\beta = h(c_1)\beta_1 + \dots + h(c_k) \beta_k
$$
is in $W$, for every polynomial $h$.
Now $p(x) = (x-c_j)q(x)$, for some polynomial $q$. Also
$$
q(x) - q(c_j) = (x-c_j)h(x)
$$
for some polynomial $h$, because $c_j$ is a root of the polynomial $q(x) - q(c_j)$. So, we have
$$
q(T)\alpha - q(c_j)\alpha = h(T)(T-c_j I)\alpha = h(T)\beta.
$$
But $h(T)\beta$ is in $W$ and, since
$$
0 = p(T)\alpha = (T-c_j I)q(T)\alpha,
$$
the vector $q(T)\alpha$ is in $W$. Therefore, $q(c_j)\alpha$ is in $W$. Since $\alpha$ is not in $W$, we have $q(c_j) = 0$. That contradicts the fact that $p$ has distinct roots. $$\tag*{$\blacksquare$}$$
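Although the text contains no code, a quick numerical sanity check of Theorem 6 may be helpful; the following Python/NumPy sketch is my own addition, and the two example matrices are arbitrary illustrative choices. It forms the product $(T-c_1I)\cdots(T-c_kI)$ over the distinct eigenvalues of a matrix and tests whether it vanishes, which by Theorem 6 happens exactly when the matrix is diagonalizable.

```python
# Hedged numerical illustration of Theorem 6 (example matrices are my own choices).
import numpy as np

def annihilated_by_distinct_linear_factors(A, tol=1e-9):
    """Check whether prod_i (A - c_i I) = 0, where c_1, ..., c_k are the
    distinct eigenvalues of A.  By Theorem 6 this holds iff A is diagonalizable."""
    eigenvalues = np.linalg.eigvals(A)
    distinct = []
    for c in eigenvalues:
        # crude grouping of numerically equal eigenvalues
        if all(abs(c - d) > 1e-6 for d in distinct):
            distinct.append(c)
    P = np.eye(A.shape[0], dtype=complex)
    for c in distinct:
        P = P @ (A - c * np.eye(A.shape[0]))
    return np.max(np.abs(P)) < tol

A = np.array([[2.0, 0.0], [0.0, 3.0]])   # diagonalizable, p = (x-2)(x-3)
B = np.array([[2.0, 1.0], [0.0, 2.0]])   # Jordan block, p = (x-2)^2, not diagonalizable
print(annihilated_by_distinct_linear_factors(A))  # True
print(annihilated_by_distinct_linear_factors(B))  # False
```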
The key idea used in the proof is that if $W$ is a proper subspace of $V$ and the minimal polynomial of $T$ is a product of distinct linear factors, then we can always find a characteristic vector (here, $q(T)\alpha$) that is not in $W$. So, we try the following modification of Lemma A:
Lemma B
Let $V$ be a finite-dimensional vector space over the field $F$. Let $T$ be a linear operator on $V$ such that the minimal polynomial for $T$ is a product of distinct linear factors
$$
p = (x-c_1) \cdots (x-c_k), \qquad c_i \text{ in } F.
$$
Let $W$ be a proper ($W \neq V$) subspace of $V$ which is invariant under $T$. There exists a vector $\alpha$ in $V$ such that (a) $\alpha$ is not in $W$; (b) $(T-cI)\alpha = 0$, for some characteristic value $c$ of the operator $T$.
Proof of Lemma B: By Lemma A, there exists a vector $\beta$ not in $W$ and a characteristic value $c_j$ of $T$ such that $(T-c_j I)\beta$ lies in $W$. If $(T-c_jI)\beta = 0$ then we are done, for $\alpha = \beta$ works. So, assume $(T-c_jI)\beta \neq 0$. We can write $p(x) = (x-c_j) q(x)$, for some polynomial $q$ such that $q(c_j) \neq 0$. So,
$$
0 = p(T)\beta = (T-c_j I)q(T)\beta.
$$
The displayed equation shows that $(T-c_jI)\,q(T)\beta = 0$, so it remains to show that $\alpha = q(T)\beta$ is not in $W$; this will prove the lemma. Suppose, to the contrary, that $q(T)\beta$ is in $W$. The polynomial $q(x) - q(c_j)$ has $c_j$ as a root, so we can write
$$
q(x) - q(c_j) = (x - c_j)h(x)
$$
for some polynomial $h$. So,
$$
q(T)\beta - q(c_j)\beta = h(T)(T-c_jI)\beta = h(T)\tilde{\beta},
$$
where $\tilde{\beta} = (T-c_jI)\beta$. Since $W$ is $T$-invariant and $\tilde{\beta}$ is in $W$, $h(T)\tilde{\beta}$ is also in $W$. So, $q(c_j)\beta$ is in $W$, but this is a contradiction because $q(c_j) \neq 0$ and $\beta$ is not in $W$. Hence, $\alpha = q(T)\beta$ is not in $W$ and $(T-c_jI)\alpha = 0$. $$\tag*{$\blacksquare$}$$
Note: the proof of this lemma is essentially an unpacking of the proof of Theorem 6.
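To make the construction $\alpha = q(T)\beta$ concrete, here is a small hedged numerical example (my own addition; the matrix $T$, the subspace $W$ and the vector $\beta$ are illustrative choices). It verifies that $\beta$ satisfies the conclusion of Lemma A and that $\alpha = q(T)\beta$ is an eigenvector lying outside $W$.

```python
# Hedged illustration of Lemma B's construction alpha = q(T)beta
# (the matrix T, the subspace W and the vector beta are my own example choices).
import numpy as np

T = np.diag([1.0, 1.0, 2.0])              # minimal polynomial p = (x-1)(x-2)
W_basis = np.array([[1.0, 0.0, 0.0]]).T   # W = span{e1}, invariant under T

def in_subspace(v, basis, tol=1e-9):
    """v lies in the column span of `basis` iff appending v does not raise the rank."""
    return np.linalg.matrix_rank(np.hstack([basis, v.reshape(-1, 1)]), tol=tol) == \
           np.linalg.matrix_rank(basis, tol=tol)

beta = np.array([1.0, 0.0, 1.0])          # not in W, and (T - 2I)beta = -e1 lies in W
c_j = 2.0
I = np.eye(3)
print(in_subspace(beta, W_basis))                  # False: beta is outside W
print(in_subspace((T - c_j * I) @ beta, W_basis))  # True:  Lemma A's conclusion

alpha = (T - 1.0 * I) @ beta              # q(T)beta with q(x) = x - 1
print(alpha)                                       # [0. 0. 1.], i.e. e3
print(in_subspace(alpha, W_basis))                 # False: alpha is outside W
print(np.allclose((T - c_j * I) @ alpha, 0))       # True:  alpha is a characteristic vector
```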
We can now re-prove Theorem 6 using Lemma B:
(Another) Proof of Theorem 6: We have noted earlier that, if $T$ is diagonalizable, its minimal polynomial is a product of distinct linear factors (see the discussion on page 193 prior to Example 4).
To prove the converse, suppose that the minimal polynomial for $T$ is a product of distinct linear factors
$$
p = (x-c_1) \cdots (x-c_k).
$$
By repeated application of Lemma B, we shall arrive at an ordered basis $\mathscr{B} = \{ \alpha_1, \dots, \alpha_n \}$ in which the matrix representing $T$ is diagonal:
$$
[T]_{\mathscr{B}} =
\begin{bmatrix}
a_{11} & 0 & 0 & \cdots & 0 \\
0 & a_{22} & 0 & \cdots & 0 \\
0 & 0 & a_{33} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & a_{nn}
\end{bmatrix}.
$$
Now, this merely says that
$$
T\alpha_j = a_{jj} \alpha_j, \qquad 1 \leq j \leq n
$$
that is, $T\alpha_j$ is in the subspace spanned by $\alpha_j$. To find $\alpha_1,\dots,\alpha_n$, we start by applying Lemma B to the subspace $W = \{ 0 \}$ to obtain the vector $\alpha_1$. Then, apply Lemma B to $W_1$, the space spanned by $\alpha_1$, and we get $\alpha_2$. Next apply Lemma B to $W_2$, the space spanned by $\alpha_1$ and $\alpha_2$. Continue in that way. One point deserves comment. After $\alpha_1,\dots,\alpha_i$ have been found, it is the scaling-type relations $T\alpha_j = a_{jj} \alpha_j$ for $j = 1,\dots,i$ which ensure that the subspace spanned by $\alpha_1,\dots,\alpha_i$ is invariant under $T$. $$\tag*{$\blacksquare$}$$
Note: this proof is entirely analogous to the proof of Theorem 5 on page 203 that makes use of Lemma A.
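The proof above is constructive, and the following Python/NumPy sketch (my addition) mimics it numerically: starting from $W_0 = \{0\}$, it repeatedly adjoins an eigenvector that is not in the span of the vectors found so far. Scanning the eigenvectors returned by NumPy stands in for "apply Lemma B", and the example matrix is an arbitrary diagonalizable choice.

```python
# A hedged sketch of the iterative procedure in the second proof of Theorem 6:
# repeatedly enlarge the span with an eigenvector that is not yet inside it.
# (Scanning numpy's eigenvectors is my stand-in for "apply Lemma B".)
import numpy as np

def diagonalizing_basis(T, tol=1e-9):
    n = T.shape[0]
    _, eigvecs = np.linalg.eig(T)       # candidate eigenvectors of T
    basis = np.zeros((n, 0))            # W_0 = {0}
    for _ in range(n):
        for j in range(n):
            candidate = np.hstack([basis, eigvecs[:, [j]]])
            if np.linalg.matrix_rank(candidate, tol=tol) > basis.shape[1]:
                basis = candidate       # alpha_{i+1} found: not in span(alpha_1..alpha_i)
                break
    return basis                        # columns alpha_1, ..., alpha_n

T = np.array([[5.0, -2.0], [-2.0, 5.0]])   # a diagonalizable example of my choosing
B = diagonalizing_basis(T)
print(np.linalg.inv(B) @ T @ B)            # approximately diagonal
```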
Simultaneous Triangulation; Simultaneous Diagonalization
Now, to find sufficient conditions for a family of operators to be simultaneously triangulable we need to modify Lemma A slightly. This is the lemma before Theorem 7, which we state here as Lemma C:
Lemma C
Let $V$ be a finite-dimensional vector space over the field $F$. Let $\mathscr{F}$ be a commuting family of triangulable linear operators on $V$. Let $W$ be a proper subspace of $V$ which is invariant under $\mathscr{F}$. There exists a vector $\alpha$ in $V$ such that (a) $\alpha$ is not in $W$; (b) for each $T$ in $\mathscr{F}$, the vector $T\alpha$ is in the subspace spanned by $\alpha$ and $W$.
We want to adapt Lemma C to help us find necessary and sufficient conditions for a family of operators to be simultaneously diagonalizable. We use the statement of Lemma B to come up with the following statement for the modified lemma:
Lemma D
Let $V$ be a finite-dimensional vector space over the field $F$. Let $\mathscr{F}$ be a commuting family of diagonalizable linear operators on $V$. Let $W$ be a proper subspace of $V$ which is invariant under $\mathscr{F}$. There exists a vector $\alpha$ in $V$ such that (a) $\alpha$ is not in $W$; (b) for each $T$ in $\mathscr{F}$, the vector $T\alpha$ is in the subspace spanned by $\alpha$.
The proof of Lemma D is completely analogous to the proof of Lemma C. We just replace Lemma A with Lemma B, and use the modified condition (b) in place of the old condition. We give the detailed steps below.
Proof of Lemma D: It is no loss of generality to assume that $\mathscr{F}$ contains only a finite number of operators, because of the following observation. Let $\{ T_1,\dots,T_r \}$ be a maximal linearly independent subset of $\mathscr{F}$, i.e., a basis for the subspace spanned by $\mathscr{F}$. If $\alpha$ is a vector such that (b) holds for each $T_i$, then (b) will hold for every operator which is a linear combination of $T_1,\dots,T_r$.
Since $T_1$ is diagonalizable, its minimal polynomial is a product of distinct linear factors (Theorem 6), and $W$ is invariant under $T_1$; so, by Lemma B, we can find a vector $\beta_1$ (not in $W$) and a characteristic value $c_1$ of $T_1$ such that $(T_1 - c_1 I)\beta_1 = 0$. Let $V_1$ be the collection of all vectors $\beta$ in $V$ such that $(T_1 - c_1 I)\beta = 0$. Then $V_1$ is a subspace of $V$. Furthermore, $V_1$ is invariant under $\mathscr{F}$, for the following reason. If $T$ commutes with $T_1$ and $\beta$ is in $V_1$, then
$$
(T_1 - c_1 I)(T\beta) = T(T_1 - c_1 I)\beta = T(0) = 0.
$$
So, $T\beta$ is in $V_1$ for all $T$ in $\mathscr{F}$, i.e., $V_1$ is invariant under $\mathscr{F}$.
Now $W \cap V_1$ is a proper subspace of $V_1$, because $\beta_1$ lies in $V_1$ but not in $W$. Let $U_2$ be the linear operator on $V_1$ obtained by restricting $T_2$ to the subspace $V_1$. The minimal polynomial for $U_2$ divides the minimal polynomial for $T_2$, which is a product of distinct linear factors; hence the minimal polynomial for $U_2$ is also a product of distinct linear factors. Therefore, we may apply Lemma B to $U_2$ and the invariant subspace $W \cap V_1$. We obtain a vector $\beta_2$ in $V_1$ (not in $W \cap V_1$ and hence not in $W$) and a scalar $c_2$ such that $(T_2 - c_2 I)\beta_2 = 0$. Note that
- $\beta_2$ is not in $W$;
- $(T_1 - c_1 I)\beta_2 = 0$;
- $(T_2 - c_2 I)\beta_2 = 0$.
Let $V_2$ be the set of all vectors $\beta$ in $V_1$ such that $(T_2 - c_2 I)\beta = 0$. Then $V_2$ is a subspace of $V_1$ that is invariant under $\mathscr{F}$ (by the same commuting argument as before), and $W \cap V_2$ is a proper subspace of $V_2$ because $\beta_2$ lies in $V_2$ but not in $W$. Apply Lemma B to $U_3$, the restriction of $T_3$ to $V_2$, and the invariant subspace $W \cap V_2$. We obtain a vector $\beta_3$ in $V_2$ (not in $W \cap V_2$ and hence not in $W$) and a scalar $c_3$ such that $(T_3 - c_3 I)\beta_3 = 0$. Note that
- $\beta_3$ is not in $W$;
- $(T_1 - c_1 I)\beta_3 = 0$;
- $(T_2 - c_2 I)\beta_3 = 0$;
- $(T_3 - c_3 I)\beta_3 = 0$.
If we continue in this way, we shall reach a vector $\alpha = \beta_r$ (not in $W$) such that $(T_j - c_j I)\alpha = 0$, $j = 1,\dots,r$. By the observation at the beginning of the proof, $T\alpha$ then lies in the subspace spanned by $\alpha$ for every $T$ in $\mathscr{F}$. $$\tag*{$\blacksquare$}$$
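The nested construction in this proof can be carried out numerically. The following hedged Python/NumPy sketch is my own addition, taking $W = \{0\}$ and two arbitrarily chosen commuting symmetric matrices: it intersects an eigenspace of $T_1$ with an eigenspace of the restriction of $T_2$ to produce a common eigenvector $\alpha$.

```python
# A hedged sketch of the nested construction in the proof of Lemma D, with W = {0}:
# intersect an eigenspace of T1 with an eigenspace of T2's restriction to it.
# (The commuting matrices T1, T2 are my own example choices.)
import numpy as np

def null_space(A, tol=1e-9):
    """Orthonormal basis (columns) of the null space of A, via the SVD."""
    _, s, vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return vt[rank:].T

T1 = np.array([[2.0, 0.0, 0.0],
               [0.0, 2.0, 0.0],
               [0.0, 0.0, 5.0]])
T2 = np.array([[1.0, 1.0, 0.0],
               [1.0, 1.0, 0.0],
               [0.0, 0.0, 3.0]])
assert np.allclose(T1 @ T2, T2 @ T1)          # the family commutes

c1 = 2.0                                      # a characteristic value of T1
V1 = null_space(T1 - c1 * np.eye(3))          # eigenspace of T1; invariant under T2
U2 = V1.T @ T2 @ V1                           # T2 restricted to V1 (orthonormal columns; symmetric here)
c2, vecs = np.linalg.eigh(U2)                 # eigenvalues/vectors of the restriction
alpha = V1 @ vecs[:, 0]                       # a common eigenvector of T1 and T2
print(np.allclose(T1 @ alpha, c1 * alpha))    # True
print(np.allclose(T2 @ alpha, c2[0] * alpha)) # True
```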
We can now state and prove Theorem 8 using Lemma D:
Theorem 8
Let $V$ be a finite-dimensional vector space over the field $F$. Let $\mathscr{F}$ be a commuting family of diagonalizable linear operators on $V$. There exists an ordered basis for $V$ such that every operator in $\mathscr{F}$ is represented by a diagonal matrix in that basis.
Proof of Theorem 8: Given Lemma D, this theorem has the same proof as the second proof of Theorem 6: apply Lemma D repeatedly, starting with $W = \{0\}$, to build an ordered basis $\alpha_1,\dots,\alpha_n$ in which, at each step, $\alpha_{i+1}$ is not in the span of $\alpha_1,\dots,\alpha_i$ and $T\alpha_{i+1}$ lies in the span of $\alpha_{i+1}$ for every $T$ in $\mathscr{F}$. In that basis, every operator in $\mathscr{F}$ is represented by a diagonal matrix. $$\tag*{$\blacksquare$}$$
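As a numerical illustration of Theorem 8 (my addition, and not the book's construction), the following sketch simultaneously diagonalizes two commuting symmetric matrices by diagonalizing a generic linear combination $A + tB$: for a generic choice of $t$, its eigenvectors diagonalize both matrices. The matrices and the value of $t$ are arbitrary choices.

```python
# A hedged numerical illustration of Theorem 8 for two commuting symmetric matrices.
# Instead of the textbook construction, it uses the common trick of diagonalizing a
# generic linear combination A + t*B; the example matrices and t are my own choices.
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])
B = np.array([[0.0, 3.0], [3.0, 0.0]])
assert np.allclose(A @ B, B @ A)               # the family commutes

t = 0.37                                       # a "generic" coefficient
_, P = np.linalg.eigh(A + t * B)               # orthonormal eigenvectors of the combination
print(P.T @ A @ P)                             # approximately diagonal
print(P.T @ B @ P)                             # approximately diagonal
```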
Here are answers to your questions:
- Firstly, when you say scalars $a_1, a_2, \cdots, a_n$, they are real numbers and hence can also be $0$. Keeping this in mind, suppose there is a set $S = \left\lbrace v_1, v_2, \cdots, v_n \right\rbrace \subseteq V$. Then, quite obviously, the vector $\textbf{0} \in V$ can be written as
$$0 \cdot v_1 + 0 \cdot v_2 + \cdots + 0 \cdot v_n = \textbf{0}$$
This is what we call the "trivial linear combination".
In fact, the confusion you seem to have in mind is that when you say a vector is a linear combination of other vectors, there must be at least one vector and one scalar with which to construct that "linear combination".
- When you talk about a "set", elements cannot be repeated. So, there is no point in asking whether the elements of the set are distinct.
Lastly, I do not know what book you are following, but I feel that a better version of the definitions of linear dependence and independence is the following:
Linear Independence
A finite set $S = \left\lbrace v_1, v_2, \cdots, v_n \right\rbrace \subseteq V$ is said to be linearly independent iff
$$\alpha_1 \cdot v_1 + \alpha_2 \cdot v_2 + \cdots + \alpha_n \cdot v_n = \textbf{0}$$
implies that $\alpha_1 = \alpha_2 = \cdots = \alpha_n = 0$. This means that the only way to obtain the zero vector $\textbf{0}$ as a linear combination of vectors from a linearly "independent" set is by setting all the scalars (coefficients) to $0$, which we call the "trivial" combination.
In the case of an infinite set $S \subseteq V$, it is said to be linearly independent iff every finite subset of $S$ is linearly independent. Here we are re-using the definition of linear independence for finite sets.
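If you want to experiment, here is a small Python/NumPy check (my addition) of the finite-set definition above: the vectors are linearly independent exactly when the matrix having them as columns has full column rank, i.e., when the only solution of $\alpha_1 \cdot v_1 + \cdots + \alpha_n \cdot v_n = \textbf{0}$ is the trivial one. The example vectors are arbitrary.

```python
# Hedged numerical check of linear independence (example vectors are my own choices):
# the set {v1, ..., vn} is independent iff the matrix with the v_i as columns has rank n.
import numpy as np

def is_linearly_independent(vectors):
    M = np.column_stack(vectors)
    return np.linalg.matrix_rank(M) == M.shape[1]

print(is_linearly_independent([np.array([1.0, 0.0, 0.0]),
                               np.array([0.0, 1.0, 0.0])]))    # True
print(is_linearly_independent([np.array([1.0, 2.0, 3.0]),
                               np.array([2.0, 4.0, 6.0])]))    # False (v2 = 2*v1)
```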
Linear Dependence
A finite set $S = \left\lbrace v_1, v_2, \cdots, v_n \right\rbrace \subseteq V$ is said to be linearly "dependent" iff it is not linearly independent. Thus, we need to negate the statement for linear independence. The negation is the statement
"$\exists \alpha_1, \alpha_2, \cdots, \alpha_n \in \mathbb{R}$ and $i \in \left\lbrace 1, 2, \cdots, n \right\rbrace$ such that $\alpha_1 \cdot v_1 + \alpha_2 \cdot v_2 + \cdots + \alpha_n \cdot v_n = \textbf{0}$ and $\alpha_i \neq 0$."
This statement means that the vector $v_i \in S$ can actually be written as a linear combination of the other vectors. In particular,
$$v_i = \left( - \dfrac{\alpha_1}{\alpha_i} \right) \cdot v_1 + \left( - \dfrac{\alpha_2}{\alpha_i} \right) \cdot v_2 + \cdots + \left( - \dfrac{\alpha_{i - 1}}{\alpha_i} \right) \cdot v_{i - 1} + \left( - \dfrac{\alpha_{i + 1}}{\alpha_i} \right) \cdot v_{i + 1} + \cdots + \left( - \dfrac{\alpha_n}{\alpha_i} \right) \cdot v_n$$
and therefore the vector $v_i \in S$ is "dependent" on the other vectors.
In fact, a linear combination $\alpha_1 \cdot v_1 + \alpha_2 \cdot v_2 + \cdots + \alpha_n \cdot v_n = \textbf{0}$ in which at least one $\alpha_i \neq 0$ is called a "non-trivial" linear combination.
For an infinite set $S \subseteq V$, it is said to be linearly dependent iff it is not linearly independent. Again, we need to negate the statement for linear independence of an infinite set. The negation would be
"There exists a finite subset $A \subset S$ such that $A$ is not linearly independent." And we already have the definition of linear dependence (as opposed to linear independence) for finite sets, which can be used here.
I hope your confusion about distinct elements will be cleared up by this. If you are still confused, try forming linearly dependent and linearly independent sets in $\mathbb{R}^2$ and $\mathbb{R}^3$, which you can easily visualize. Also read some material on the span of a set and on how linear combinations and span connect with linear dependence and independence.
Best Answer
It's almost much ado about nothing. In linear algebra, when we don't need or care about the ability to denote the same vector twice, or to denote vectors in a particular order, it can be convenient to work with sets of vectors. Other times, when we do care about one or both of those things -- like when choosing basis vectors in a certain order for coordinate representations, as the authors do in the next section -- it's more convenient to work with lists (sequences, tuples) of vectors.
The issue the authors are trying to highlight is that we need to be careful about distinctness of vectors in relation to linear dependence when switching between sets and lists. Consider the definition of linear dependence for sets, paraphrased here (from p.40): a subset $S$ of $V$ is linearly dependent if there exist distinct vectors $\alpha_1,\ldots,\alpha_n$ in $S$ and scalars $c_1,\ldots,c_n$, not all $0$, such that $c_1\alpha_1+\cdots+c_n\alpha_n=0$.
Notice we require distinctness of $\alpha_1,\ldots,\alpha_n$ here. If we didn't, then any nonempty set $S$ of vectors would trivially be linearly dependent, because for $\alpha\in S$ we can always take $\alpha_1=\alpha_2=\alpha$ and write $1\cdot\alpha_1+(-1)\cdot\alpha_2=0$.
On the other hand, compare this with the corresponding definition for (finite) lists, in which the elements need not be distinct, paraphrased (from p.47): a list $(\alpha_1,\ldots,\alpha_n)$ of vectors in $V$ is linearly dependent if there exist scalars $c_1,\ldots,c_n$, not all $0$, such that $c_1\alpha_1+\cdots+c_n\alpha_n=0$.
At first glance it's tempting to think that a list $(\alpha_1,\ldots,\alpha_n)$ is linearly dependent if and only if its underlying set $\{\alpha_1,\ldots,\alpha_n\}$ is, but that's false because of the distinctness issue. More specifically, if the underlying set is linearly dependent, then the list is linearly dependent, but the converse is false.
The example they give is meant to illustrate this difference epistemically. In $\mathbb{R}^2$, if $\alpha_1=(e^{\pi/2},1)$ and $\alpha_2=(\sqrt[3]{110},1)$, then we know that the set $\{\alpha_1,\alpha_2\}$ is linearly independent (that is, not linearly dependent) even if we don't know whether $e^{\pi/2}=\sqrt[3]{110}$: if the two numbers are equal, the set is the singleton $\{\alpha_1\}$, which is linearly independent since $\alpha_1\ne 0$; if they are not equal, neither vector is a scalar multiple of the other (their second coordinates agree but their first coordinates differ), so the set of the two distinct vectors is again linearly independent.
On the other hand to know that the list $(\alpha_1,\alpha_2)$ is linearly independent, we need to know that $e^{\pi/2}\ne\sqrt[3]{110}$.
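For what it's worth, a quick numeric check (my addition) confirms that the two first coordinates really do differ, so in this particular example the list is linearly independent as well:

```python
# A quick numeric check (my addition) of the coordinates in the example above.
import numpy as np

a = np.exp(np.pi / 2)           # ~4.8105
b = 110 ** (1 / 3)              # ~4.7914
print(a, b, np.isclose(a, b))   # the two entries differ, so alpha1 != alpha2
```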
Is this a big deal? No. In fact some books which use both sets and lists don't even bother mentioning it, while other books stick primarily to lists to avoid the issue. But it can cause confusion and error for beginners when switching between sets and lists, which is probably why the authors wrote this. However, I suspect what they wrote serves to increase confusion more than decrease it.