(I hate to ruin the mood --- but the pessimist's view of this theorem would be as follows. The very fact that this theorem holds for all linear operators on finite dimensional complex vector spaces tells us that upper-triangularity is less of a "pretty neat fact" than we might have initially thought.)
Your observation is entirely correct.
The author probably isn't highlighting it because, although it is formally stronger than what you would expect from upper-triangularity alone, there isn't much use for the stronger conclusion. The problem is that the stronger conclusion occurs in the middle of an induction argument. To see what it says about arbitrary operators on finite-dimensional vector spaces, we have to consider what happens when we iterate it, with a view toward a stronger statement about what the matrix of an arbitrary linear operator can be made to look like. And if you think about this, the first thing you notice is that it's quite possible that, for a given $T$, the range of $T - \lambda I$ has dimension exactly one less than that of the space $T$ acts on. In this case, the "$n$" in the above argument is $1$, and the "stronger" conclusion coincides with what you get from "upper triangular"...
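As a concrete (if tangential) aside: numerically, the statement "every operator on a finite-dimensional complex vector space is upper triangular with respect to some basis" is realized by the Schur decomposition, which even produces an orthonormal basis. A minimal sketch using SciPy (the random matrix here is just an illustrative choice):

```python
import numpy as np
from scipy.linalg import schur

# Schur decomposition: any complex square matrix A factors as A = Z T Z*,
# with Z unitary and T upper triangular -- a numerical counterpart of the
# triangularization theorem discussed above.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
T, Z = schur(A, output='complex')
# The columns of Z are an orthonormal basis with respect to which the
# operator A has the upper-triangular matrix T.
```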
To vaguely answer the question that you removed in this version of the question (asking whether there might be uses for putting more zeros in a matrix than "upper triangular" would assure you are there):
For theoretical purposes, generally it is not that useful to be able to have more zeros in a matrix, unless you have a lot more control over where those zeros are than this argument provides. (In my experience, anything short of something that would give you a block decomposition, or something very close to it, is not going to be of much theoretical use.) And there's a result that's far better than "upper triangular" in the complex case. Later in Axler, he proves that if $T$ is an operator on a finite-dimensional complex vector space $V$, there is some basis of $V$ for which $T$ decomposes as a direct sum of Jordan blocks. These matrices have tons more zeros in them than arbitrary upper triangular matrices do. But for a lot of theoretical questions, the Jordan decomposition simply isn't that useful. Again, the pessimist's view: if something is true of all linear operators on finite-dimensional complex vector spaces, it can't be that great, or all of linear algebra would be easier than we know it is.
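To make the "tons more zeros" point concrete, here is a minimal sketch using SymPy's `jordan_form`; the particular matrix is just an illustrative choice, picked so that it is upper triangular but its Jordan form has strictly more zero entries:

```python
from sympy import Matrix

# An upper-triangular matrix with eigenvalues 2, 2, 3. The eigenvalue 2 has
# geometric multiplicity 1, so the Jordan form has a single 2x2 block for it.
A = Matrix([[2, 1, 4],
            [0, 2, 5],
            [0, 0, 3]])
P, J = A.jordan_form()   # A = P * J * P**-1

def zeros_in(M):
    """Count the zero entries of a SymPy matrix."""
    return sum(1 for e in M if e == 0)
# J keeps only the diagonal and the 1s on the superdiagonal of each block,
# so it has more zeros than the merely upper-triangular A.
```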
For the purpose of numerical computations (on computers) with very large matrices, I think there is some advantage in having as many zeros in a matrix as possible. But a proper evaluation of the numerical usefulness of the algorithm implicit in Axler's proof would require an analysis of the numerical cost of having to compute a basis for the range of an operator (known only numerically) at each step. Once you take that into account, I don't know that this algorithm would have anything to recommend it over other algorithms (e.g. for approximating the Jordan form of a matrix given only numerically).
There's no problematic difference. A list, in terminology you're more likely to encounter in the future, is simply an indexed set, which in general is just a set $S$ endowed with a function $i:I\to S$, but which is usually indexed by $\{1,...,n\}$ or by $\mathbb{N}$ in the familiar way. So the difference is that a list comes with a specified order, while a set does not. Having this order built in is important for the theory of orientations and determinants, whereas there is essentially never a situation in which you must avoid using a list. Indeed, every set is canonically indexed by itself (via the identity function), so the study of lists is in this sense strictly more general than the study of sets.
For your example, you would probably just prove the proposition in the case where $x$ is adjoined to the end of the list. There's an easy theorem, which Axler may or may not point out explicitly, that the span of a list doesn't depend on the ordering, i.e. it's just the span of the underlying set, so this covers any concern about adjoining $x$ at a different point. (The theorem is proven via the commutativity of vector addition.)
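The order-independence of span can also be checked mechanically. A small sketch with SymPy (the specific vectors are arbitrary): two matrices whose columns are the same vectors in different orders have the same column space, which we verify by checking that stacking them together does not increase the rank.

```python
from sympy import Matrix

v1, v2, v3 = Matrix([1, 0, 1]), Matrix([0, 1, 1]), Matrix([1, 1, 0])
A = Matrix.hstack(v1, v2, v3)   # span of the list (v1, v2, v3)
B = Matrix.hstack(v3, v1, v2)   # same underlying set, reordered

# Two spans are equal iff concatenating the spanning vectors does not
# increase the rank: rank(A|B) = rank(A) means every column of B already
# lies in the column space of A, and symmetrically.
same_span = A.rank() == B.rank() == Matrix.hstack(A, B).rank()
```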
In finite-dimensional spaces, the main theorem is the one that leads to the definition of dimension itself: that any two bases have the same number of vectors. All the others (e.g., reducing a quadratic form to a sum of squares) rest on this one.
In infinite-dimensional spaces, (1) the linearity of an operator generally does not imply continuity (boundedness); for normed spaces, (2) "closed and bounded" does not imply "compact"; and (3) the canonical map from a vector space to its double dual need not be an isomorphism.
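Point (1) can be seen numerically: differentiation is linear on the polynomials sitting inside $C[0,1]$ with the sup norm, yet no bound $\|p'\| \le C\|p\|$ can hold, since $\|x^n\| = 1$ while $\|(x^n)'\| = n$. A quick sketch (the grid size and the exponents sampled are arbitrary choices):

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 10001)   # grid on [0, 1], including the endpoint

def sup_norm(p):
    """Approximate sup norm on [0, 1] of a numpy Polynomial."""
    return np.max(np.abs(p(xs)))

ratios = []
for n in [1, 5, 25, 125]:
    p = np.polynomial.Polynomial([0.0] * n + [1.0])   # p(x) = x**n
    ratios.append(sup_norm(p.deriv()) / sup_norm(p))  # equals n exactly
# ratios grows without bound, so differentiation admits no constant C with
# ||p'|| <= C ||p||: it is linear but not bounded (hence not continuous).
```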
Furthermore, in infinite-dimensional vector spaces there is no natural definition of a volume form.
That's why Halmos's Finite-Dimensional Vector Spaces is probably the best book on the subject: he was a functional analyst and taught finite-dimensional while thinking infinite-dimensional.