$v_j$ is the $j$th element of the original linearly independent set $\{v_1,\dots,v_m\}$. Also, regarding your comment: you don't know a priori that $v_j$ is orthogonal to each of $e_1,\dots,e_{j-1}$.
He is working inductively. He has assumed that for $j-1$ we can find an orthonormal set $\{e_1,\dots,e_{j-1}\}$ such that span$\{e_1,\dots,e_{j-1}\}=$span$\{v_1,\dots,v_{j-1}\}$. Then he considers the set $\{v_1,\dots,v_j\}$. By the inductive hypothesis we can find an orthonormal set $\{e_1,\dots,e_{j-1}\}$ such that, as above, span$\{e_1,\dots,e_{j-1}\}=$span$\{v_1,\dots,v_{j-1}\}$. To complete the induction he adjoins one more orthonormal vector to $\{e_1,\dots,e_{j-1}\}$ by taking the $j$th element of $\{v_1,\dots,v_m\}$ and forming $e_j$ as described in your boxed 6.23.
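In case it helps to see the inductive step concretely, here is a minimal numerical sketch of the 6.23 construction in $\mathbb{R}^3$ (the sample vectors and helper names are my own choices for illustration, not Axler's):

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def norm(u):
    return dot(u, u) ** 0.5

def gram_schmidt_step(e_prev, v):
    """The 6.23 step: subtract from v its projections onto the
    orthonormal vectors e_1, ..., e_{j-1}, then normalize to get e_j."""
    u = list(v)
    for e in e_prev:
        c = dot(v, e)
        u = [ui - c * ei for ui, ei in zip(u, e)]
    n = norm(u)
    return [ui / n for ui in u]

# A linearly independent list v_1, v_2, v_3 (arbitrary example).
vs = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]

# Build the orthonormal list inductively, one vector at a time.
es = []
for v in vs:
    es.append(gram_schmidt_step(es, v))

# Each e_j has norm 1 and is orthogonal to the earlier e_i.
print(all(abs(dot(es[i], es[j])) < 1e-10 for i in range(3) for j in range(i)))
print(all(abs(norm(e) - 1.0) < 1e-10 for e in es))
```

The loop mirrors the induction: at step $j$ the list `es` plays the role of $\{e_1,\dots,e_{j-1}\}$, and the step function produces the new $e_j$.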
The rest of the proof shows why this new vector is orthogonal to each previous $e_i$ and that the resulting set is linearly independent.
If you have an orthonormal (hence linearly independent) set $\{e_1,\dots,e_j\}$ with $j<\dim V$, then you can always throw in one more orthonormal vector. To see this, just extend $\{e_1,\dots,e_j\}$ to a basis for $V$ and then perform Gram–Schmidt on that basis. Note that if $j=\dim V$, then you cannot throw in another orthonormal vector, because the new set would be linearly independent with size greater than $\dim V$.
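As a quick sanity check of that extension trick, here is a sketch in $\mathbb{R}^3$ with $j=2$ (the concrete vectors are assumptions of mine): extend the orthonormal pair by any vector outside its span, and a single Gram–Schmidt step yields the extra orthonormal vector.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# An orthonormal pair in R^3, so j = 2 < dim V = 3.
e1 = [1.0, 0.0, 0.0]
e2 = [0.0, 1.0, 0.0]

# Extend to a basis by appending any vector outside span{e1, e2} ...
v = [1.0, 2.0, 3.0]

# ... then one Gram-Schmidt step on v gives the extra orthonormal vector:
# subtract the projections onto e1 and e2, then normalize.
u = [vi - dot(v, e1) * a - dot(v, e2) * b for vi, a, b in zip(v, e1, e2)]
n = dot(u, u) ** 0.5
e3 = [ui / n for ui in u]

print(e3)  # [0.0, 0.0, 1.0] -- orthogonal to e1 and e2, norm 1
```

If instead $j=\dim V$ you would have nothing left to append: the pair is already a basis, and any further vector is in its span.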
Ok...so I actually answered my own question with the help of Sheldon Axler (who commented). We know $B_1=\{u_1,u_2,w_1,\dots,w_n\}$ is linearly dependent because $u_2$ can be written as a linear combination of the vectors in $B$. This would look like so: $$u_2=c_0u_1+c_1w_1+\dots+c_nw_n$$ It follows that there are vectors $w_i\in B$ with $c_i\neq 0$, and at least one vector with a nonzero coefficient is not $u_1$, because otherwise $u_2$ would be a multiple of $u_1$, which Sheldon Axler pointed out is not possible. So if we fix a $w_j$ with $c_j\neq 0$ we can write: $$w_j=\frac{1}{c_j}\Bigl(u_2-c_0u_1-\sum_{i\neq j}c_iw_i\Bigr)$$ And now we can remove $w_j$ from $B_1$ and the resulting set will still span $V$.
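To double-check that solving-for-$w_j$ arithmetic, here is a tiny sketch in $\mathbb{R}^3$ (the concrete vectors and coefficients are made up for illustration): build $u_2$ as a combination of the others, then recover $w_j$ with the displayed formula.

```python
def lincomb(coeffs, vecs):
    """Return the linear combination sum of c * v over paired lists."""
    out = [0.0] * len(vecs[0])
    for c, v in zip(coeffs, vecs):
        out = [o + c * vi for o, vi in zip(out, v)]
    return out

# Assumed concrete data: u1 and two w's in R^3.
u1 = [1.0, 0.0, 0.0]
w1 = [0.0, 1.0, 0.0]
w2 = [0.0, 0.0, 1.0]
c0, c1, c2 = 2.0, 3.0, 5.0            # c2 != 0, so w2 plays the role of w_j
u2 = lincomb([c0, c1, c2], [u1, w1, w2])

# The formula: w_j = (1/c_j) * (u2 - c0*u1 - sum over i != j of c_i*w_i)
w2_recovered = lincomb([1.0 / c2, -c0 / c2, -c1 / c2], [u2, u1, w1])
print(w2_recovered)  # agrees with w2 up to rounding
```

The key point the formula encodes is that the $c_jw_j$ term itself must be excluded from the sum being subtracted, which is what makes the recovered vector well-defined.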
Best Answer
I was reading your proof and I think it is the same idea. First of all, he is speaking about a finite list, and hence a finite-dimensional vector space, so there is no problem with the finiteness of the process. The "differences" are two. One: Axler doesn't worry about removing $0$ from the list, only about removing $v_1$ if it is zero, because during the rest of the process any zero vector will be removed automatically. The second difference is the order of removal. Your proof is less formal. In other words, your proof is not an algorithm: you say "what to do" but not "how to do it." I don't know if that is clear. To sum up, Axler's process must end in a finite number of steps, and it makes your idea concrete.