I am also new to the subject of lattices, so take my answer with a grain of salt. I have mainly read about the case of the Euclidean norm, so in what follows the norm is always the Euclidean norm. The algorithm is also called the Lagrange or Lagrange–Gauss algorithm, see here. There are certainly more elegant proofs.
The algorithm can be viewed as an approximation of the Gram–Schmidt orthogonalization process. Denote
$$
\mu = \frac{<v_1,v_2>}{<v_1,v_1>} \ ,
$$
then $\mu v_1$ is the orthogonal projection of $v_2$ onto $v_1$. The vector $v_2-\mu v_1$ is thus orthogonal to $v_1$ and, when $\mu \neq 0$ (that is, when $v_1$ and $v_2$ are not already orthogonal), $|| v_2 - \mu v_1 || < || v_2 ||$ by Pythagoras' theorem, since $||v_2||^2 = ||v_2-\mu v_1||^2 + \mu^2||v_1||^2$ and $v_1$ is not the zero vector. Hence $(v_1, v_2-\mu v_1)$ is an orthogonal (but not necessarily orthonormal) basis of $\mathbb{R}^2$. For lattices, an orthogonal basis may not exist. The aim is then to find a basis which is "as orthogonal as possible". An approximation is to round $\mu$ to the nearest integer $m$, take $(v_1, v_2-mv_1)$ as a new basis of the lattice, and repeat the process until no improvement is found. There is a measure of orthogonality of a lattice basis, called the orthogonality defect, defined as
$$
\delta(v_1, v_2) = \frac{|| v_1 || \times || v_2 ||}{|\det(v_1,v_2)|}
$$
which links the "orthogonality" of a basis to the length of its vectors: $(v_1,v_2)$ is an orthogonal basis if and only if $\delta(v_1,v_2)=1$. Intuitively, the "more" orthogonal the basis is, the shorter its vectors are.
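For concreteness, here is a minimal sketch of the rounding-and-swapping loop described above, together with the orthogonality defect. The function names are mine, and note that Python's built-in `round` breaks ties to even rather than half-down, which only matters on exact ties $\mu = \pm 1/2$:

```python
def lagrange_gauss(v1, v2):
    """Lagrange-Gauss reduction of a 2D lattice basis (Euclidean norm).

    Vectors are tuples of numbers; returns a reduced basis whose first
    vector is a shortest nonzero vector of the lattice."""
    dot = lambda a, b: a[0] * b[0] + a[1] * b[1]
    while True:
        # Swap so that v1 is the shorter of the two vectors.
        if dot(v1, v1) > dot(v2, v2):
            v1, v2 = v2, v1
        mu = dot(v1, v2) / dot(v1, v1)   # Gram-Schmidt coefficient
        m = round(mu)                    # rounded projection coefficient
        if m == 0:                       # no improvement possible: reduced
            return v1, v2
        # Subtract the rounded projection of v2 onto v1.
        v2 = (v2[0] - m * v1[0], v2[1] - m * v1[1])

def orthogonality_defect(v1, v2):
    """delta(v1, v2) = ||v1|| ||v2|| / |det(v1, v2)|; equals 1 iff orthogonal."""
    dot = lambda a, b: a[0] * b[0] + a[1] * b[1]
    det = v1[0] * v2[1] - v1[1] * v2[0]
    return (dot(v1, v1) * dot(v2, v2)) ** 0.5 / abs(det)
```

For instance, `lagrange_gauss((1, 1), (3, 2))` returns the standard basis `((1, 0), (0, 1))` (the input basis has determinant $-1$, so the lattice is $\mathbb{Z}^2$), and the orthogonality defect drops from $\sqrt{26}$ to $1$.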
That being said, I am not totally clear on the link between the orthogonality defect and the shortest vector. It seems to me that there is a well-known (and thus never spelled out!) result of the form "for a given lattice, if a basis has minimal orthogonality defect, then it contains a shortest vector". I am even less clear on whether the orthogonality defect can actually be used to rank two bases with respect to the shortest vector problem, or any other problem for that matter. Without a result of the form "if the orthogonality defect of a first basis is smaller than that of a second basis, then the shortest vector of the first basis is shorter than the shortest vector of the second basis" or something similar, the orthogonality defect would be a measure in name only, at least for me.
In any case.
Sufficiency of ($\star$)
First note that the algorithm never produces collinear $v_1(k)$ and $v_2(k)$, where $k$ is the iteration. Indeed, you start with $v_1(0)$ and $v_2(0)$ non collinear. The algorithm then produces $v_2(k+1)=v_2(k)-m(k)v_1(k)$. Assuming that $v_2(k+1)$ is collinear to $v_1(k)$ means $v_2(k+1) = \alpha(k) v_1(k)$ for some $\alpha(k)$. This implies $(\alpha(k)+m(k)) v_1(k) = v_2(k)$, which is not possible since $v_1(k)$ and $v_2(k)$ are non collinear. The proof that $v_2(k+1)$ is non collinear to $v_2(k)$ is similar. In particular, this means that $v_1$ and $v_2$ are never the zero vector. One can also use unimodular matrices to prove the non collinearity. Actually, $(v_1(k),v_2(k))$ is always a basis of the lattice. It is easy to see that, for a lattice vector $v = c_1v_1(k)+c_2v_2(k)$, it is always possible to find integers $d_1$ and $d_2$ such that $v = d_1v_1(k)+d_2(v_2(k)-m(k)v_1(k))$: take $d_1 = c_1+c_2m(k)$ and $d_2 = c_2$, so that $c_1 = d_1-d_2m(k)$ and $c_2=d_2$. You may also see that $(v_1,v_2-mv_1)$ with $m$ an integer is a unimodular transformation of $(v_1,v_2)$.
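The coefficient change can be checked numerically; the basis, the multiplier $m$ and the coefficients below are arbitrary illustrative values:

```python
# Any lattice vector c1*v1 + c2*v2 is also an integer combination of the
# new basis (v1, v2 - m*v1): take d1 = c1 + c2*m and d2 = c2, so that
# c1 = d1 - d2*m and c2 = d2 as in the text.
v1, v2 = (2, 1), (1, 4)          # arbitrary basis vectors
m = 3                            # arbitrary integer multiplier
c1, c2 = -2, 5                   # arbitrary integer coefficients
d1, d2 = c1 + c2 * m, c2

w2 = (v2[0] - m * v1[0], v2[1] - m * v1[1])            # new second vector
old = (c1 * v1[0] + c2 * v2[0], c1 * v1[1] + c2 * v2[1])
new = (d1 * v1[0] + d2 * w2[0], d1 * v1[1] + d2 * w2[1])
assert old == new
# The change-of-basis matrix [[1, 0], [-m, 1]] has determinant 1: unimodular.
```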
It was shown that for any lattice vector $v=c_1v_1+c_2v_2$, for some $(c_1,c_2)$, one has
$$||v||^2≥\underbrace{(c_1^2−|c_1c_2|+c_2^2)}_{s}||v_1||^2 \ .$$
Since it is true for any lattice vector, it is true for a shortest (non zero) vector of the lattice, which I denote $v^*$. Thus $||v^*||^2 \geq s||v_1||^2$. Since $v^*$ is a non zero vector, $(c_1,c_2)\neq (0,0)$. Now $s = c_1^2-|c_1c_2|+c_2^2 = (|c_1|-|c_2|)^2+|c_1||c_2| > 0$ whenever $(c_1,c_2)\neq (0,0)$, and since $s$ is an integer, $s \geq 1$. Thus $||v^*||^2 \geq ||v_1||^2$. But $v^*$ is a shortest vector and $v_1$ is a non zero lattice vector, so $||v^*|| = ||v_1||$: $v_1$ is also a shortest vector.
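The step $(c_1,c_2)\neq(0,0) \Rightarrow s \geq 1$ can also be sanity-checked by brute force over a small range of integers:

```python
# s = c1^2 - |c1*c2| + c2^2 = (|c1| - |c2|)^2 + |c1|*|c2| is a positive
# integer for every nonzero integer pair, hence s >= 1.
s = lambda c1, c2: c1 * c1 - abs(c1 * c2) + c2 * c2
assert all(s(c1, c2) >= 1
           for c1 in range(-20, 21) for c2 in range(-20, 21)
           if (c1, c2) != (0, 0))
assert s(1, 1) == 1 and s(1, -1) == 1   # the bound s = 1 is attained
```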
It is shown here that the algorithm also produces a second shortest vector.
Termination
It can be shown that the sequences $||v_1(k)||$ and $||v_2(k)||$ are non-increasing, where $k$ is the iteration. First assume that $||v_1(k)|| < ||v_2(k)||$, so that no swap occurs. Then,
\begin{align}
|| v_2(k+1) ||^2 & = || v_2(k)-m(k)v_1(k)||^2 \\
& = ||v_2(k)-\mu(k)v_1(k)+(\mu(k)-m(k))v_1(k)||^2 \\
& = ||v_2(k)-\mu(k)v_1(k)||^2+||(\mu(k)-m(k))v_1(k)||^2 & \text{Pythagoras, since } v_2(k)-\mu(k)v_1(k) \perp v_1(k) \\
& = ||v_2(k)||^2-||\mu(k)v_1(k)||^2+||(\mu(k)-m(k))v_1(k)||^2 & \text{Pythagoras again} \\
& = ||v_2(k)||^2-\underbrace{\left[ \mu^2(k) - (\mu(k)-m(k))^2 \right]}_{f(k)}||v_1(k)||^2 \\
\end{align}
Note that $|\mu(k)-m(k)|\leq 1/2$ since $m(k)$ is the rounding of $\mu(k)$. If $|\mu(k)| \geq 1/2$, clearly $f(k) \geq 0$, so that $||v_2(k+1)||^2 \leq ||v_2(k)||^2$. If $|\mu(k)| < 1/2$, then $m(k) = 0$ and we still have $||v_2(k+1)||^2 \leq ||v_2(k)||^2$ (and the algorithm terminates). The sequences $||v_1(k)||$ and $||v_2(k)||$ are thus non-increasing sequences bounded below by 0: they converge. Taking into account the swapping of $v_1$ and $v_2$ is essentially the same.
Here the convergence is in norm. By itself, it does not tell much about the convergence of the vectors. For instance, the vectors could bounce back and forth between two vectors of the same norm. To prove the convergence of the vectors themselves, we need something a little bit stronger. There are actually 3 cases to consider:
- $|\mu(k)| > 1/2$, then clearly $f(k) > 0$ and $||v_2(k+1)||^2 < ||v_2(k)||^2$;
- $-1/2 < \mu(k) \leq 1/2$, then $m(k) = 0$ due to the tie breaking rule used for the rounding. The algorithm then stops;
- $\mu(k) = -1/2$, then $m(k) = -1$, $v_2(k+1) = v_2(k)+v_1(k)$ and either $||v_2(k+1)|| < ||v_2(k)||$ or $||v_2(k+1)|| = ||v_2(k)||$. If $||v_2(k+1)|| = ||v_2(k)||$, the algorithm stops at the next iteration. Indeed, since $||v_2(k+1)|| = ||v_2(k)||$, there is no swap and $v_1(k+1)=v_1(k)$. Then
$$
\mu(k+1)= \frac{<v_1(k+1),v_2(k+1)>}{<v_1(k+1),v_1(k+1)>} = \frac{<v_1(k),v_2(k)+v_1(k)>}{<v_1(k),v_1(k)>} = -1/2 + 1 = 1/2
$$
which makes the algorithm stop since $m(k+1) = 0$.
In summary, at iteration $k$, the algorithm either decreases the norm of at least one of the two vectors (if there is a swap, $v_1$ is decreased; if not, $v_2$ is decreased), or stops at the current iteration or at the next one. It remains to show that the norm of the vectors cannot be decreased indefinitely.
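The monotonicity argument can be observed by instrumenting the reduction loop to record the sorted squared norms at each iteration; this is a sketch with an arbitrary starting basis:

```python
def reduction_trace(v1, v2):
    """Run the Lagrange-Gauss loop, recording (min, max) squared norms."""
    dot = lambda a, b: a[0] * b[0] + a[1] * b[1]
    trace = []
    while True:
        if dot(v1, v1) > dot(v2, v2):    # swap step
            v1, v2 = v2, v1
        trace.append((dot(v1, v1), dot(v2, v2)))
        m = round(dot(v1, v2) / dot(v1, v1))
        if m == 0:                       # termination: m(k) = 0
            return trace
        v2 = (v2[0] - m * v1[0], v2[1] - m * v1[1])

ns = reduction_trace((12, 2), (13, 4))
# Both the shorter and the longer squared norm never increase.
assert all(b[0] <= a[0] and b[1] <= a[1] for a, b in zip(ns, ns[1:]))
```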
The length of a shortest vector is also called the first Minkowski successive minimum, denoted $\lambda_1(L)$. For any basis $(v_1,v_2)$ of the lattice $L$, this length can be bounded below by
$$
\lambda_1(L) \geq \min \ \{ ||\tilde{v}_1||, ||\tilde{v}_2|| \}
$$
where $\tilde{v}_1$ and $\tilde{v}_2$ are the vectors produced by the Gram-Schmidt orthogonalization process. That is,
$$
\tilde{v}_1 = v_1 \ \text{ and } \ \tilde{v}_2 = v_2 - \frac{<v_1,v_2>}{<v_1,v_1>}v_1 \ .
$$
The proof is actually quite easy and can be found here. From this, it is possible to show that there are only a finite number of vectors to consider in the algorithm. Since the algorithm decreases the length of the vectors, only the vectors inside $\mathcal{B}(0,r)$, the ball centered at 0 of radius $r = \max \ \{ ||v_1(0)||, ||v_2(0)|| \}$, need to be considered, and those vectors are finite in number. The idea is to cover $\mathcal{B}(0,r)$ with squares small enough that each square contains at most one lattice point. Denote $l = \min \ \{ ||\tilde{v}_1||, ||\tilde{v}_2|| \}$; since the difference of two distinct lattice points is a non zero lattice vector, any two distinct lattice points are at distance at least $\lambda_1(L) \geq l$ from each other. Take squares of side length $\epsilon = l/4$: the maximum distance between two points inside such a square is the diagonal $\sqrt{2}\epsilon = (\sqrt{2}/4) l < l$, so each square contains at most one lattice point. To cover $\mathcal{B}(0,r)$, one needs at most $2\lceil r/\epsilon \rceil$ squares along the $x$-axis and the same number along the $y$-axis. The number of lattice points inside $\mathcal{B}(0,r)$ is thus bounded above by $(2\lceil r/\epsilon \rceil)^2$. The algorithm therefore stops in a finite number of iterations.
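The counting argument can be checked numerically on an example basis; the search range for the integer coefficients below is a heuristic chosen large enough for this particular basis:

```python
from math import ceil, sqrt

dot = lambda a, b: a[0] * b[0] + a[1] * b[1]
v1, v2 = (12.0, 2.0), (13.0, 4.0)                # arbitrary example basis

r = sqrt(max(dot(v1, v1), dot(v2, v2)))          # radius of the ball B(0, r)
mu = dot(v1, v2) / dot(v1, v1)
gs2 = (v2[0] - mu * v1[0], v2[1] - mu * v1[1])   # Gram-Schmidt tilde v2
l = sqrt(min(dot(v1, v1), dot(gs2, gs2)))        # lower bound on lambda_1(L)
eps = l / 4                                      # side length of the squares

# Brute-force enumeration of lattice points inside B(0, r).
points = {(c1 * v1[0] + c2 * v2[0], c1 * v1[1] + c2 * v2[1])
          for c1 in range(-20, 21) for c2 in range(-20, 21)}
inside = [p for p in points if dot(p, p) <= r * r]

bound = (2 * ceil(r / eps)) ** 2                 # squares covering B(0, r)
assert 0 < len(inside) <= bound
```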
Best Answer
Yes. Writing $v_1=(x_1,y_1)$ and $v_2=(x_2,y_2)$, stack the vectors in a matrix $M$ and notice that $\det(M) = x_1y_2 - x_2y_1$. So, if $x_1y_2 = x_2y_1$, we have $\det(M) = 0$ and the vectors are linearly dependent.
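As a quick sketch of that check (the helper name is mine):

```python
def collinear(v1, v2):
    """2D vectors are linearly dependent iff det = x1*y2 - x2*y1 = 0."""
    return v1[0] * v2[1] - v1[1] * v2[0] == 0

assert collinear((2, 4), (3, 6))         # (3, 6) = 1.5 * (2, 4)
assert not collinear((2, 4), (3, 5))
assert collinear((0, 0), (3, 5))         # the zero vector is dependent on anything
```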