Proof of correctness of SVP algorithm for dimension 2

algorithms, integer-lattices, linear-algebra

The shortest vector problem ($\mathrm{SVP}_p$) consists of finding a shortest nonzero vector in a lattice, using the norm $|\!|\cdot|\!|_p$.

For lattices of dimension 2 and using the Euclidean norm, our professor described an algorithm that he called "Gauss's algorithm", of which I could find no mention by googling. The algorithm follows:

  • Given a basis $B = \{v_1, v_2\}$, swap the vectors in the basis so that $|\!|v_1|\!| \leq |\!|v_2|\!|$

  • Let $m = \bigg\lfloor \dfrac{\langle v_1, v_2 \rangle}{\langle v_1, v_1 \rangle}\bigg\rceil$, where $\lfloor \cdot \rceil$ denotes rounding to the nearest integer with ties broken downwards: $\lfloor n + 1/k \rceil = \begin{cases} n&\text{if } k\geq 2\\n+1 & \text{otherwise} \end{cases}$

  • If $m = 0$, the vector $v_1$ is returned, and it is a shortest vector in the lattice.

  • Otherwise, an approximate step of the Gram-Schmidt process is performed, using $mv_1$ as the approximate projection of $v_2$ on $v_1$: $ v^\star_2 = v_2 - mv_1$

  • The algorithm is called recursively on the new basis $\{v_1, v^\star_2\}$
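
For concreteness, here is a minimal Python sketch of the algorithm as I understand it from the notes (the function name and the exact tie-breaking implementation are my own; I use exact rationals so that ties behave as in the definition above):

```python
from fractions import Fraction
from math import ceil

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def lagrange_gauss(v1, v2):
    """Reduce a basis of a 2D integer lattice; the first vector of
    the returned basis should be a shortest nonzero lattice vector."""
    while True:
        # Order the basis so that ||v1|| <= ||v2||.
        if dot(v1, v1) > dot(v2, v2):
            v1, v2 = v2, v1
        # m = mu rounded to the nearest integer, ties broken downwards.
        mu = Fraction(dot(v1, v2), dot(v1, v1))
        m = ceil(mu - Fraction(1, 2))
        if m == 0:          # base case: the basis is reduced
            return v1, v2
        # Approximate Gram-Schmidt step: v2 <- v2 - m*v1.
        v2 = (v2[0] - m * v1[0], v2[1] - m * v1[1])

print(lagrange_gauss((5, 8), (8, 13)))  # this basis generates Z^2,
                                        # so a vector of norm 1 comes out
```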


The intuition behind this should be that at each step the lengths of the basis vectors are reduced. The notes then go on to state this lemma of correctness:

Lemma. Let $\{ v_1, v_2 \}$ be the basis returned by the algorithm. Then $v_1$ is a shortest vector in the lattice.

I shall summarize the proof given in the notes:

  • Since the algorithm has stopped, we have reached a base case, hence:

$$ \bigg\lfloor \dfrac{\langle v_1, v_2 \rangle}{\langle v_1, v_1 \rangle}\bigg\rceil = 0 \implies \bigg| \dfrac{\langle v_1, v_2 \rangle}{\langle v_1, v_1 \rangle} \bigg| \leq \dfrac{1}{2} $$

  • Additionally, we know that $|\!|v_1|\!| \leq |\!|v_2|\!|$
  • Let $v = c_1v_1 + c_2 v_2$ be an arbitrary lattice vector, with $c_1, c_2 \in \mathbb{Z}$. Using the properties above and the triangle inequality, it is simple to show (and is shown in detail in the notes, but these details are not relevant) that:
    $$ |\!| v |\!|^2 \geq (c_1^2 - |c_1c_2| + c_2^2) |\!| v_1 |\!|^2 $$
  • The quantity $c_1^2 - |c_1c_2| + c_2^2$ is an integer (as the coefficients $c_1, c_2$ are, by definition of a lattice)
  • The notes then state that: ($\star$)

It is therefore sufficient to prove that
$$(c_1, c_2) \neq (0,0) \implies c_1^2 - |c_1c_2| + c_2^2 > 0$$

The notes then leave the proof of this implication to the reader as an exercise.


Things I did not understand

  • Shouldn't there also be a proof of termination?
  • Why is it sufficient to show $(\star)$?
  • Is the proof making any assumptions, e.g. are we assuming that the vector length does reduce at each iteration? If so, what is a proof of such a statement, and how does it relate to the proof from the notes?

Best Answer

I am also new to the subject of lattices, so you should take my answer with a grain of salt. Also, I have mainly studied the case of the Euclidean norm, so in what follows the norm will always be the Euclidean norm. The algorithm is also called the Lagrange or Lagrange-Gauss algorithm, see here. Also, there are certainly more elegant proofs.

The algorithm can be viewed as an approximation of the Gram–Schmidt orthogonalization process. Denote $$ \mu = \frac{\langle v_1,v_2 \rangle}{\langle v_1,v_1 \rangle} \ , $$ then $\mu v_1$ is the orthogonal projection of $v_2$ onto $v_1$. The vector $v_2-\mu v_1$ is thus orthogonal to $v_1$ and, when $\mu \neq 0$ (that is, when $v_1$ and $v_2$ are not already orthogonal), $|| v_2 - \mu v_1 || < || v_2 ||$ by Pythagoras: $||v_2||^2 = ||\mu v_1||^2 + ||v_2 - \mu v_1||^2$. The pair $(v_1, v_2-\mu v_1)$ is an orthogonal (but not necessarily orthonormal) basis of $\mathbb{R}^2$. For lattices, however, an orthogonal basis may not exist, since $\mu v_1$ is generally not a lattice vector. The aim is then to find a basis which is "as orthogonal as possible". The approximation is to round $\mu$ to the integer $m$, take $(v_1, v_2-mv_1)$ as a new basis of the lattice, and repeat the process until no improvement is found.

There is a measure of orthogonality of a lattice basis called the orthogonality defect, defined as $$ \delta(v_1, v_2) = \frac{|| v_1 || \times || v_2 ||}{|\det(v_1,v_2)|} \ , $$ which links the "orthogonality" of a basis to the length of its vectors: $(v_1,v_2)$ is an orthogonal basis if and only if $\delta(v_1,v_2)=1$. Intuitively, it means that the "more" orthogonal the basis is, the shorter the vectors of the basis are.
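
To make these quantities concrete, here is a small Python sketch (the basis is an arbitrary example of mine, and `round` is unambiguous here, so the tie-breaking rule does not matter):

```python
from math import hypot

def orthogonality_defect(v1, v2):
    # delta = ||v1|| * ||v2|| / |det(v1, v2)|; equals 1 iff the basis is orthogonal
    det = v1[0] * v2[1] - v1[1] * v2[0]
    return hypot(*v1) * hypot(*v2) / abs(det)

v1, v2 = (3, 4), (1, 5)
mu = (v1[0] * v2[0] + v1[1] * v2[1]) / (v1[0] ** 2 + v1[1] ** 2)  # 0.92
gs = (v2[0] - mu * v1[0], v2[1] - mu * v1[1])  # orthogonal to v1, but
                                               # not a lattice vector
print(orthogonality_defect(v1, v2))            # ~2.32

m = round(mu)                                  # m = 1
w2 = (v2[0] - m * v1[0], v2[1] - m * v1[1])    # new lattice basis vector
print(orthogonality_defect(v1, w2))            # ~1.02: closer to orthogonal
```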

That being said, I am not totally clear on the link between the orthogonality defect and the shortest vector. It seems to me that there is a well-known (and thus never spelled out!) result of the form "for a given lattice, if a basis is of minimal orthogonality defect, then it contains a shortest vector". I am even less clear on whether the orthogonality defect can actually be used to rank two bases with respect to the shortest vector problem, or any other problem for that matter. If there is no result of the form "if the orthogonality defect of a first basis is smaller than that of a second basis, then the shortest vector in the first basis is shorter than the shortest vector in the second basis", or something similar, then the orthogonality defect is a measure only in name, at least for me.

In any case.

Sufficiency of ($\star$). First note that the algorithm never produces collinear $v_1(k)$ and $v_2(k)$, where $k$ is the iteration index. Indeed, you start with $v_1(0)$ and $v_2(0)$ non-collinear, and the algorithm then produces $v_2(k+1)=v_2(k)-m(k)v_1(k)$. Assuming that $v_2(k+1)$ is collinear with $v_1(k)$ means $v_2(k+1) = \alpha(k) v_1(k)$ for some $\alpha(k)$; this implies $v_2(k) = (\alpha(k)+m(k)) v_1(k)$, which is not possible. The proof that $v_2(k+1)$ is not collinear with $v_2(k)$ is similar. In particular, this means that $v_1$ and $v_2$ are never the zero vector. One can also use unimodular matrices to prove the non-collinearity. Actually, $(v_1(k),v_2(k))$ is always a basis of the lattice: for a lattice vector $v = c_1v_1(k)+c_2v_2(k)$, it is always possible to find integers $d_1$ and $d_2$ such that $v = d_1v_1(k)+d_2(v_2(k)-m(k)v_1(k))$, namely $d_1 = c_1+c_2m(k)$ and $d_2=c_2$. You may also see that $(v_1,v_2-mv_1)$, with $m$ an integer, is a unimodular transformation of $(v_1,v_2)$, as made explicit below.
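
To spell out the unimodular point, in matrix form one step of the algorithm reads $$ \begin{pmatrix} v_1 \\ v_2 - m v_1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -m & 1 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} \ , $$ and a swap corresponds to multiplying by $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$. Both matrices are integer matrices of determinant $\pm 1$, and so are their inverses, which is exactly why each step maps a basis of the lattice to another basis of the same lattice.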

It was shown above that for any lattice vector $v=c_1v_1+c_2v_2$, for some integers $(c_1,c_2)$, one has $$||v||^2 \geq \underbrace{(c_1^2-|c_1c_2|+c_2^2)}_{s}||v_1||^2 \ .$$ Since this is true for any lattice vector, it is in particular true for a shortest (nonzero) vector of the lattice, which I denote $v^*$. Thus $||v^*||^2 \geq s||v_1||^2$. Since $v^*$ is a nonzero vector, $(c_1,c_2)\neq (0,0)$. Assuming that $(c_1,c_2)\neq (0,0) \ \Rightarrow \ s>0$, the integer $s$ satisfies $s \geq 1$, hence $||v^*||^2 \geq ||v_1||^2$. But $v^*$ is a shortest vector and $v_1$ is a nonzero lattice vector, so $v_1$ must also be a shortest vector (and, in that case, $s$ must be 1).
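
As for ($\star$) itself, which the notes leave as an exercise, one way to see it is via the identity $$ c_1^2 - |c_1c_2| + c_2^2 = (|c_1| - |c_2|)^2 + |c_1c_2| \ , $$ a sum of two non-negative terms that vanishes only when $|c_1| = |c_2|$ and $c_1c_2 = 0$ simultaneously, that is, only when $(c_1, c_2) = (0,0)$.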

It is shown here that the algorithm also produces a second shortest vector.

Termination. It can be shown that the sequences $v_1(k)$ and $v_2(k)$ are non-increasing in norm, where $k$ is the iteration index. First assume that $||v_1(k)|| < ||v_2(k)||$, so that no swap occurs. Then, \begin{align} || v_2(k+1) ||^2 & = || v_2(k)-m(k)v_1(k)||^2 \\ & = ||v_2(k)-\mu(k)v_1(k)+(\mu(k)-m(k))v_1(k)||^2 \\ & = ||v_2(k)-\mu(k)v_1(k)||^2+||(\mu(k)-m(k))v_1(k)||^2 & \text{Pythagoras, as } v_2(k)-\mu(k)v_1(k) \perp v_1(k) \\ & = ||v_2(k)||^2-||\mu(k)v_1(k)||^2+||(\mu(k)-m(k))v_1(k)||^2 & \text{Pythagoras again} \\ & = ||v_2(k)||^2-\underbrace{\left[ \mu^2(k) - (\mu(k)-m(k))^2 \right]}_{f(k)}||v_1(k)||^2 \ . \\ \end{align} Note that $|\mu(k)-m(k)|\leq 1/2$ since $m(k)$ is the rounding of $\mu(k)$. If $|\mu(k)| \geq 1/2$, clearly $f(k) \geq 0$, so that $||v_2(k+1)||^2 \leq ||v_2(k)||^2$. If $|\mu(k)| < 1/2$, then $m(k) = 0$ and we still have $||v_2(k+1)||^2 \leq ||v_2(k)||^2$ (and the algorithm terminates). The sequences $||v_1(k)||$ and $||v_2(k)||$ are thus non-increasing sequences bounded below by 0: they converge. Taking the swapping of $v_1$ and $v_2$ into account is essentially the same.
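
As a quick numerical illustration of this monotonicity (a self-contained sketch; the nearly collinear starting basis is my own example), the loop below prints the two norms at each iteration, and the printed values never increase:

```python
from fractions import Fraction
from math import ceil, sqrt

v1, v2 = (5, 8), (8, 13)           # nearly collinear starting basis
while True:
    if v1[0]**2 + v1[1]**2 > v2[0]**2 + v2[1]**2:
        v1, v2 = v2, v1            # swap so that ||v1|| <= ||v2||
    print(sqrt(v1[0]**2 + v1[1]**2), sqrt(v2[0]**2 + v2[1]**2))
    mu = Fraction(v1[0]*v2[0] + v1[1]*v2[1], v1[0]**2 + v1[1]**2)
    m = ceil(mu - Fraction(1, 2))  # rounding with ties broken downwards
    if m == 0:
        break
    v2 = (v2[0] - m*v1[0], v2[1] - m*v1[1])
# prints, approximately: (9.43, 15.26), (3.61, 9.43),
# (1.41, 3.61), (1.0, 1.41), (1.0, 1.0)
```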

Here the convergence is in norm. By itself, this does not tell us much about the convergence of the vectors: for instance, the vectors could bounce back and forth between two vectors of the same norm. To prove the convergence of the vectors themselves, we need something a little bit stronger. Actually, there are three cases to consider:

  1. $|\mu(k)| > 1/2$, then clearly $f(k) > 0$ and $||v_2(k+1)||^2 < ||v_2(k)||^2$;
  2. $-1/2 < \mu(k) \leq 1/2$, then $m(k) = 0$ due to the tie-breaking rule used for the rounding. The algorithm then stops;
  3. $\mu(k) = -1/2$, then $m(k) = -1$, $v_2(k+1) = v_2(k)+v_1(k)$ and either $||v_2(k+1)|| < ||v_2(k)||$ or $||v_2(k+1)|| = ||v_2(k)||$. If $||v_2(k+1)|| = ||v_2(k)||$, the algorithm stops at the next iteration. Indeed, since $||v_2(k+1)|| = ||v_2(k)||$, there is no swap and $v_1(k+1)=v_1(k)$. Then $$ \mu(k+1)= \frac{\langle v_1(k+1),v_2(k+1)\rangle}{\langle v_1(k+1),v_1(k+1)\rangle} = \frac{\langle v_1(k),v_2(k)+v_1(k)\rangle}{\langle v_1(k),v_1(k)\rangle} = -1/2 + 1 = 1/2 \ , $$ which makes the algorithm stop, since $m(k+1) = 0$; a concrete instance is given below.
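
For a concrete instance of case 3 (an example of my own): take $v_1(k) = (2,0)$ and $v_2(k) = (-1,2)$. Then $\mu(k) = -2/4 = -1/2$, so $m(k) = -1$ and $v_2(k+1) = v_2(k) + v_1(k) = (1,2)$, which has the same norm $\sqrt{5}$ as $v_2(k)$. At the next iteration there is no swap, $\mu(k+1) = 2/4 = 1/2$, hence $m(k+1) = 0$ and the algorithm stops.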

In summary, at each iteration the algorithm either strictly decreases the norm of at least one of the two vectors (if there is a swap, $v_1$ is decreased; if there is no swap, $v_2$ is decreased), or it stops at the current or the next iteration. It remains to show that the norms of the vectors cannot be decreased indefinitely.

The length of the shortest vector is also called the first successive minimum of Minkowski. For any basis $(v_1,v_2)$ of the lattice $L$, this length can be bounded below by $$ \lambda_1(L) \geq \min \ \{ ||\tilde{v}_1||, ||\tilde{v}_2|| \} $$ where $\tilde{v}_1$ and $\tilde{v}_2$ are the vectors produced by the Gram-Schmidt orthogonalization process. That is, $$ \tilde{v}_1 = v_1 \ \text{ and } \ \tilde{v}_2 = v_2 - \frac{\langle v_1,v_2 \rangle}{\langle v_1,v_1 \rangle}v_1 \ . $$ The proof is actually quite easy and can be found here. From this, it is possible to show that there are only a finite number of vectors to consider in the algorithm. Since the algorithm never increases the length of the vectors, only the lattice vectors inside $\mathcal{B}(0,r)$, the disc centered at 0 and of radius $r = \max \ \{ ||v_1(0)||, ||v_2(0)|| \}$, need to be considered, and those vectors are finite in number. The idea is to cover the disc with squares sufficiently small that each contains at most one lattice point. Denote $l = \min \ \{ ||\tilde{v}_1||, ||\tilde{v}_2|| \}$, so that any two distinct lattice points are at distance at least $\lambda_1(L) \geq l$ from each other. For instance, let $\epsilon = l/4$; the maximum distance between two points inside a square of side $\epsilon$ is $\sqrt{2}\epsilon = (\sqrt{2}/4) l < l$, so any square can contain at most one lattice point. To cover the disc $\mathcal{B}(0,r)$, which has diameter $2r$, one needs $2(\lfloor r/\epsilon \rceil + 1)$ squares ($+1$ to be safe about the rounding convention) along the $x$-axis and the same number along the $y$-axis. So the number of lattice points inside $\mathcal{B}(0,r)$ is bounded above by $4(\lfloor r/\epsilon \rceil + 1)^2$. Since the norms cannot decrease forever through a finite set of candidate vectors, the algorithm stops in a finite number of iterations.
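
To make the counting concrete (a rough sketch with my own example basis; it only illustrates that the bound is finite, not that it is tight):

```python
from math import hypot, floor

v1, v2 = (5, 8), (8, 13)                    # example starting basis
mu = (v1[0]*v2[0] + v1[1]*v2[1]) / (v1[0]**2 + v1[1]**2)
gs2 = (v2[0] - mu*v1[0], v2[1] - mu*v1[1])  # Gram-Schmidt vector
l = min(hypot(*v1), hypot(*gs2))            # lower bound on lambda_1(L)
r = max(hypot(*v1), hypot(*v2))             # iterates stay inside B(0, r)
eps = l / 4                                 # squares of side eps hold <= 1 point
print((2 * (floor(r / eps) + 1)) ** 2)      # crude, but finite, bound
```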
