[Math] Computing Hermite Normal Form using Extended Euclidean Algorithm

algorithms, euclidean-geometry

I am trying to calculate the Hermite Normal Form of a square $n \times n$ matrix using the Extended Euclidean Algorithm to compute the columns of the HNF matrix, rather than the standard (column) matrix operations. According to Cohen [Coh93], this is possible using the following algorithm, where $a_{i,j}$ denotes the coefficients of a matrix $A$, $A_{j}$ denotes column $j$ of the matrix $A$, and $B$ is an auxiliary column vector.

$
\def\indent{\qquad}
k \leftarrow n\\
j \leftarrow n\\
\text{for } i \leftarrow n \text{ to } 1 \text{ do}\\
\text{ compute } (u, v, d) \text{ such that } ua_{i,k} + va_{i,j} = d = \gcd(a_{i,k},a_{i,j}) \text{ with } |u| \text{ and } |v| \text{ minimal (see below)}\\
\text{ while } j \geq 1 \text{ do}\\
\indent\text{ if } a_{i,j} \neq 0 \text{ then}\\
\indent\indent B \leftarrow uA_{k} + vA_{j}\\
\indent\indent A_{j} \leftarrow (a_{i,k}/d)A_{j} - (a_{i,j}/d)A_{k}\\
\indent\indent A_{k} \leftarrow B\\
\indent\text{ end if}\\
\indent\text{ } j \leftarrow j - 1\\
\text{ end while}\\
\text{end for}\\
\vspace{5mm}\\
\text{// Final Reductions}\\
k \leftarrow n\\
\text{for } i \leftarrow n \text{ to } 1 \text{ do}\\
\text{ } b \leftarrow a_{i,k}\\
\text{ if } b < 0 \text{ then}\\
\indent\text{ } A_{k} \leftarrow -A_{k}\\
\indent\text{ } b \leftarrow -b\\
\text{ end if}\\
\text{ if } b = 0 \text{ then}\\
\indent\text{ continue}\\
\text{ end if}\\
\text{ for } j \leftarrow k+1 \text{ to } n \text{ do}\\
\indent\text{ } q \leftarrow \lfloor a_{i,j} / b \rfloor\\
\indent\text{ } A_{j} \leftarrow A_{j} - qA_{k}\\
\text{ end for}\\
\text{ } k \leftarrow k – 1\\
\text{end for}\\
$
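
Note that each Euclidean step is unimodular: writing $a = a_{i,k}$ and $b = a_{i,j}$, it replaces the two columns by

$
(A_{k}, A_{j}) \leftarrow (uA_{k} + vA_{j},\ (a/d)A_{j} - (b/d)A_{k}) = (A_{k}, A_{j})\left(\begin{array}{cc} u & -b/d \\ v & a/d \end{array}\right),
$

and this $2 \times 2$ matrix has determinant $u(a/d) + v(b/d) = (ua + vb)/d = 1$, so neither the column lattice nor $|\det A|$ is changed.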

Note that the Euclidean step requires that $|u|$ and $|v|$ be minimal. This means that, of all the possible pairs, $(u,v)$ is chosen such that

$
-\frac{|a|}{d} < v\,\operatorname{sign}(b) \leq 0 \quad \text{and} \quad 1 \leq u\,\operatorname{sign}(a) \leq \frac{|b|}{d}
$
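
For example, with $(a, b) = (10, 4)$ we have $d = 2$, and the general solution of $10u + 4v = 2$ is $(u, v) = (1, -2) + t\,(2, -5)$ for $t \in \mathbb{Z}$; the conditions above single out $t = 0$, since

$
1 \cdot 10 + (-2) \cdot 4 = 2, \qquad -\frac{|a|}{d} = -5 < -2 \leq 0, \qquad 1 \leq 1 \leq \frac{|b|}{d} = 2.
$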

My algorithm for calculating the minimal $|u|$ and $|v|$ therefore consists of the following:

$
(d, u, v) \leftarrow \text{XGCD}(a_{i,k},a_{i,j}) \text{ // using XGCD as defined by Magma} \\
x_{0} \leftarrow u \\
y_{0} \leftarrow v \\
\text{if } (u = 0) \text{ or } (v = 0) \text{ then} \\
\text{ for } i \leftarrow 0 \text{ to } d \text{ do} \\
\indent\text{ } u \leftarrow x_{0} - i \times b / d \\
\indent\text{ } v \leftarrow y_{0} + i \times a / d \\
\indent\text{ if } -\frac{|a|}{d} < v \leq 0 \text{ and } 1 \leq u \leq \frac{|b|}{d} \text{ then} \\
\indent\indent\text{ break} \\
\indent\text{ end if} \\
\text{ end for} \\
\text{end if} \\
\text{return } d, u, v \\
$

Here, I have assumed that Magma's XGCD returns the correct sign for $u$ and $v$.

To test this algorithm, I am using the following sample matrix as input to the procedure:

$
\left(
\begin{array}{rrrr}
10 & 0 & 1 & 0 \\
-2 & -2 & 2 & 1 \\
-2 & -1 & 3 & -1 \\
4 & -6 & -2 & 0 \\
\end{array}
\right)
$

I expect the following output from the procedure:

$
\left(
\begin{array}{rrrr}
33 & 12 & 12 & 11 \\
0 & 6 & 5 & 1 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 2 \\
\end{array}
\right)
$

Instead, I am getting the following output:

$
\left(
\begin{array}{rrrr}
3168 & 48 & 48 & 47 \\
0 & 24 & 22 & 18 \\
0 & 0 & 2 & 1 \\
0 & 0 & 0 & 2 \\
\end{array}
\right)
$

It is apparent that there is no problem with the steps that reduce the lower triangular part of the matrix, nor with the steps that reduce the upper triangular part of the matrix (that is, the Final Reduction steps). The problem therefore lies with the Euclidean step.
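
One way to quantify this: since every correct Euclidean step and every final reduction preserves $|\det A|$, the input matrix and the expected output both have determinant $33 \cdot 6 \cdot 1 \cdot 2 = 396$, whereas the output I obtain has determinant $3168 \cdot 24 \cdot 2 \cdot 2 = 304128 = 768 \cdot 396$. So the column operations actually being applied are not unimodular, which again points at the $(u, v, d)$ returned by the Euclidean step.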

My question is whether I have correctly implemented the procedure that returns the minimal values for $|u|$ and $|v|$. Note that I used the following link on Linear Diophantine Equations as a guideline for implementing this procedure: Euclidean Algorithm. Is this correct, or should I be tackling the problem in a different way? Either way, where could I find a reference that better describes the problem at hand? Any help/advice would be appreciated.

References:

[Coh93] H. Cohen, A Course in Computational Algebraic Number Theory, Vol. 138 of Graduate Texts in Mathematics. Springer, Heidelberg, 1993.

Best Answer

I think there are some errors, either book errata or transcription errors, in the stated algorithm. Firstly, $k$ is the pivot column, so it needs to be decremented (whenever the pivot entry is nonzero) at the end of each iteration of the first $i$ loop. Secondly, within each such iteration, $j$ needs to be looped over from $k-1$ down to $1$ to clear out the lower triangular part, so it must be initialized inside the $i$ loop.

I am also skeptical of whether special handling of xgcd is necessary or even helpful, since it always returns a "positive definite" $d$ (an admitted abuse of terminology), in the sense that it is (or should be) always positive given $(a,b)\ne(0,0)$. Because of this, we also do not need the $b < 0$ test (and correction) in the final (partial upper triangular) reductions. In fact, it seems rather artificial to require $v \le 0$ and $u > 0$ of xgcd. What about $(a,b)=(3,1)$ and $(-1,3)$? Also, your pseudocode for the xgcd wrapper only alters its output if $u$ or $v$ is zero, and I think the loop could be avoided by solving some linear equations (taking quotients and a minimum).
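
To illustrate the loop-free alternative, here is a sketch of my own (the name min_xgcd is made up; it relies on Sage's xgcd and assumes both arguments are nonzero, which is the only case where Cohen's inequalities pin down a unique pair):

# Sketch: pick the Bezout pair (u, v) in Cohen's range without a search loop.
# All solutions of u*a + v*b = d differ by multiples of (b/d, -a/d), so we
# shift v into the prescribed half-open interval and adjust u accordingly.
def min_xgcd(a, b):
    d, u, v = xgcd(a, b)          # Sage's xgcd: u*a + v*b = d = gcd(a, b) >= 0
    if a == 0 or b == 0:
        return d, u, v            # Cohen's inequalities presuppose a, b != 0
    m = a // d                    # consecutive v-solutions differ by a/d
    if b > 0:
        v0 = -((-v) % abs(m))     # representative in (-|a|/d, 0]
    else:
        v0 = v % abs(m)           # representative in [0, |a|/d)
    t = (v - v0) // m             # exact: how many steps v was shifted
    u0 = u + t * (b // d)         # shift u the other way, so u0*a + v0*b = d
    return d, u0, v0

For instance, min_xgcd(10, 4) should return (2, 1, -2), the unique pair in the range stated in the question; one can check that the condition on $u$ then holds automatically.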

However, I don't have Cohen's book, and don't know how or why he requires $|u|$ and $|v|$ to be minimal. Does he mean that $\max\{|u|,|v|\}$ must be minimal? I admit, I haven't thought deeply about it. But I think it is a fact that xgcd already does a good job of producing a minimal solution.

I have corrected the algorithm as I think it should be, and implemented it in Sage. After the testing described below, I made one further correction.

m = Matrix(ZZ,4,4,\
    [[10,-2,-2, 4]\
    ,[ 0,-2,-1,-6]\
    ,[ 1, 2, 3,-2]\
    ,[ 0, 1,-1, 0]]).transpose(); m

def hnf(m,show=False):
    nr = m.nrows()
    nc = m.ncols()
    if show:
        print 'hnf: '+str(nr)+'x'+str(nc)
    assert(nr == nc)
    c = m.columns()
    M = Matrix(ZZ,nc,nr,c).transpose()
    if show:
        print M
    def r(n):
        s = range(n)
        s.reverse()
        return s
    k = nc - 1
    pivots = []
    for i in r(nr):
        for j in r(k):
            ck = c[k]; a = ck[i]
            cj = c[j]; b = cj[i]
            if b != 0:
                d,u,v = xgcd(a,b)
                if show:
                    print 'row '+str(i+1)+', cols '+str(j+1)+','+str(k+1)\
                        , 'xgcd:',u,'(',a,') + ',v,'(',b, ') = ',d
                ad = a/d
                bd = b/d
                c[k] = ck*u  + cj*v  # if a != d:
                c[j] = cj*ad - ck*bd
                M = Matrix(ZZ,nc,nr,c).transpose()
                if show:
                    print M
        if c[k][i] != 0:
            pivots.append((i,k))
            if c[k][i] < 0:  # this is necessary for the last pivot column,
                c[k] = -c[k] # since it cannot utilize xgcd to "normalize"
            k = k - 1
    for (i,k) in pivots:
        b = c[k][i]; assert(b > 0)
        if show:
            print 'pivot('+str(i+1)+','+str(k+1)+')='+str(b)
        for j in [k+1..nc-1]:
            q = floor(c[j][i] / b)
            c[j] -= c[k]*q
        M = Matrix(ZZ,nc,nr,c).transpose()
        if show:
            print M
    pivots.reverse()
    #if show:
    #    print 'pivots:', pivots
    return M, pivots

h,p = hnf(m,True)

Here is the output (for your test case):

hnf: 4x4
[10  0  1  0]
[-2 -2  2  1]
[-2 -1  3 -1]
[ 4 -6 -2  0]
row 4, cols 3,4 xgcd: 0 ( 0 ) +  -1 ( -2 ) =  2
[10  0  0 -1]
[-2 -2  1 -2]
[-2 -1 -1 -3]
[ 4 -6  0  2]
row 4, cols 2,4 xgcd: 1 ( 2 ) +  0 ( -6 ) =  2
[ 10  -3   0  -1]
[ -2  -8   1  -2]
[ -2 -10  -1  -3]
[  4   0   0   2]
row 4, cols 1,4 xgcd: 1 ( 2 ) +  0 ( 4 ) =  2
[ 12  -3   0  -1]
[  2  -8   1  -2]
[  4 -10  -1  -3]
[  0   0   0   2]
row 3, cols 2,3 xgcd: -1 ( -1 ) +  0 ( -10 ) =  1
[12  3  0 -1]
[ 2 18 -1 -2]
[ 4  0  1 -3]
[ 0  0  0  2]
row 3, cols 1,3 xgcd: 1 ( 1 ) +  0 ( 4 ) =  1
[12  3  0 -1]
[ 6 18 -1 -2]
[ 0  0  1 -3]
[ 0  0  0  2]
row 2, cols 1,2 xgcd: 0 ( 18 ) +  1 ( 6 ) =  6
[33 12  0 -1]
[ 0  6 -1 -2]
[ 0  0  1 -3]
[ 0  0  0  2]
pivot(4,4)=2
[33 12  0 -1]
[ 0  6 -1 -2]
[ 0  0  1 -3]
[ 0  0  0  2]
pivot(3,3)=1
[33 12  0 -1]
[ 0  6 -1 -5]
[ 0  0  1  0]
[ 0  0  0  2]
pivot(2,2)=6
[33 12 12 11]
[ 0  6  5  1]
[ 0  0  1  0]
[ 0  0  0  2]
pivot(1,1)=33
[33 12 12 11]
[ 0  6  5  1]
[ 0  0  1  0]
[ 0  0  0  2]

Sage also has a hermite_form function for matrices,

var('s')
for s in ['pari', 'pari0', 'pari4', 'padic', 'default', 'ntl']:
    print str(s)
    print m.hermite_form(algorithm=s)

but each implementation produces the matrix below $$ \left(\matrix{ 2 & 0 & 2 & 27 \\ 0 & 1 & 1 & 64 \\ 0 & 0 & 3 & 45 \\ 0 & 0 & 0 & 66 }\right) $$ which, as you point out in a comment, is arrived at by iterating over $i$ increasing, i.e. by pivoting from the top (left) downwards, rather than decreasing from the bottom (right) upwards. We can verify this directly using the transformation $\phi:m\mapsto rm^tr=(rmr)^t$, for $r$ the (orthogonal) matrix $$ r= \left(\begin{array}{rrrr} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{array}\right), $$ which transposes $m$ across its anti-diagonal. Our (Cohen's) HNF (which pivots from the bottom right upward) and Sage's (which forms pivots from the top left downwards) can be obtained from each other by pre- and post-transforming by $\phi:$

# entries are 1 on the anti-diagonal (n = 3, 6, 9, 12) and 0 elsewhere
r = Matrix(ZZ,4,4,
    [floor(1/(1+n%3)) * (1-floor(1/(1+n%15)))
     for n in range(16)])
print r
phi = lambda x: (r*x*r).transpose()
print phi(m).hermite_form() == phi(h)
phi(phi(m).hermite_form())

By the way, if you don't mind a little excursion, the group of transformations of square matrices generated under composition by the (self-inverse) maps $\color{red}{t}:m\mapsto m^t$ (transpose) and $\color{blue}{r}:m\mapsto rm$ (row reversal) or $\color{green}{c}:m\mapsto mr$ (column reversal) is in fact isomorphic to the dihedral group $D_4$ of the square. We can visualize this group by its action on a fixed matrix, e.g. the $2\times2$ matrix $e=\left(\matrix{1&2\\3&4}\right)$.

[Figure: the dihedral group of the square, drawn as the vertices of a cube]

Each transformation is represented by, or identified with, (a path from $e$ to) a vertex. The order-two elements consist of the surface edges and interior diagonals, while the rotations consist of the face diagonals, and their cosets partition the vertices into two intersecting tetrahedra. This is also a commutative diagram, and thus can be used to verify such formulas as $rc=cr$, $rt=tc$, and $ct=tr$, as well as $\phi=rct=trc$ from above.
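
As a quick sanity check of these relations (my own snippet; the names r2, e2, tt, rr, cc are mine, and a word such as rt means "apply r, then t"):

# verify the D4 relations on the 2x2 example
r2 = Matrix(ZZ, 2, 2, [0, 1, 1, 0])   # 2x2 reversal (anti-diagonal) matrix
e2 = Matrix(ZZ, 2, 2, [1, 2, 3, 4])   # the example matrix e
tt = lambda m: m.transpose()          # t: transpose
rr = lambda m: r2 * m                 # r: row reversal
cc = lambda m: m * r2                 # c: column reversal
assert cc(rr(e2)) == rr(cc(e2))                        # rc = cr
assert tt(rr(e2)) == cc(tt(e2))                        # rt = tc
assert tt(cc(e2)) == rr(tt(e2))                        # ct = tr
assert tt(cc(rr(e2))) == (r2 * e2 * r2).transpose()    # rct = phi
assert cc(rr(tt(e2))) == (r2 * e2 * r2).transpose()    # trc = phi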

Anyway, after some testing

ranks = [0,0,0,0,0]
passed = 0
failed = 0
for n in range(10^2):
    m = random_matrix(ZZ,4)
    h, p = hnf(phi(m))
    if phi(h) == m.hermite_form():
        ranks[len(p)] += 1
        passed += 1
    else:
        print 'Test FAIL!'
        print 'm:'; print m
        print 'm"=phi(m):'; print phi(m)
        print 'hnf(m"):'; print h
        print 'hnf(m")":'; print phi(h)
        print 'm.hnf():'; print m.hermite_form()
        failed += 1
        break
print n+1,'tests performed:',passed,'passed,',failed,'failed.'
print 'rank counts:',ranks

I found that I sometimes needed to correct the sign of the first (leftmost) pivot column, which is the last one the loop reaches, and incorporated this into the hnf function above (along with making the print output optional but off by default). In any case, in Sage, you can just define $\phi$ and use phi(phi(m).hermite_form()); after all, why should we add a seventh implementation (except to learn!)?
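
For instance, one way to package that suggestion (hnf_via_sage is just a name I made up here):

# hypothetical wrapper: bottom-right-pivot HNF via Sage's built-in hermite_form
def hnf_via_sage(m):
    n = m.nrows()
    rev = Matrix(ZZ, n, n, lambda i, j: 1 if i + j == n - 1 else 0)  # reversal matrix
    flip = lambda x: (rev * x * rev).transpose()                     # the map phi
    return flip(flip(m).hermite_form())

For the test case above, hnf_via_sage(m) should reproduce the matrix h computed by hnf.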
