[Math] Outer product reformulation of LU decomposition

Tags: algorithms, gaussian-elimination, numerical-linear-algebra, outer-product, python

For my numerical analysis class, I wanted to implement the rather simple, in-place LU decomposition algorithm. I did this the naive way and found my code was very slow, because every operation ran in plain interpreted Python.

So I looked in some books (mainly my trusty Numerical Linear Algebra, by Trefethen and Bau), and found an easy way to speed up the computations, by using NumPy's np.dot routine. However, the algorithm is still very slow, and the book cited above mentions an outer product reformulation of the LU algorithm, which uses only one loop.

The exact statement is the following:

Like most of the algorithms in this book, Gaussian elimination involves a triply nested loop. In Algorithm 20.1, there are two explicit for loops, and the third loop is implicit in the vectors $u_{j, k \colon m}$ and $u_{k,k \colon m}$. Rewrite this algorithm with just one explicit for loop indexed by $k$. Inside this loop, $U$ will be updated at each step by a certain rank-one outer product. This "outer product" form of Gaussian elimination may be a better starting point than Algorithm 20.1 if one wants to optimize computer performance.
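For reference, here is the update I believe the exercise is describing, written in my own notation following Algorithm 20.1: at step $k$ the multipliers fill column $k$ of $L$, and the entire trailing submatrix is modified by a single rank-one outer product:

```latex
\ell_{k+1:m,\,k} = \frac{u_{k+1:m,\,k}}{u_{k,k}}, \qquad
U_{k+1:m,\;k+1:m} \;\leftarrow\; U_{k+1:m,\;k+1:m} \;-\; \ell_{k+1:m,\,k}\, u_{k,\;k+1:m}
```

This collapses the inner $j$-loop: instead of subtracting one scaled row at a time, the whole block update happens at once.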

Now, I've been searching for this formulation for a while and can't seem to find it. The main requirements are:

  • it needs to be in-place;
  • it can't use SciPy routines (NumPy is fine, however).

The current code I'm using is

def LUfactorize(A):
    lA = len(A)
    for k in range(0, lA - 1):
        for j in range(k + 1, lA):
            if abs(A[j, k]) > 1e-10:
                A[j, k] /= A[k, k]
                A[j, k+1:lA] -= A[j, k] * A[k, k+1:lA]

Best Answer

You can do

def LUfactorizeB(A):
    lA = len(A)
    for k in range(0, lA - 1):
        A[k+1:, k] = A[k+1:, k] / A[k, k]
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])

which replaces your inner loop with a single rank-one (outer product) update of the trailing submatrix, and should therefore be considerably faster.
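As a quick sanity check (my addition, not part of the original answer), you can extract L and U from the in-place result and verify that they reproduce the original matrix. I use a symmetric positive definite test matrix so that no pivoting is needed:

```python
import numpy as np

def LUfactorizeB(A):
    # in-place outer-product LU: after the call, A holds the strict
    # lower triangle of L and the upper triangle of U
    lA = len(A)
    for k in range(0, lA - 1):
        A[k+1:, k] = A[k+1:, k] / A[k, k]
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])

M = np.random.rand(5, 5)
A = M.T @ M                  # symmetric positive definite, so no pivoting needed
A0 = A.copy()                # keep the original for comparison

LUfactorizeB(A)
L = np.tril(A, -1) + np.eye(len(A))   # unit diagonal on L
U = np.triu(A)

print(np.allclose(L @ U, A0))   # True: the factors reconstruct A
```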

Edit: Here is my time measuring:

import numpy as np
import timeit

def LU1(A):
    lA = len(A)
    for k in range(0, lA - 1):
        for j in range(k + 1, lA):
            if abs(A[j, k]) > 1e-10:
                A[j, k] /= A[k, k]
                A[j, k+1:lA] -= A[j, k] * A[k, k+1:lA]

def LU2(A):
    for k in range(0, len(A) - 1):
        A[k+1:, k] = A[k+1:, k] / A[k, k]
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])

def tester(matrixSize):
    A = np.random.rand(matrixSize, matrixSize)
    A = A.T @ A   # symmetric positive definite, so no pivoting is needed
    B = A.copy()
    C1 = A.copy()
    C2 = A.copy()
    def test1():
        LU1(A)
    def test2():
        LU2(B)
    # warmup on separate throwaway copies, so neither function
    # is warmed up on an already-factorized matrix
    LU1(C1)
    LU2(C2)
    # measure time
    time1 = timeit.timeit(test1, number=10)/10
    time2 = timeit.timeit(test2, number=10)/10
    print("dim=%4d, LU1: %.5f, LU2: %.5f, upspeed = %2.2f" % (matrixSize, time1, time2, time1/time2))

testDimensions = [10,20,30,40,50,60,70,80,90,100,200,300,400,500,600,700,800,900,1000]

for k in testDimensions:
    tester(k)

And my measurements:

dim=  10, LU1: 0.00025, LU2: 0.00011, upspeed = 2.31
dim=  20, LU1: 0.00096, LU2: 0.00022, upspeed = 4.27
dim=  30, LU1: 0.00205, LU2: 0.00034, upspeed = 6.09
dim=  40, LU1: 0.00356, LU2: 0.00049, upspeed = 7.32
dim=  50, LU1: 0.00567, LU2: 0.00067, upspeed = 8.47
dim=  60, LU1: 0.00817, LU2: 0.00090, upspeed = 9.07
dim=  70, LU1: 0.01293, LU2: 0.00111, upspeed = 11.65
dim=  80, LU1: 0.01541, LU2: 0.00129, upspeed = 11.97
dim=  90, LU1: 0.01952, LU2: 0.00173, upspeed = 11.30
dim= 100, LU1: 0.02441, LU2: 0.00220, upspeed = 11.07
dim= 200, LU1: 0.09993, LU2: 0.00861, upspeed = 11.60
dim= 300, LU1: 0.23670, LU2: 0.02187, upspeed = 10.82
dim= 400, LU1: 0.38049, LU2: 0.04580, upspeed = 8.31
dim= 500, LU1: 0.52768, LU2: 0.08319, upspeed = 6.34
dim= 600, LU1: 0.75969, LU2: 0.14168, upspeed = 5.36
dim= 700, LU1: 1.02731, LU2: 0.23459, upspeed = 4.38
dim= 800, LU1: 1.34176, LU2: 0.36763, upspeed = 3.65
dim= 900, LU1: 1.68615, LU2: 0.55875, upspeed = 3.02
dim=1000, LU1: 2.08703, LU2: 0.83434, upspeed = 2.50

As you can see, the upspeed is better than I expected. :D
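One caveat worth noting (my addition): neither version pivots, so on a general matrix they can divide by a zero or tiny `A[k, k]`. The same single-loop outer-product form extends to partial pivoting with one row swap per step; here is a sketch, still in-place and NumPy-only, where the hypothetical helper `LUfactorize_piv` returns the row permutation as an index array `piv`:

```python
import numpy as np

def LUfactorize_piv(A):
    """In-place outer-product LU with partial pivoting (sketch).

    After the call, A holds the strict lower triangle of L and the
    upper triangle of U of the row-permuted matrix; piv records which
    original row ended up in each position.
    """
    lA = len(A)
    piv = np.arange(lA)
    for k in range(lA - 1):
        # pick the largest-magnitude pivot in column k, on or below the diagonal
        p = k + np.argmax(np.abs(A[k:, k]))
        if p != k:
            A[[k, p]] = A[[p, k]]        # swap rows in place
            piv[[k, p]] = piv[[p, k]]
        A[k+1:, k] = A[k+1:, k] / A[k, k]
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])
    return piv
```

With this, `L @ U` equals `A0[piv]`, i.e. the rows of the original matrix permuted by `piv`. The extra bookkeeping is O(n) per step, so it doesn't change the vectorized update that makes LU2 fast.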
