Short answer: Method 2 is better for small angles. Use this slight rearrangement:
$$
\theta = 2\,\operatorname{atan2}\bigl(\,\bigl\|\,\|v\|\,u - \|u\|\,v\,\bigr\|,\ \bigl\|\,\|v\|\,u + \|u\|\,v\,\bigr\|\,\bigr)
$$
This formula comes from W. Kahan's advice in his paper "How Futile are Mindless Assessments of Roundoff in Floating-Point Computation?" (https://www.cs.berkeley.edu/~wkahan/Mindless.pdf), section 12 "Mangled Angles."
Before I found that paper, I also did my own analysis, comparing the double-precision results against reference values computed with bc at 50 digits of precision. Method 2 is commonly within 1 ulp of the correct value, and almost always within 50 ulps, whereas Method 1 was much, much less accurate.
This might seem surprising, since Method 1 is mathematically insensitive to the magnitudes of u and v, whereas Method 2 requires them to be normalized. Indeed, the accuracy of the normalization is the limiting factor for Method 2: virtually all of its error comes from the fact that, even after normalization, the vectors aren't exactly of length 1.
However, for small angles Method 1 suffers catastrophic cancellation in the cross product. Specifically, all the products like $u_y v_z - u_z v_y$ end up close to 0, and I believe the multiplications before the subtraction lose accuracy compared to the direct subtraction in Method 2.
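For concreteness, here is a minimal NumPy sketch of the formula above (the function name angle_between is mine; only the formula itself is Kahan's):

import numpy as np

def angle_between(u, v):
    # Kahan: theta = 2*atan2(|| ||v||u - ||u||v ||, || ||v||u + ||u||v ||)
    a = np.linalg.norm(v) * u
    b = np.linalg.norm(u) * v
    return 2.0 * np.arctan2(np.linalg.norm(a - b), np.linalg.norm(a + b))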
You can do
import numpy as np

def LUfactorizeB(A):
    # In-place LU factorization (Doolittle, no pivoting): afterwards A holds
    # U in its upper triangle and the multipliers of L (unit diagonal implied) below it.
    lA = len(A)
    for k in range(lA - 1):
        A[k+1:, k] = A[k+1:, k] / A[k, k]                   # column of multipliers
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])   # rank-1 update of the trailing block
which vectorizes your inner loop (the rank-1 update runs inside NumPy rather than in Python) and should therefore be faster.
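As a quick sanity check (my own addition, not part of the timing below): for a symmetric positive definite input, where no pivoting is needed, the factors recovered from the overwritten matrix should reproduce A:

import numpy as np

n = 5
A = np.random.rand(n, n)
A = A.T @ A + n * np.eye(n)      # symmetric positive definite, so no pivoting needed
B = A.copy()
LUfactorizeB(B)
L = np.tril(B, -1) + np.eye(n)   # multipliers below the diagonal, unit diagonal
U = np.triu(B)                   # U including the diagonal
print(np.allclose(L @ U, A))     # True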
Edit: Here is my timing code:
import numpy as np
import timeit

def LU1(A):
    # original version: explicit Python loops over rows
    lA = len(A)
    for k in range(0, lA - 1):
        for j in range(k + 1, lA):
            if abs(A[j, k]) > 1e-10:        # skip (near-)zero entries
                A[j, k] /= A[k, k]
                A[j, k+1:] -= A[j, k] * A[k, k+1:lA]

def LU2(A):
    # vectorized version from above
    for k in range(0, len(A) - 1):
        A[k+1:, k] = A[k+1:, k] / A[k, k]
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])

def tester(matrixSize):
    A = np.random.rand(matrixSize, matrixSize)
    A = A.T @ A          # symmetric positive definite, so no pivoting is needed
    B = A.copy()
    C = A.copy()
    def test1():
        LU1(A)
    def test2():
        LU2(B)
    # warmup
    LU1(C)
    LU2(C)
    # measure time
    time1 = timeit.timeit(test1, number=10) / 10
    time2 = timeit.timeit(test2, number=10) / 10
    print("dim=%4d, LU1: %.5f, LU2: %.5f, speedup = %2.2f" % (matrixSize, time1, time2, time1 / time2))

testDimensions = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
for k in testDimensions:
    tester(k)
And my measurements:
dim=  10, LU1: 0.00025, LU2: 0.00011, speedup = 2.31
dim=  20, LU1: 0.00096, LU2: 0.00022, speedup = 4.27
dim=  30, LU1: 0.00205, LU2: 0.00034, speedup = 6.09
dim=  40, LU1: 0.00356, LU2: 0.00049, speedup = 7.32
dim=  50, LU1: 0.00567, LU2: 0.00067, speedup = 8.47
dim=  60, LU1: 0.00817, LU2: 0.00090, speedup = 9.07
dim=  70, LU1: 0.01293, LU2: 0.00111, speedup = 11.65
dim=  80, LU1: 0.01541, LU2: 0.00129, speedup = 11.97
dim=  90, LU1: 0.01952, LU2: 0.00173, speedup = 11.30
dim= 100, LU1: 0.02441, LU2: 0.00220, speedup = 11.07
dim= 200, LU1: 0.09993, LU2: 0.00861, speedup = 11.60
dim= 300, LU1: 0.23670, LU2: 0.02187, speedup = 10.82
dim= 400, LU1: 0.38049, LU2: 0.04580, speedup = 8.31
dim= 500, LU1: 0.52768, LU2: 0.08319, speedup = 6.34
dim= 600, LU1: 0.75969, LU2: 0.14168, speedup = 5.36
dim= 700, LU1: 1.02731, LU2: 0.23459, speedup = 4.38
dim= 800, LU1: 1.34176, LU2: 0.36763, speedup = 3.65
dim= 900, LU1: 1.68615, LU2: 0.55875, speedup = 3.02
dim=1000, LU1: 2.08703, LU2: 0.83434, speedup = 2.50
As you can see, the speedup is better than I expected. :D
Best Answer
Let me first give an intuitive description of numerical instability: it is the possibility for round-off errors to be amplified so much that they contaminate the results.
Numerical stability is a guarantee that no such amplification occurs.
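As a concrete toy example of such amplification (my own, not from the lecture below): subtracting two nearly equal quantities wipes out the significant digits that round-off left intact.

import math

x = 1e-8
naive = 1.0 - math.cos(x)           # catastrophic cancellation: cos(x) rounds to exactly 1.0
stable = 2.0 * math.sin(x / 2)**2   # mathematically identical, numerically stable
print(naive)    # 0.0, every significant digit is lost
print(stable)   # 5.0000000000000005e-17, correct to ~16 digits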
Here is a lecture on round-off errors from a course on numerical linear algebra. It uses the concept of a condition number.
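For instance (a small sketch of my own, with a made-up nearly singular matrix), the condition number tells you how strongly a perturbation of the input can be amplified in the output:

import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1e-10]])
print(np.linalg.cond(A))              # ~4e10: expect to lose about 10 decimal digits

b = np.array([2.0, 2.0])
x = np.linalg.solve(A, b)             # exact solution is [2, 0]
b2 = b + 1e-8 * np.array([1.0, -1.0])
x2 = np.linalg.solve(A, b2)
print(np.linalg.norm(x2 - x))         # a 1e-8 perturbation of b changes x by ~1e2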
If these are insufficient, you will need the involvement of a person who can analyze your algorithm for numerical stability; perhaps make them a co-author?