Discrepancies in Custom SVD Implementation Compared to np.linalg.svd – Sign Issues

Tags: linear algebra, svd, systems of equations

I've been working on implementing a Singular Value Decomposition (SVD) algorithm from scratch in Python, without using the np.linalg.svd function. My goal is to understand the underlying mechanics of SVD. However, I'm encountering some discrepancies in my results compared to the standard np.linalg.svd, specifically regarding the signs of the singular vectors. Interestingly enough, my approach works when A is 4×3 but not when it is 3×3.

My Goal:

To create a function that uses a stable and reliable mathematical approach to compute an inverse or pseudoinverse for solving a least-squares problem. Limitation: I am not allowed to use any existing math functions; I will later replace np.linalg.norm and the other helpers with my own implementations.

I tried to implement the Moore–Penrose inverse directly, but there were too many special cases and issues regarding the matrix properties. According to some papers, using the SVD offers an easy way out.
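
For context, this is the end result I am aiming for: a minimal sketch (using np.linalg.svd as a stand-in for the decomposition I still have to implement myself) of how the SVD yields the pseudoinverse and with it a least-squares solution. The helper name pinv_via_svd and the rcond cutoff are just illustrative choices.

import numpy as np

def pinv_via_svd(A, rcond=1e-12):
    # Moore-Penrose pseudoinverse: A+ = V diag(1/sigma_i) U^T, where
    # singular values below rcond * sigma_max are treated as zero.
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    S_inv = np.zeros_like(S)
    nonzero = S > rcond * S[0]    # S is sorted in descending order
    S_inv[nonzero] = 1.0 / S[nonzero]
    return Vt.T @ np.diag(S_inv) @ U.T

# least-squares solution of the overdetermined system A x = b
A = np.array([[1., 0.], [1., 1.], [1., 2.]])
b = np.array([1., 2., 2.])
x = pinv_via_svd(A) @ b
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))    # True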

My Approach:

I've implemented a custom QR decomposition (my_qr) and an eigenvalue/eigenvector computation (eig_custom) method, which are then utilized in calculating the SVD. Here's a simplified version of my code:

import numpy as np

def my_qr(A, standard=False):
    m, n = A.shape    
    Q = np.eye(m)
    R = np.copy(A)
    # for a square matrix the last reflection is a trivial 1x1 step and can be skipped
    if m == n: end = n-1
    else: end = n
    for i in range(0, end):
        H = np.eye(m)

        a = R[i:, i]                  # partial column vector to be eliminated
        norm_a = np.linalg.norm(a)    # length of the partial column
        if a[0] < 0.0: norm_a = -norm_a    # choose the sign that avoids cancellation
        v = a / (a[0] + norm_a)
        v[0] = 1.0                    # Householder vector, normalized so v[0] == 1
        h = np.eye(len(a))            # reflection h = I - 2 v v^T / (v^T v)
        h -= (2 / np.dot(v, v)) * \
            np.matmul(v[:,None], v[None,:])

        H[i:, i:] = h 
        Q = np.matmul(Q, H)
        R = np.matmul(H, R)

    if standard:
        # Flip signs so that the diagonal of R is non-negative, which makes the
        # factorization unique. S must be m x m so Q @ S is defined when m != n.
        S = np.eye(m)
        for i in range(n):
            if R[i][i] < 0.0: S[i][i] = -1.0
        Q = np.matmul(Q, S)
        R = np.matmul(S, R)
    
    return Q, R

def is_upper_tri(A, tol):
    # True if every entry below the diagonal is within tol of zero
    n = len(A)
    for i in range(0,n):
        for j in range(0,i): 
            if np.abs(A[i][j]) > tol:
                return False
    return True

def eig_custom(A, num_iterations: int = 100):
    # unshifted QR algorithm: Ak converges towards upper-triangular form,
    # with the eigenvalues appearing on the diagonal
    Ak = np.copy(A)
    n = A.shape[0]
    eigenvectors = np.eye(n)

    for cnt in range(num_iterations):
        Q, R = my_qr(Ak)    # one step of the QR iteration
        Ak = R @ Q

        eigenvectors = eigenvectors @ Q
        # note: 1e-16 is below machine epsilon, so this early exit rarely triggers
        if is_upper_tri(Ak, 1.0e-16):
            break
            
    eigenvalues = np.diag(Ak)
    return eigenvalues, eigenvectors


def calculU(M):
    B = np.dot(M, M.T)    # M M^T: its eigenvectors are the left singular vectors

    eigenvalues, eigenvectors = eig_custom(B)
    ncols = np.argsort(eigenvalues)[::-1]    # column order for descending eigenvalues
    return eigenvectors[:,ncols]

def calculVt(M):
    B = np.dot(M.T, M)    # M^T M: its eigenvectors are the right singular vectors

    eigenvalues, eigenvectors = eig_custom(B)
    ncols = np.argsort(eigenvalues)[::-1]    # column order for descending eigenvalues

    return eigenvectors[:,ncols].T

def calculSigma(M):
    # work with the smaller of the two Gram matrices
    if np.size(np.dot(M, M.T)) > np.size(np.dot(M.T, M)):
        newM = np.dot(M.T, M)
    else:
        newM = np.dot(M, M.T)

    eigenvalues, eigenvectors = eig_custom(newM)
    eigenvalues = np.sqrt(np.abs(eigenvalues))
    # Reverse the order; note that this is not a sort, so the result is only
    # descending if eig_custom happened to return ascending eigenvalues
    return eigenvalues[::-1]

def svd(A, num_iterations=1000):
    # Eigenvectors and eigenvalues of A A^T and A^T A, computed independently
    eigvals_AAt, eigvecs_AAt = eig_custom(A @ A.T, num_iterations)
    eigvals_AAt_ref, eigvecs_AAt_ref = np.linalg.eig(A @ A.T)    # reference values, kept for debugging

    eigvals_AtA, eigvecs_AtA = eig_custom(A.T @ A, num_iterations)
    eigvals_AtA_ref, eigvecs_AtA_ref = np.linalg.eig(A.T @ A)    # reference values, kept for debugging

    singular_values = np.sqrt(np.abs(eigvals_AtA))

    sorted_indices = np.argsort(singular_values)[::-1]
    singular_values = singular_values[sorted_indices]
    U = eigvecs_AAt[:, sorted_indices]
    V = eigvecs_AtA[:, sorted_indices]

    return U, singular_values, V.T

def main():
    mat = []
    for i in range(3):
        mat.append(np.array([7.0, 6.0, 5.0]) - i)
    A = np.array(mat, dtype=float)    # [[7,6,5],[6,5,4],[5,4,3]], a rank-2 matrix
    #print(A)
    b = np.array([10,-3, 7])
    x = A @ b    # currently unused
    print(f"Reference Matrix:\n{A}")
    
    print(f"\n{'-'*50}\nReference SVD decomp:")
    U_ref, S_ref, V_ref = np.linalg.svd(A.copy(), full_matrices=False)
    print(f"\n\nU:\n{U_ref}\nS:\n{S_ref}\nV:{V_ref}\n\n")
    print("Verification [U @ Sigma @ V]:")
    print(U_ref @ np.diag(S_ref) @ V_ref)

    print(f"\n{'-'*50}\nMy SVD decomp:")
    
    U, S, V = svd(A.copy())
    print(f"\nU:\n{U}\n\nS:\n{S}\n\nV:{V}\n\n")
    print("Verification [U @ Sigma @ V]:")
    print(U @ np.diag(S) @ V)
    
    print(f"\n{'-'*50}\nDifferent custom SVD decomp:")
    U, S, V = calculU(A), calculSigma(A), calculVt(A) 
    print(f"\nU:\n{U}\n\nS:\n{S}\n\nV:{V}\n\n")
    newSigma = np.zeros(A.T.shape)
    newSigma[:S.size, :S.size] = np.diag(S[::-1])    # undo the reversal from calculSigma
    A_res = U @ newSigma.T @ V
    print("Verification [U @ Sigma @ V]:")
    print(A_res)

if __name__ == "__main__":
    main()

The complete code, along with the custom QR decomposition and eigenvalue computation, can be found attached.

Issue Encountered:

While my custom SVD implementation produces singular values close to those from np.linalg.svd, the singular vectors (the U and V matrices) often have different signs. As a result, the matrix reconstructed from my custom SVD differs from the original, while np.linalg.svd reconstructs it perfectly.
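
A quick sanity check (a small sketch on top of np.linalg.svd, not my custom code) confirms the suspicion: flipping the sign of a single column of U without flipping the matching row of V^T breaks the reconstruction, while flipping both leaves it intact.

import numpy as np

A = np.array([[7., 6., 5.],
              [6., 5., 4.],
              [5., 4., 3.]])
U, S, Vt = np.linalg.svd(A)

U_flipped = U.copy()
U_flipped[:, 1] *= -1.0    # flip one left singular vector on its own
print(np.allclose(U_flipped @ np.diag(S) @ Vt, A))    # False: reconstruction broken

Vt_flipped = Vt.copy()
Vt_flipped[1, :] *= -1.0   # also flip the matching right singular vector
print(np.allclose(U_flipped @ np.diag(S) @ Vt_flipped, A))    # True: consistent pair

For reference, here is the full output of my script: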

Reference Matrix:
[[7. 6. 5.]
 [6. 5. 4.]
 [5. 4. 3.]]

--------------------------------------------------
Reference SVD decomp:


U:
[[-0.6813193   0.60756674  0.40824829]
 [-0.57017342 -0.09075023 -0.81649658]
 [-0.45902754 -0.7890672   0.40824829]]
S:
[1.53898669e+01 3.89866919e-01 8.98295192e-16]
V:[[-0.6813193  -0.57017342 -0.45902754]
 [-0.60756674  0.09075023  0.7890672 ]
 [ 0.40824829 -0.81649658  0.40824829]]


Verification [U @ Sigma @ V]:
[[7. 6. 5.]
 [6. 5. 4.]
 [5. 4. 3.]]

--------------------------------------------------
My SVD decomp:

U:
[[ 0.6813193  -0.60756674  0.40824829]
 [ 0.57017342  0.09075023 -0.81649658]
 [ 0.45902754  0.7890672   0.40824829]]

S:
[1.53898669e+01 3.89866919e-01 2.26925653e-08]

V:[[ 0.6813193   0.57017342  0.45902754]
 [-0.60756674  0.09075023  0.7890672 ]
 [ 0.40824829 -0.81649658  0.40824829]]


Verification [U @ Sigma @ V]:
[[7.28782888 5.95700795 4.62618703]
 [5.95700795 5.00642159 4.0558352 ]
 [4.62618703 4.0558352  3.48548338]]

--------------------------------------------------
Different custom SVD decomp:

U:
[[ 0.6813193  -0.60756674  0.40824829]
 [ 0.57017342  0.09075023 -0.81649658]
 [ 0.45902754  0.7890672   0.40824829]]

S:
[2.26925653e-08 3.89866919e-01 1.53898669e+01]

V:[[ 0.6813193   0.57017342  0.45902754]
 [-0.60756674  0.09075023  0.7890672 ]
 [ 0.40824829 -0.81649658  0.40824829]]


Verification [U @ Sigma @ V]:
[[7.28782888 5.95700795 4.62618703]
 [5.95700795 5.00642159 4.0558352 ]
 [4.62618703 4.0558352  3.48548338]]

Questions:

  1. Could the discrepancies in the signs of the singular vectors be the
    reason for the imperfect reconstruction of the original matrix?
  2. Is there a specific step in the QR decomposition or the eigenvalue
    computation where I should pay extra attention to the signs?
  3. Are there any known best practices or adjustments to ensure the signs
    align more closely with those produced by standard libraries like
    NumPy?

Any insights or suggestions on how to resolve these discrepancies and improve the accuracy of my custom SVD implementation would be greatly appreciated!

Thank you!

Best Answer

The problem with your code is that you are calculating $U$ and $V$ separately rather than calculating one and choosing the other to be compatible. As I explain in detail here, the fact that the columns of $U$ are eigenvectors of $AA^T$ and that the columns of $V$ are eigenvectors of $A^TA$ is not enough to ensure that $A = U \Sigma V^T$ is a singular value decomposition. Once you have selected $U$ and $\Sigma$, $V$ needs to be chosen so that
$$
A^T U = V \Sigma^T.
$$
For any non-zero singular value $\sigma_j$, this means that the corresponding column of $V$ must satisfy
$$
v_j = \frac{1}{\sigma_j} A^T u_j.
$$
The remaining columns of $V$ (those belonging to zero singular values) should be selected so that the columns of $V$ form an orthonormal basis of $\Bbb R^n$.
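
To make this concrete, here is a minimal sketch of the fix applied to the svd function from your question. It uses np.linalg.eigh as a placeholder for your eig_custom (both return eigenvalues followed by eigenvectors), and it leaves the completion of the zero-singular-value columns of $V$ as a stub:

import numpy as np

def svd_consistent(A, eig=np.linalg.eigh, tol=1e-12):
    # left singular vectors and singular values from A A^T ...
    eigvals, U = eig(A @ A.T)
    order = np.argsort(eigvals)[::-1]          # descending singular values
    S = np.sqrt(np.abs(eigvals[order]))
    U = U[:, order]
    # ... then derive V from U instead of running a second, independent
    # eigendecomposition: v_j = (1 / sigma_j) * A^T u_j for sigma_j != 0
    V = np.zeros((A.shape[1], S.size))
    for j, sigma in enumerate(S):
        if sigma > tol:
            V[:, j] = (A.T @ U[:, j]) / sigma
        # columns with sigma_j ~ 0 would still have to be completed to an
        # orthonormal basis of R^n; they do not affect the product below
    return U, S, V.T

A = np.array([[7., 6., 5.], [6., 5., 4.], [5., 4., 3.]])
U, S, Vt = svd_consistent(A)
print(np.allclose(U @ np.diag(S) @ Vt, A))     # True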