[Math] Trying to Check Cov Matrix calculation from SVD using Numpy

linear-algebra, python, svd

I am trying to work with the SVD and PCA. Just to check that I am doing what I think I am doing, I did a simple test in Python. The test is that I make a random matrix of realizations, construct the covariance matrix using the SVD, and then also compute it with the built-in numpy covariance function. I then compare the two output covariance matrices, and I expect them to be (nearly) identical.

I am puzzled as to why the two methods give two very different covariance matrices.

Assume that x holds 1000 realizations of 3 separate variables; in the code below it is stored as a 1000 by 3 array, one realization per row.

SVD covariance method: writing the SVD as $X = U\Sigma W^T$, the (unnormalized) covariance should be $X^T X = W\Sigma^2 W^T$.

python code:

import numpy as np

# 1000 realizations of 3 variables, one realization per row
x = np.vstack([np.random.randn(1000),
               np.random.randn(1000),
               np.random.randn(1000)]).T

U, s, V = np.linalg.svd(x, full_matrices = 0)

#
# calculate covariance using SVD
#
C = np.dot(np.dot(V, np.diag(s**2)), V.T)
print(C)

#
# check cov matrix calculation against numpy's built-in
#
CC = np.cov(x.T)
print("-------")
print(CC)

OUTPUT:

[[  986.36682109     2.73620015    32.36077455]
 [    2.73620015  1020.64662161   -26.40580027]
 [   32.36077455   -26.40580027  1032.3221444 ]]
-------
[[ 0.98769747  0.00423559 -0.03172589]
 [ 0.00423559  1.01986134  0.02640778]
 [-0.03172589  0.02640778  1.0335292 ]]

EDIT

I updated the problem with many more realizations. I noticed that the variances (along the diagonal) are all off by roughly a factor of 1000, and the covariance values are off by the same factor. The np.cov function must be normalizing the output matrix?
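A rough sanity check of that guess (just a sketch, reusing the same x and numpy import as above): np.cov centers each variable and divides by N - 1, so doing the same by hand should reproduce its output.

xc = x - x.mean(axis=0)                        # center each variable (column)
manual = np.dot(xc.T, xc) / (x.shape[0] - 1)   # divide by N - 1
print(np.allclose(manual, np.cov(x.T)))        # should print True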

Best Answer

Three fixes are necessary:

(1) Subtract off the variable means before performing the SVD:

x = x - x.mean(axis=0)

(2) In the call U, s, V = np.linalg.svd(x), the V that is returned is already what you call $W^T$ (a quick check of this is sketched after the three fixes). So to obtain $X^TX=W\Sigma^2W^T$ you need to perform the matrix multiplication in the other order:

C = np.dot(np.dot(V.T,np.diag(s**2)),V)

(3) Normalize the result by $n-1$:

C = C / (x.shape[0] - 1)
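To double-check point (2), here is an illustrative sketch (valid for any data matrix x): the original matrix is only recovered from the factors when V is used as the right-hand, already-transposed factor.

U, s, V = np.linalg.svd(x, full_matrices = 0)
# x == U * diag(s) * V, so V already plays the role of W^T
print(np.allclose(x, np.dot(U * s, V)))   # should print True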

Here's the updated function:

import numpy as np

def mycov(x):
    x = x - x.mean(axis=0)                         # (1) center each variable
    U, s, V = np.linalg.svd(x, full_matrices = 0)
    C = np.dot(np.dot(V.T, np.diag(s**2)), V)      # (2) V^T Sigma^2 V
    return C / (x.shape[0] - 1)                    # (3) normalize by n - 1
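As an aside, the same product can be computed without materializing np.diag(s**2), by scaling the columns of V.T directly. This is just an equivalent variant of the function above, not something the fix requires:

def mycov_scaled(x):
    x = x - x.mean(axis=0)
    U, s, V = np.linalg.svd(x, full_matrices = 0)
    # V.T * s**2 scales column j of V.T by s[j]**2, i.e. V^T Sigma^2
    return np.dot(V.T * s**2, V) / (x.shape[0] - 1)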

Test run:

x = np.vstack([np.random.randn(1000),
               np.random.randn(1000),
               np.random.randn(1000)]).T

myout = mycov(x)
pyout = np.cov(x.T)
print(myout)
print("---")
print(pyout)
print("---")
print(np.allclose(myout, pyout))

Output shows agreement with np.cov(x.T):

[[ 1.04987565 -0.01848885  0.01795246]
 [-0.01848885  1.01096332 -0.00429439]
 [ 0.01795246 -0.00429439  1.01202886]]
---
[[ 1.04987565 -0.01848885  0.01795246]
 [-0.01848885  1.01096332 -0.00429439]
 [ 0.01795246 -0.00429439  1.01202886]]
---
True
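
Since the original goal was PCA: with the fixes in place, the same SVD gives the principal components directly. The rows of V are the principal axes, and $s^2/(n-1)$ are the variances along them. A quick illustrative check (a sketch, reusing the x from the test run) against the eigendecomposition of the covariance matrix:

xc = x - x.mean(axis=0)
U, s, V = np.linalg.svd(xc, full_matrices = 0)
evals, evecs = np.linalg.eigh(np.cov(x.T))                 # ascending order
print(np.allclose(evals[::-1], s**2 / (x.shape[0] - 1)))   # should print True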