Solved – How to make a scree plot out of SVD data

Tags: covariance, pca, python, svd, variance

After doing a singular value decomposition (SVD) of a data set, I'm left with three matrices:
1. An orthogonal matrix of left singular vectors (U)
2. A diagonal matrix with elements in descending order (S)
3. An orthogonal matrix of right singular vectors (V)

In order to plot PC1 vs PC2, I made a scatter plot (V1:V2), where V1 and V2 are the first and second columns of V.

In order to make a scree plot, I squared S and plotted the (i,i) element of S² against i, thinking that the diagonal elements of S² would give me the variances.

Am I doing this right?

EDIT 1

The background:

I'm working on the analysis of multiple trajectories from Molecular Dynamics simulations, and I want to make sure that all my trajectories explore a similar configuration space, as a control measure. I use a custom-made SVD code that takes trajectory information as input. Unfortunately, this data type is not compatible with R-like programs without further effort, but the code I'm using already outputs the LSV, RSV, and diagonal matrices after the SVD.

I can rephrase the question as follows:
Can I take the squares of the diagonal elements of S as variances and use them as the y-axis of the scree plot, given that both my LSV and RSV matrices are orthogonal?

Best Answer

This answer is inspired by the following thread:

  1. Relationship between SVD and PCA. How to use SVD to perform PCA? (@amoeba's excellent explanation of PCA and SVD; if you are totally unfamiliar with PCA and SVD, give this a read first.)

In order to plot PC1 vs PC2, I made a scatter plot (V1:V2), where V1 and V2 are the first and second columns of V.

This is wrong!

$\mathbf{V}$ is the matrix of eigenvectors (of the covariance matrix of the centered data); each column is an eigenvector.
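A quick numerical check of this (a minimal sketch on a random, hypothetical data matrix): each column of $\mathbf{V}$ should be an eigenvector of the sample covariance matrix, with eigenvalue $s_i^2/(n-1)$.

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
X = X - X.mean(axis=0)                    # center the data

U, s, VT = np.linalg.svd(X, full_matrices=False)
C = X.T @ X / (X.shape[0] - 1)            # sample covariance matrix

# each right singular vector v_i satisfies C v_i = (s_i^2 / (n-1)) v_i
for i in range(len(s)):
    v = VT[i]                             # i-th column of V (row of V^T)
    assert np.allclose(C @ v, (s[i] ** 2 / (X.shape[0] - 1)) * v)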

You want principal components (PCs), not eigenvectors, when plotting PC1 vs PC2. PCs are the projections of your data matrix onto the eigenvectors. Thus you have to compute $\mathbf{U}\mathbf{\Sigma}$ or, equivalently, $\mathbf{X}\mathbf{V}$ (since $\mathbf{X} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^{\top}$ and $\mathbf{V}^{\top}\mathbf{V} = \mathbf{I}$), where $\mathbf{U}$, $\mathbf{\Sigma}$, and $\mathbf{V}$ are the left singular, diagonal, and right singular matrices respectively.

Thus one should plot the first two elements of the $i$th row of $\mathbf{X}\mathbf{V}$: these are the PC1 and PC2 coordinates of the $i$th data point in PC space.

#Python implementation 1
from sklearn.decomposition import PCA

pca = PCA()                          # sklearn's PCA centers the data internally
pca.fit(YourData)                    # fit the model: compute the loadings and the explained variance of each PC
pca_data = pca.transform(YourData)   # project the data onto the PCs; these are the coordinates for the PCA plot


#Python implementation 2
from scipy.linalg import svd

# Make sure YourData is centered (column means subtracted) before the SVD,
# otherwise the result will not match the PCA above
U, s, VT = svd(YourData, full_matrices=False)
T = YourData.dot(VT.T)               # PC scores: XV, equivalently U scaled by the singular values


# Both implementations give the same result, up to a possible sign flip
# of each component (singular vectors are only defined up to sign)
print(pca_data)
print(T)
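To check the agreement programmatically, one can compare absolute values (a minimal sketch; it assumes YourData was centered so that the two pipelines coincide):

import numpy as np

# signs of individual PCs may differ between implementations,
# so compare absolute values
print(np.allclose(np.abs(pca_data), np.abs(T)))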

I can rephrase the question as follows: can I take the squares of the diagonal elements of S as variances and use them as the y-axis of the scree plot, given that both my LSV and RSV matrices are orthogonal?

Here's my current understanding of making a scree plot using SVD.

The y-axis of a scree plot is the variance explained by the $i$th PC, i.e. $s_i^2 / \sum_j s_j^2$ as a fraction of the total variance, and the x-axis is $i$ in increasing order.

From the above Python example, either of the following can form the y-axis of the scree plot (both assume centered data):

import numpy as np

print(np.round(pca.explained_variance_ratio_ * 100, decimals=1))
print(np.round(np.square(s) / np.sum(np.square(s)) * 100, decimals=1))
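With these percentages, drawing the scree plot itself is straightforward with matplotlib (a minimal sketch, assuming s comes from the SVD above):

import numpy as np
import matplotlib.pyplot as plt

explained = np.square(s) / np.sum(np.square(s)) * 100     # percent of variance per PC
labels = ['PC' + str(i) for i in range(1, len(explained) + 1)]

plt.bar(labels, explained)                                # bar height = explained variance
plt.xlabel('Principal component')
plt.ylabel('Explained variance (%)')
plt.title('Scree plot')
plt.show()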