Solved – How to make a scree plot out of SVD data

Tags: covariance, pca, python, svd, variance

After doing a singular value decomposition (SVD) of a data set, I'm left with three matrices:
1. An orthogonal matrix of left singular vectors (U)
2. A diagonal matrix with elements in descending order (S)
3. An orthogonal matrix of right singular vectors (V)

In order to plot PC1 vs PC2, I made a scatter plot (V1:V2), where V1 and V2 are the first and second columns of V.

In order to make a scree plot, I squared S and plotted the (i,i) element of S² against i, thinking that the diagonal elements of S² would give me the variances.

Am I doing this right?

EDIT 1

The background:

I'm working on the analysis of multiple trajectories from Molecular Dynamics simulations, and I want to make sure that all my trajectories explore a similar configuration space, as a control measure. I use a custom-made SVD code that takes trajectory information as input. Unfortunately, this data type is not compatible with R-like programs without further effort, but the code I'm using already outputs the LSV, RSV, and diagonal matrices after the SVD.

I can rephrase the question as follows:
Can I take the squares of the diagonal elements of S as variances and use them as the y-axis of the scree plot, given that both my LSV and RSV matrices are orthogonal?

Best Answer

This answer is inspired by the following thread:

  1. Relationship between SVD and PCA. How to use SVD to perform PCA? (@amoeba's excellent explanation of PCA and SVD; if you are totally unfamiliar with PCA and SVD, give this a read first.)

In order to plot PC1 vs PC2, I made a scatter plot (V1:V2), where V1 and V2 are the first and second columns of V.

This is wrong!

$\mathbf{V}$ is the matrix of eigenvectors (of the covariance matrix of the centered data); each column is an eigenvector.
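A quick numerical check of this (a minimal sketch on a random, hypothetical data matrix): each column of $\mathbf{V}$ should be an eigenvector of the sample covariance matrix, with eigenvalue $s_i^2/(n-1)$.

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
X = X - X.mean(axis=0)                    # center the data

U, s, VT = np.linalg.svd(X, full_matrices=False)
C = X.T @ X / (X.shape[0] - 1)            # sample covariance matrix

# each right singular vector v_i satisfies C v_i = (s_i^2 / (n-1)) v_i
for i in range(len(s)):
    v = VT[i]                             # i-th column of V (row of V^T)
    assert np.allclose(C @ v, (s[i] ** 2 / (X.shape[0] - 1)) * v)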

You want principal components (PCs), not eigenvectors, when plotting PC1 vs PC2. PCs are the projections of your data matrix onto the eigenvectors. Thus you have to compute $\mathbf{U}\mathbf{\Sigma}$ or, equivalently, $\mathbf{X}\mathbf{V}$ (since $\mathbf{X} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^{\top}$ and $\mathbf{V}^{\top}\mathbf{V} = \mathbf{I}$), where $\mathbf{U}$, $\mathbf{\Sigma}$, and $\mathbf{V}$ are the left singular, diagonal, and right singular matrices respectively.

Thus one should plot the first two elements of the $i$th row of $\mathbf{X}\mathbf{V}$: these are the PC1 and PC2 coordinates of the $i$th data point in PC space.

#Python implementation 1
from sklearn.decomposition import PCA

pca = PCA()                          # sklearn's PCA centers the data internally
pca.fit(YourData)                    # fit the model: compute the loadings and the explained variance of each PC
pca_data = pca.transform(YourData)   # project the data onto the PCs; these are the coordinates for the PCA plot


#Python implementation 2
from scipy.linalg import svd

# Make sure YourData is centered (column means subtracted) before the SVD,
# otherwise the result will not match the PCA above
U, s, VT = svd(YourData, full_matrices=False)
T = YourData.dot(VT.T)               # PC scores: XV, equivalently U scaled by the singular values


# Both implementations give the same result, up to a possible sign flip
# of each component (singular vectors are only defined up to sign)
print(pca_data)
print(T)
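To check the agreement programmatically, one can compare absolute values (a minimal sketch; it assumes YourData was centered so that the two pipelines coincide):

import numpy as np

# signs of individual PCs may differ between implementations,
# so compare absolute values
print(np.allclose(np.abs(pca_data), np.abs(T)))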

I can rephrase the question as follows: can I take the squares of the diagonal elements of S as variances and use them as the y-axis of the scree plot, given that both my LSV and RSV matrices are orthogonal?

Here's my current understanding of making a scree plot using SVD.

The y-axis of a scree plot is the variance explained by the $i$th PC, i.e. $s_i^2 / \sum_j s_j^2$ as a fraction of the total variance, and the x-axis is $i$ in increasing order.

From the above Python example, either of the following can form the y-axis of the scree plot (both assume centered data):

import numpy as np

print(np.round(pca.explained_variance_ratio_ * 100, decimals=1))
print(np.round(np.square(s) / np.sum(np.square(s)) * 100, decimals=1))
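With these percentages, drawing the scree plot itself is straightforward with matplotlib (a minimal sketch, assuming s comes from the SVD above):

import numpy as np
import matplotlib.pyplot as plt

explained = np.square(s) / np.sum(np.square(s)) * 100     # percent of variance per PC
labels = ['PC' + str(i) for i in range(1, len(explained) + 1)]

plt.bar(labels, explained)                                # bar height = explained variance
plt.xlabel('Principal component')
plt.ylabel('Explained variance (%)')
plt.title('Scree plot')
plt.show()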