[Math] PCA using SVD in Matlab, a few questions.

Tags: linear-algebra, MATLAB, statistics, svd

I have a matrix X of size [25, 2000], i.e. 25 subjects and 2000 values per subject (each subject has a spectrogram that is reduced to 2000 values).

My goal is to reduce from 25 subjects to 1 or 2 "subjects" that best explains the data across the group.

If I do [u,s,v]=svd(X) (in matlab)
or [u1,s1,v1]=svd(X')

What would be the 1st and 2nd principal components?

Is it just the columns of v (in the first case) or the columns of u (in the transposed case)?

Or do I have to compute T = v*X or T = u*X and then take the 1st and 2nd rows of that?

Best Answer

If the SVD of $X$ is $X=USV^\top$, then the SVD of $X^\top$ is just the transpose of that factorization, $X^\top=VS^\top U^\top$, i.e. $U_1=V$, $S_1=S^\top$ and $V_1=U$.

In this approach, the principal components are the singular vectors belonging to the largest singular values. In the implementations, the diagonal matrix $S$ contains the singular values sorted from largest to smallest, so you only have to consider the first two components. If $X$ has size $25\times 2000$, then the columns of the $25\times 25$ matrix $U$ contain the singular vectors you are interested in.
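
As a concrete illustration, here is a minimal MATLAB sketch of the above, assuming X is the 25×2000 matrix from the question; the variable names are mine, and the scaled columns $\sigma_k v_k$ are the group-level patterns that reappear in the reconstruction formula further below.

```matlab
% Economy-size SVD: U is 25x25, S is 25x25, V is 2000x25.
[U, S, V] = svd(X, 'econ');

% The singular values on the diagonal of S are sorted in decreasing order,
% so the first two components are simply the first two columns.
pc1 = U(:,1);                     % weights of the 25 subjects on component 1
pc2 = U(:,2);                     % weights of the 25 subjects on component 2

% The corresponding 2000-value group-level patterns are the first two
% columns of V, scaled by their singular values:
group1 = S(1,1) * V(:,1);
group2 = S(2,2) * V(:,2);

% Calling svd(X') instead only swaps the roles of U and V (up to signs):
[U1, S1, V1] = svd(X', 'econ');   % U1 plays the role of V, V1 that of U
```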


Update

PCA was originally invented in mechanics to study the kinematics of rigid bodies, for instance the rotation, nutation and oscillation of planets. The idea there is that these kinematics are the same as those of an ellipsoid that is aligned and shaped according to the principal components of the mass distribution. Any movement of a rigid body can be described as the movement of its center of mass plus a rotation around that center of mass.


If the data is not shifted so that the center of mass is at the origin, for instance if in 2D all points are clustered around $(1,1)$, then the first principal component of the data set will point close to this point $(1,1)$. But to get that point, one could just as well have computed only the center of mass, i.e. the mean value of all data points. To get the information about the shape of the cluster out of the SVD, you have to subtract the center of mass first.
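
In MATLAB this centering step could look like the following sketch; it assumes, as above, that X is 25×2000 with one subject per row, so the mean over subjects (the mean row) is the quantity to remove.

```matlab
% Center the data: subtract the mean over subjects (the mean row) from every row.
Xc = X - mean(X, 1);                   % implicit expansion (R2016b+); on older MATLAB:
% Xc = bsxfun(@minus, X, mean(X, 1));

% PCA of the centered data.
[U, S, V] = svd(Xc, 'econ');
```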

If that is what you mean by 'subtracting the baseline', then all is well in that regard. But still, the application of the SVD makes the most sense if you can say that flipping the sign of an input vector gives something that could reasonably have come from a measurement in the experiment as well.


The result of the SVD can be written as $$ X=\sum_{k=1}^r u_k\sigma_k v_k^\top. $$ If one pair $(u_k,v_k)$ is replaced by $(-u_k,-v_k)$, then nothing changes in the sum; the sign change cancels between the two factors.

To get the data set of person $j$ out of the matrix $X$, one has to select row $j$ of $X$ as $e_j^\top X$. Now if $X$ gets compressed by using only the terms for the first one or two singular values in the SVD, the approximation of data set $j$ will be $$ e_j^\top X\approx\sum_{k=1}^2 (e_j^\top u_k)(\sigma_kv_k)^\top =\sum_{k=1}^2 U_{jk}(\sigma_kv_k)^\top. $$ Again, any sign changes in $v_k$ in the computation of the SVD are balanced by sign changes in the coefficients $e_j^\top u_k=U_{jk}$.
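
A short MATLAB sketch of this rank-2 approximation, reusing U, S, V from the svd call above and an arbitrary example index j; it also checks that flipping the sign of a component pair leaves the approximation unchanged.

```matlab
% Rank-2 approximation of subject j's 2000-value data set.
j = 5;                                          % arbitrary example subject
approx_j = U(j,1:2) * S(1:2,1:2) * V(:,1:2)';   % 1x2000, equals sum_k U(j,k)*sigma_k*v_k'

% Flipping the sign of one (u_k, v_k) pair leaves the approximation unchanged.
U(:,1) = -U(:,1);
V(:,1) = -V(:,1);
approx_j_flipped = U(j,1:2) * S(1:2,1:2) * V(:,1:2)';
max(abs(approx_j - approx_j_flipped))           % zero up to round-off
```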

One heuristic to make the sign definitive could be to make sure that the entry with the largest absolute value in each vector $u_k$ is positive.
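
A sketch of that heuristic in MATLAB, assuming U, S, V come from the SVD above; each sign flip is applied to the matching column of V as well, so that the product U*S*V' is unchanged.

```matlab
for k = 1:size(U, 2)
    [~, idx] = max(abs(U(:,k)));   % position of the largest-magnitude entry of u_k
    if U(idx, k) < 0               % if that entry is negative, flip the pair
        U(:,k) = -U(:,k);
        V(:,k) = -V(:,k);
    end
end
```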
