Solved – Low-rank SVD reconstruction and linear projection of the data

pca, svd

Short version:
Is the low-rank approximation from the singular value decomposition (SVD), $X \approx U_q\Sigma_q V_q^\top$, a projection onto $V_q$ or onto $V_q^\top$?

My understanding of the answer: the basic idea is that $BA^\top = (AB^\top)^\top$, which matters when the data points of $X$ sit in rows and we want our projections to come out as rows again.
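
To see the identity in action (a minimal NumPy sketch assuming row-wise data; the random matrices are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))   # 5 data points as rows, 3 variables
V = rng.standard_normal((3, 2))   # some projection matrix (columns = directions)

# Projecting each row x_i via V^T x_i and stacking the results as rows
# is the same as right-multiplying the whole data matrix by V:
print(np.allclose(X @ V, (V.T @ X.T).T))  # True
```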

The projection of $x$ onto a direction $v$ is given by $\langle v,x\rangle v$, since $v$ is a unit vector, so everything makes sense. (I was confused; I thought it was given by $Vx$.)
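
A quick numerical check of the $\langle v,x\rangle v$ formula (a minimal NumPy sketch; the random vectors are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(3)
v = rng.standard_normal(3)
v /= np.linalg.norm(v)        # make v a unit vector

proj = np.dot(v, x) * v       # <v, x> v: orthogonal projection of x onto v

# The residual x - proj is orthogonal to v, as a projection requires:
print(np.isclose(np.dot(x - proj, v), 0.0))  # True
```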


I read here that the solution to $$\min_{\mu,V_q,\lambda}\Big(\sum_i \|x_i - \mu - V_q\lambda_i\|_2^2 \Big),$$ i.e. the reconstruction (approximation) of the data points by an affine hyperplane of rank $q$ (where $V_q$ is a matrix with $q$ orthonormal columns), is given by $\mu=\bar{x}$, $\lambda_i = V_q^\top(x_i-\bar{x})$,
and $V_q$ equal to the first $q$ columns of $V$ in the SVD of the centered data, $$X = U\Sigma V^\top.$$
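
Numerically, this solution reproduces the truncated-SVD reconstruction (a minimal NumPy sketch with random data; all names are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
q = 2

mu = X.mean(axis=0)                 # mu-hat = x-bar
Xc = X - mu                         # centered data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Vq = Vt[:q].T                       # first q columns of V

lam = Xc @ Vq                       # lambda_i = V_q^T (x_i - x-bar), stacked as rows
X_hat = mu + lam @ Vq.T             # rank-q affine reconstruction

# It coincides with the rank-q truncated SVD of the centered data (plus the mean):
print(np.allclose(X_hat, mu + (U[:, :q] * s[:q]) @ Vt[:q]))  # True
```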

But as far as I understand (also in here), the principal components are the columns of $V^\top$(!), so if we want to find the projection of $X$ onto the rank-$q$ approximation we right-multiply by $V$ (which outputs vectors in $\operatorname{span}(V^\top)$), and indeed $$XV = U\Sigma V^\top V = U\Sigma,$$

which are the PC scores (in the rows); for new data we again right-multiply by $V$ to get the projection.
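
The score identity $XV = U\Sigma$ is easy to verify (a minimal NumPy sketch with random centered data; the names are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
X = X - X.mean(axis=0)              # center the data

U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

scores = X @ V                      # project the rows of X onto the principal axes
print(np.allclose(scores, U * s))   # True: XV = U Sigma

# A new observation is projected the same way:
x_new = rng.standard_normal(5)
print(np.allclose(x_new @ V, V.T @ x_new))  # True for a single vector too
```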

Also, in the same book, the authors show in a figure that the rank-2 reconstruction is actually $U\Sigma$.


So, is the hyperplane in the left panel of that figure the span of $V_2$ or of $V_2^\top$? And if it's the latter, how does that square with the fact that the approximating hyperplane is $V_2$ according to the minimization problem?

Best Answer

The confusion is probably due to how the data matrix is organized.

In The Elements of Statistical Learning (and also in the Abdi & Williams paper you linked to), the data matrix $\mathbf X$ has data points in rows and variables in columns, so one data vector is a row. A single data vector $\mathbf x$, however, is usually understood to be a column vector. So one has to watch the algebra carefully: to project the data onto an axis $\mathbf v$, you write $\mathbf X \mathbf v$ for the data matrix, but $\mathbf v^\top \mathbf x$ for a single data vector.

Now, if $\mathbf X$ is centered and you do the singular value decomposition (SVD) $$\mathbf X = \mathbf {USV}^\top,$$ then the columns of $\mathbf V$ are the principal axes (also called principal directions). The first column is the first axis, the second column is the second axis, etc. To project the data onto the first two principal axes, we write $\mathbf X \mathbf V_2$, where $\mathbf V_2$ denotes the first two columns of $\mathbf V$. The hyperplane spanned by the first two principal axes is spanned by the first two columns of $\mathbf V$. There is no contradiction.
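
Putting these conventions into code (a minimal NumPy sketch assuming centered random data; all names are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
X = X - X.mean(axis=0)               # the answer assumes X is centered

U, S, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T                             # columns of V are the principal axes

V2 = V[:, :2]                        # first two principal axes
Z2 = X @ V2                          # whole data matrix projected onto them

# For a single data vector (a column vector), the same projection reads V2^T x:
x = X[0]                             # one data point (a row of X)
print(np.allclose(Z2[0], V2.T @ x))  # True
```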