Solved – Difference between principal directions and principal component scores in the context of dimensionality reduction

dimensionality-reduction, pca

I have performed principal component analysis (PCA) of data matrix $X$ by doing singular value decomposition (SVD)
$$
X = U S V',
$$
where the columns of $V$ are the principal directions/axes and the columns of $US$ are the principal components (scores).
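This setup can be sketched in a few lines of numpy; the data here is synthetic and the $1000\times 12$ shape is just for illustration:

```python
import numpy as np

# Hypothetical data: 1000 points with 12 attributes, centered before PCA
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 12))
X = X - X.mean(axis=0)

# Thin SVD: X = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(X, full_matrices=False)

V = Vt.T        # columns of V are the principal directions/axes (12 x 12)
scores = U * s  # columns of U @ diag(s) are the PC scores (1000 x 12)
```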

I have 12 attributes, so I also get 12 principal directions/axes. I have computed the explained variance and chose to keep only 6 of the 12 principal directions, since these 6 explain enough of the variance.

But I get a little confused:

  1. Most websites about PCA say that I should choose some principal components, but isn't it more correct to choose principal directions/axes since my objective is to reduce dimensionality?

  2. I have seen that my matrix $V$ consists of 12 column vectors, each with 12 elements. If I choose 6 of these column vectors, each vector still has 12 elements – but how is this possible if I have reduced the dimensionality?

  3. Besides, there are 12 column vectors of $US$, representing the principal components (scores), but each of these column vectors has a great many elements. What does that mean?

I am really confused. I have read a lot about this, but I still don't quite understand it.

Best Answer

Most of these things are covered in my answers in the following two threads:

  1. Relationship between SVD and PCA. How to use SVD to perform PCA?
  2. What exactly is called "principal component" in PCA?

Still, here I will try to answer your specific concerns.


Think about it like this. You have, let's say, $1000$ data points in $12$-dimensional space (i.e. your data matrix $X$ is of $1000\times12$ size). PCA finds directions in this space that capture maximal variance. So for example the PC1 direction is a certain axis in this $12$-dimensional space, i.e. a vector of length $12$. The PC2 direction is another axis, etc. These directions are given by the columns of your matrix $V$. All your $1000$ data points can be projected onto each of these directions/axes, yielding coordinates of the $1000$ data points along each PC direction; these projections are what is called PC scores, and what I prefer to simply call PCs. They are given by the columns of $US$.

So for each PC you have a $12$-dimensional vector specifying the PC direction or axis and a $1000$-dimensional vector specifying the PC projection on this axis.

"Reducing dimensionality" means that you take several PC projections as your new variables (e.g. if you take $6$ of them, then your new data matrix will be of $1000\times 6$ size) and essentially forget about the PC directions in the original $12$-dimensional space.


Most websites about PCA say that I should choose some principal components, but isn't it more correct to choose principal directions/axes since my objective is to reduce dimensionality?

These are equivalent choices. One column of $V$ corresponds to one column of $US$. You can say that you choose some columns of $V$, or you can say that you choose some columns of $US$. It doesn't matter. Also, by "principal components" some people mean columns of $V$ and some people mean columns of $US$. Again, most of the time it does not matter.
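The equivalence is easy to check numerically: projecting $X$ onto the first $k$ columns of $V$ gives exactly the first $k$ columns of $US$. A small sketch with synthetic data:

```python
import numpy as np

# Hypothetical centered data: 1000 points, 12 attributes
rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 12))
X = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

k = 6
# Choosing k columns of V and projecting the data onto them...
proj = X @ V[:, :k]
# ...is the same as choosing the first k columns of U @ diag(s)
scores_k = (U * s)[:, :k]
print(np.allclose(proj, scores_k))  # True
```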

I have seen that my matrix V consists of 12 column vectors, each with 12 elements. If I choose 6 of these column vectors, each vector still has 12 elements – but how is this possible if I have reduced the dimensionality?

You chose 6 axes in the 12-dimensional space. If you only consider these 6 axes and discard the other 6, then you have reduced your dimensionality from 12 to 6. But each of the 6 chosen axes is still a vector in the original 12-dimensional space, so it needs 12 elements to specify it. There is no contradiction.

Besides, there are 12 column vectors of US, representing the principal components (scores), but each column vector has an awful lot of elements. What does it mean?

As I said, these are the projections onto the principal axes. If your data matrix has 1000 data points (rows), then each PC score vector will have 1000 elements – one coordinate per data point. That makes sense.