I have performed principal component analysis (PCA) of data matrix $X$ by doing singular value decomposition (SVD)
$$
X = U S V',
$$
where the columns of $V$ are the principal directions/axes and the columns of $US$ are the principal components (scores).
I have 12 attributes so I also get 12 principal directions/axes. I have found the variance explained and chose to consider only 6 of the 12 principal directions since these 6 explain enough of variance.
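To make the "variance explained" step concrete, here is a minimal NumPy sketch of choosing how many components to keep from the singular values. The data here is synthetic (not the asker's 12-attribute dataset), and the 90% threshold is just an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 12))      # illustrative data: 1000 points, 12 attributes
X = X - X.mean(axis=0)               # PCA requires column-centered data

U, s, Vt = np.linalg.svd(X, full_matrices=False)
explained = s**2 / np.sum(s**2)      # fraction of total variance per PC
cumulative = np.cumsum(explained)
k = int(np.searchsorted(cumulative, 0.90) + 1)  # smallest k explaining >= 90%
```

The singular values come out sorted in decreasing order, so `explained` is the usual scree-plot quantity and `k` is the number of components to retain under the chosen threshold.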
But I get a little confused:
- Most websites about PCA say that I should choose some principal components, but isn't it more correct to choose principal directions/axes, since my objective is to reduce dimensionality?
- I have seen that my matrix $V$ consists of 12 column vectors, each with 12 elements. If I choose 6 of these column vectors, each vector still has 12 elements – but how is this possible if I have reduced the dimensionality?
- Besides, there are 12 column vectors of $US$, representing the principal components (scores), but each column vector has an awful lot of elements. What does that mean?
I am really confused. I have read a lot about this, but I still don't quite understand it.
Best Answer
Most of these things are covered in my answers in the following two threads:
Still, here I will try to answer your specific concerns.
Think of it like this. You have, say, $1000$ data points in $12$-dimensional space (i.e. your data matrix $X$ is of $1000\times12$ size). PCA finds directions in this space that capture maximal variance. So for example the PC1 direction is a certain axis in this $12$-dimensional space, i.e. a vector of length $12$. The PC2 direction is another axis, etc. These directions are given by the columns of your matrix $V$. All your $1000$ data points can be projected onto each of these directions/axes, yielding the coordinates of the $1000$ data points along each PC direction; these projections are what is called PC scores, and what I prefer to simply call PCs. They are given by the columns of $US$.
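The shapes described above can be checked directly. A minimal sketch with synthetic data of the same $1000\times12$ size (note that `scores = U * s` is the same as $US$, since multiplying each column of $U$ by the corresponding singular value is multiplication by the diagonal matrix $S$):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 12))      # synthetic 1000 x 12 data matrix
X = X - X.mean(axis=0)               # center the columns

U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T                  # columns of V = principal directions (12-dim each)
scores = U * s            # columns of US = PC scores (1000-dim each)

print(V.shape)            # (12, 12): 12 directions, each a vector of length 12
print(scores.shape)       # (1000, 12): 1000 projections along each direction
```

Note also that `scores` equals `X @ V`: projecting the data onto the principal axes gives exactly the PC scores, since $XV = USV'V = US$.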
So for each PC you have a $12$-dimensional vector specifying the PC direction or axis and a $1000$-dimensional vector specifying the PC projection on this axis.
"Reducing dimensionality" means that you take several PC projections as your new variables (e.g. if you take $6$ of them, then your new data matrix will be of $1000\times 6$ size) and essentially forget about the PC directions in the original $12$-dimensional space.
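In code, the reduction step is just keeping the first $6$ score columns. A sketch with synthetic data (the rank-6 reconstruction at the end is optional; it shows that the kept directions can still map the reduced data back into the original 12-dimensional space):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 12))     # synthetic 1000 x 12 data matrix
X = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
scores = U * s                      # all 12 PC score columns

X_reduced = scores[:, :6]           # keep the first 6 PCs as the new variables
print(X_reduced.shape)              # (1000, 6)

# The kept directions can reconstruct a rank-6 approximation of X:
X_approx = X_reduced @ Vt[:6, :]
```

By the Eckart–Young theorem, the squared Frobenius error of this reconstruction equals the sum of the squared discarded singular values, which is exactly the "unexplained" variance.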
This is equivalent. One column of $V$ corresponds to one column of $US$. You can say that you choose some columns of $V$ or you can say that you choose some columns of $US$. Doesn't matter. Also, by "principal components" some people mean columns of $V$ and some people mean columns of $US$. Again, most of the time it does not matter.
You chose 6 axes in the 12-dimensional space. If you only consider these 6 axes and discard the other 6, then you reduced your dimensionality from 12 to 6. But each of the 6 chosen axes is originally a vector in the 12-dimensional space. No contradiction.
As I said, these are the projections on the principal axes. If your data matrix has 1000 points, then each PC score vector has 1000 elements: one coordinate per data point. Makes sense.