I have performed principal component analysis (PCA) of data matrix $X$ by doing singular value decomposition (SVD)
$$
X = U S V',
$$
where the columns of $V$ are the principal directions/axes and the columns of $US$ are the principal components (scores).
I have 12 attributes so I also get 12 principal directions/axes. I have found the variance explained and chose to consider only 6 of the 12 principal directions since these 6 explain enough of variance.
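To make the "variance explained" step concrete, here is a minimal NumPy sketch of choosing how many components to keep from the singular values. The data here is synthetic (not the asker's 12-attribute dataset), and the 90% threshold is just an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 12))      # illustrative data: 1000 points, 12 attributes
X = X - X.mean(axis=0)               # PCA requires column-centered data

U, s, Vt = np.linalg.svd(X, full_matrices=False)
explained = s**2 / np.sum(s**2)      # fraction of total variance per PC
cumulative = np.cumsum(explained)
k = int(np.searchsorted(cumulative, 0.90) + 1)  # smallest k explaining >= 90%
```

The singular values come out sorted in decreasing order, so `explained` is the usual scree-plot quantity and `k` is the number of components to retain under the chosen threshold.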
But I get a little confused:
- Most websites about PCA say that I should choose some principal components, but isn't it more correct to choose principal directions/axes, since my objective is to reduce dimensionality?
- I have seen that my matrix $V$ consists of 12 column vectors, each with 12 elements. If I choose 6 of these column vectors, each vector still has 12 elements – but how is this possible if I have reduced the dimensionality?
- Besides, there are 12 column vectors of $US$, representing the principal components (scores), but each column vector has an awful lot of elements. What does that mean?
I am really confused. I have read a lot about this, but I still don't quite understand it.
Best Answer
Most of these things are covered in my answers in the following two threads:
Still, here I will try to answer your specific concerns.
Think of it like this. You have, say, $1000$ data points in $12$-dimensional space (i.e. your data matrix $X$ is of $1000\times12$ size). PCA finds directions in this space that capture maximal variance. So for example the PC1 direction is a certain axis in this $12$-dimensional space, i.e. a vector of length $12$. The PC2 direction is another axis, etc. These directions are given by the columns of your matrix $V$. All your $1000$ data points can be projected onto each of these directions/axes, yielding the coordinates of the $1000$ data points along each PC direction; these projections are what is called PC scores, and what I prefer to simply call PCs. They are given by the columns of $US$.
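The shapes described above can be checked directly. A minimal sketch with synthetic data of the same $1000\times12$ size (note that `scores = U * s` is the same as $US$, since multiplying each column of $U$ by the corresponding singular value is multiplication by the diagonal matrix $S$):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 12))      # synthetic 1000 x 12 data matrix
X = X - X.mean(axis=0)               # center the columns

U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T                  # columns of V = principal directions (12-dim each)
scores = U * s            # columns of US = PC scores (1000-dim each)

print(V.shape)            # (12, 12): 12 directions, each a vector of length 12
print(scores.shape)       # (1000, 12): 1000 projections along each direction
```

Note also that `scores` equals `X @ V`: projecting the data onto the principal axes gives exactly the PC scores, since $XV = USV'V = US$.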
So for each PC you have a $12$-dimensional vector specifying the PC direction or axis and a $1000$-dimensional vector specifying the PC projection on this axis.
"Reducing dimensionality" means that you take several PC projections as your new variables (e.g. if you take $6$ of them, then your new data matrix will be of $1000\times 6$ size) and essentially forget about the PC directions in the original $12$-dimensional space.
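In code, the reduction step is just keeping the first $6$ score columns. A sketch with synthetic data (the rank-6 reconstruction at the end is optional; it shows that the kept directions can still map the reduced data back into the original 12-dimensional space):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 12))     # synthetic 1000 x 12 data matrix
X = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
scores = U * s                      # all 12 PC score columns

X_reduced = scores[:, :6]           # keep the first 6 PCs as the new variables
print(X_reduced.shape)              # (1000, 6)

# The kept directions can reconstruct a rank-6 approximation of X:
X_approx = X_reduced @ Vt[:6, :]
```

By the Eckart–Young theorem, the squared Frobenius error of this reconstruction equals the sum of the squared discarded singular values, which is exactly the "unexplained" variance.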
This is equivalent. One column of $V$ corresponds to one column of $US$. You can say that you choose some columns of $V$ or you can say that you choose some columns of $US$. Doesn't matter. Also, by "principal components" some people mean columns of $V$ and some people mean columns of $US$. Again, most of the time it does not matter.
You chose 6 axes in the 12-dimensional space. If you only consider these 6 axes and discard the other 6, then you reduced your dimensionality from 12 to 6. But each of the 6 chosen axes is originally a vector in the 12-dimensional space. No contradiction.
As I said, these are the projections on the principal axes. If your data matrix has 1000 points, then each PC score vector has 1000 elements: one coordinate per data point. Makes sense.