Solved – Principal component analysis in R

mathematical-statistics, pca, r

I am currently running PCA on the return series of 50 stocks with 524 observations each. So far I have computed the covariance matrix, obtained the loadings, and computed the corresponding scores for all 50 stocks. There are 20 components explaining 80% of the variation.

  1. The final scores look like, for example, pc1 = -0.20*v1 - 0.50*v2 ... - 0.60*v50. Do we then multiply these weights by the return series of each variable? If yes, what would the final output be?
  2. Is it fine for all the loadings in PC1 to be negative?
  3. If someone can help me with an R example showing how to proceed from Q1, it would really be a great help.
  4. My output says there are 20 components explaining 80% of the variation. Does that mean I need to compute pc1 + pc2 + ... + pc20, and the resulting value is my final value? Is that correct?

Thanks

Best Answer

In R you do not need to compute the covariance matrix etc. yourself. Better use the prcomp() function.

If data is your 524x50 matrix you need to do

# scale. = TRUE if you want to scale variables to same variance.
pca <- prcomp(data, scale. = TRUE)
plot(pca)

The latter plot is a screeplot, with the amount of explained variance per principal component. To get the loadings, you simply do

pca$rotation
pca$rotation[,1]
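
The screeplot shows raw variances per component. As a minimal sketch, the share of total variance per component can be computed from pca$sdev. The simulated `returns` matrix and the name `var_share` are assumptions made for illustration only; they stand in for the real 524x50 data.

```r
# Simulated stand-in for the 524 x 50 return matrix (hypothetical data).
set.seed(1)
market  <- rnorm(524, sd = 0.01)  # common factor shared by all stocks
returns <- market + matrix(rnorm(524 * 50, sd = 0.01), nrow = 524, ncol = 50)

pca <- prcomp(returns, scale. = TRUE)

# The variance of each component is its squared standard deviation.
var_share <- pca$sdev^2 / sum(pca$sdev^2)

# The same shares appear (rounded) in the summary() importance table.
head(round(var_share, 3))
summary(pca)$importance["Proportion of Variance", 1:6]
```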

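It is fine to have only positive or only negative loadings on the first component. This can happen when all your variables are strongly correlated. The sign of a principal component is arbitrary anyway: multiplying all loadings and scores of a component by -1 gives an equally valid solution.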
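
To illustrate why the sign of PC1 does not matter, here is a small sketch on simulated data (the `returns` matrix and the `flipped_*` names are hypothetical). Flipping the sign of both the loadings and the scores leaves the explained variance and the rank-one approximation unchanged.

```r
# Simulated stand-in for the 524 x 50 return matrix (hypothetical data).
set.seed(1)
market  <- rnorm(524, sd = 0.01)  # common factor shared by all stocks
returns <- market + matrix(rnorm(524 * 50, sd = 0.01), nrow = 524, ncol = 50)

pca <- prcomp(returns, scale. = TRUE)

# Flip the sign of the PC1 loadings and the PC1 scores together.
flipped_loadings <- -pca$rotation[, 1]
flipped_scores   <- -pca$x[, 1]

# The variance captured by PC1 is the same before and after the flip.
var(pca$x[, 1])
var(flipped_scores)

# The rank-one approximation of the standardized data is also unchanged.
approx_original <- outer(pca$x[, 1], pca$rotation[, 1])
approx_flipped  <- outer(flipped_scores, flipped_loadings)
all.equal(approx_original, approx_flipped)
```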

Not knowing what you want to do with this PCA, you can do a biplot.

biplot(pca)

This shows the individuals (here, the observations) projected on the first 2 principal components, together with the variables projected in the same way, so that you can compare them.

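You can also compute the projections manually for each principal component. Because prcomp() centered and scaled the variables, apply the same transformation before projecting.

# Center and scale the data as prcomp() did.
scaled <- scale(data, center = pca$center, scale = pca$scale)
scaled %*% pca$rotation[,1]
# Get projections on both axes (same values as pca$x[,1:2]).
scaled %*% pca$rotation[,1:2]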
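
Regarding Q1, the following sketch shows what the output of such a projection looks like: one score per observation, that is, a series of length 524 for each component. The simulated `returns` matrix and the names `z` and `pc1` are assumptions for illustration, not part of the original data.

```r
# Simulated stand-in for the 524 x 50 return matrix (hypothetical data).
set.seed(1)
market  <- rnorm(524, sd = 0.01)  # common factor shared by all stocks
returns <- market + matrix(rnorm(524 * 50, sd = 0.01), nrow = 524, ncol = 50)

pca <- prcomp(returns, scale. = TRUE)

# Standardize the returns with the centers and scales stored by prcomp().
z <- scale(returns, center = pca$center, scale = pca$scale)

# PC1 score series: the loadings weight the standardized returns,
# giving one value per observation.
pc1 <- drop(z %*% pca$rotation[, 1])
length(pc1)

# The manual projection agrees with the scores prcomp() already stores.
all.equal(unname(pc1), unname(pca$x[, 1]))
```

In practice this means the scores in pca$x can be used directly; the manual projection is mainly useful for applying the same loadings to new observations.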

Regarding Q4, it means that you can reduce your 50-dimensional space to a 20-dimensional space while keeping 80% of the variance. The components are not summed: adding pc1 + pc2 + ... + pc20, as you propose, collapses everything into a single 1D score. If you want to keep the first 20 PCs, you need to project onto them as follows:

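# Same values as pca$x[,1:20]; the data are centered and scaled as in prcomp().
projects <- scale(data, center = pca$center, scale = pca$scale) %*% pca$rotation[,1:20]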

But a 20-dimensional space can barely be called a simplification. Most of the time the aim is to keep 2 or 3 components, because those can still be plotted. Keeping 20 might make sense, though; it depends on how you want to follow up.
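
If the number of components is chosen by a variance threshold, a sketch along these lines could be used. The 80% threshold comes from the question; the simulated `returns` matrix and the names `cum_share`, `k` and `scores_k` are hypothetical, and the resulting k on simulated data will not match the 20 found on the real returns.

```r
# Simulated stand-in for the 524 x 50 return matrix (hypothetical data).
set.seed(1)
market  <- rnorm(524, sd = 0.01)  # common factor shared by all stocks
returns <- market + matrix(rnorm(524 * 50, sd = 0.01), nrow = 524, ncol = 50)

pca <- prcomp(returns, scale. = TRUE)

# Cumulative share of variance explained by the first 1, 2, ... components.
cum_share <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Smallest number of components reaching at least 80% of the variance.
k <- which(cum_share >= 0.80)[1]

# Keep the scores on the first k components: a 524 x k matrix, not a sum.
scores_k <- pca$x[, 1:k, drop = FALSE]
dim(scores_k)
```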