PCA Variance Explained – Do Components of PCA Represent Percentage of Variance and Can They Sum to More Than 100%?


O'Reilly's "Machine Learning for Hackers" says that each principal component accounts for a percentage of the variance. I've quoted the relevant passage below (chapter 8, p. 207). Another expert I spoke to agreed that these values are percentages.

However, the values for the 24 components sum to 133.2095%. How can that be?

Having convinced ourselves that we can use PCA, how do we do that in R? Again, this
is a place where R shines: the entirety of PCA can be done in one line of code. We use
the princomp function to run PCA:

pca <- princomp(date.stock.matrix[,2:ncol(date.stock.matrix)])

If we just type pca into R, we’ll see a quick summary of the principal components:

Call:
princomp(x = date.stock.matrix[, 2:ncol(date.stock.matrix)])
Standard deviations:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7
29.1001249 20.4403404 12.6726924 11.4636450 8.4963820 8.1969345 5.5438308
Comp.8 Comp.9 Comp.10 Comp.11 Comp.12 Comp.13 Comp.14
5.1300931 4.7786752 4.2575099 3.3050931 2.6197715 2.4986181 2.1746125
Comp.15 Comp.16 Comp.17 Comp.18 Comp.19 Comp.20 Comp.21
1.9469475 1.8706240 1.6984043 1.6344116 1.2327471 1.1280913 0.9877634
Comp.22 Comp.23 Comp.24
0.8583681 0.7390626 0.4347983
24 variables and 2366 observations.

In this summary, the standard deviations tell us how much of the variance in the data
set is accounted for by the different principal components. The first component, called
Comp.1, accounts for 29% of the variance, while the next component accounts for 20%.
By the end, the last component, Comp.24, accounts for less than 1% of the variance.
This suggests that we can learn a lot about our data by just looking at the first principal
component.

[Code and data can be found on github.]

Best Answer

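The numbers that print(pca) shows are standard deviations, not percentages, so there is no reason for them to sum to 100. Call summary() on the princomp object (this dispatches to summary.princomp) to see the "Proportion of Variance" and "Cumulative Proportion" rows: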

pca <- princomp(date.stock.matrix[,2:ncol(date.stock.matrix)])
summary(pca)
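
To see where the 133.2095 comes from without needing the original data, here is a minimal sketch that works only from the standard deviations printed in the question. The variable names (sdev, prop_var, cum_prop) are my own. It assumes the usual relationship that a component's variance is its squared standard deviation, and that its share of the variance is that value divided by the total of all the squared standard deviations.

```r
# Standard deviations copied from the princomp printout quoted in the question.
sdev <- c(29.1001249, 20.4403404, 12.6726924, 11.4636450, 8.4963820,
          8.1969345, 5.5438308, 5.1300931, 4.7786752, 4.2575099,
          3.3050931, 2.6197715, 2.4986181, 2.1746125, 1.9469475,
          1.8706240, 1.6984043, 1.6344116, 1.2327471, 1.1280913,
          0.9877634, 0.8583681, 0.7390626, 0.4347983)
names(sdev) <- paste0("Comp.", seq_along(sdev))

# Adding up the raw standard deviations reproduces the puzzling total
# from the question; it is not a sum of percentages.
sum(sdev)

# Variance of each component is its squared standard deviation.
variances <- sdev^2

# Share of the total variance per component, and the running total.
prop_var <- variances / sum(variances)
cum_prop <- cumsum(prop_var)

# First few components side by side.
round(head(cbind(sdev, prop_var, cum_prop)), 4)
```

The proportions computed this way sum to 1 by construction. With these numbers the first component's share comes out noticeably larger than the 29% stated in the book, because 29.1 is its standard deviation rather than its percentage of variance.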