Why is variance (instead of standard deviation) the default measure of information content in principal components?

Tags: pca, standard deviation, variance

The information content of principal components is almost always expressed as a variance (e.g., in scree plots or in statements like "the first three PCs contain 95% of the total data variance"). The intent of this usage is to describe how much variation/information is contained in the PCs. It seems to me that variance can be a misleading measure of the information contained in PCs, because it is a squared metric of variation that emphasizes large deviations from the mean over small ones. This can grossly underemphasize the importance of information contained in lower-eigenvalue PCs. The standard deviation of PCs would seem to be a much more direct, meaningful and balanced metric of the information they contain.

I am very clear on the rationale for the use of variance in statistics more generally, i.e. it is much more mathematically convenient than standard deviation. However, I'm wondering if there is a specific rationale for why variance is used as a measure of variation in PCs instead of standard deviation. Are there any good references for this dilemma?

Update to clarify: I should be clear that I am not asking about why variance is used in the derivation of the principal components, but rather why it is used as a default descriptor of variation in the PCs when reporting results of the PCA. Many people seem to use "variance" and "variation" as synonymous in this context, but isn't standard deviation a measure of variation, and variance a squared measure of variation? A PC that contains 95% of the data variance might contain only 80% of the variation in the data as measured in standard deviations: isn't the latter a better descriptor?

Best Answer

Reporting standard deviations instead of variances

I think you are right in that standard deviation of each PC can perhaps be a more reasonable or a more intuitive (for some) measure of its "influence" than its variance. And actually it even has a clear mathematical interpretation: variances of PCs are eigenvalues of the covariance matrix, but standard deviations are singular values of the centered data matrix [only scaled by $1/\sqrt{n-1}$].
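The relationship between eigenvalues and singular values can be checked numerically. The following is a minimal sketch in base R, assuming the built-in `iris` data and `prcomp` (which, as far as I know, uses the $n-1$ divisor); the variable names are chosen here for illustration only.

```r
# Numeric columns of iris, centered column by column
X  <- as.matrix(iris[, 1:4])
Xc <- scale(X, center = TRUE, scale = FALSE)
n  <- nrow(Xc)

# Eigenvalues of the covariance matrix (these are the PC variances)
eig <- eigen(cov(X), symmetric = TRUE)$values

# Singular values of the centered data matrix, scaled by 1/sqrt(n - 1)
pc_sd <- svd(Xc)$d / sqrt(n - 1)

# Standard deviations reported by prcomp
pca <- prcomp(X)

# Both comparisons should agree up to floating-point tolerance
print(all.equal(pc_sd, pca$sdev))
print(all.equal(pc_sd^2, eig))
```

Squaring the scaled singular values recovers the eigenvalues, so reporting one or the other carries the same information.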

So yes, it is completely fine to report it. In fact, R reports standard deviations of PCs rather than their variances. For example, running this simple code:

# PCA on the four numeric iris columns (column 5, Species, is dropped)
irispca <- princomp(iris[-5])
summary(irispca)

results in this:

Importance of components:
                          Comp.1     Comp.2     Comp.3      Comp.4
Standard deviation     2.0494032 0.49097143 0.27872586 0.153870700
Proportion of Variance 0.9246187 0.05306648 0.01710261 0.005212184
Cumulative Proportion  0.9246187 0.97768521 0.99478782 1.000000000

There are standard deviations here, but not variances.

Explained variance

> A PC that contains 95% of the data variance might contain only 80% of the variation in the data as measured in standard deviations: isn't the latter a better descriptor?

However, note that after presenting standard deviations, R does not display a "proportion of standard deviation", but instead a proportion of variance. And there is a very good reason for that.

Mathematically, total variance (being the trace of the covariance matrix) is preserved under rotations. This means that the sum of the variances of the original variables equals the sum of the variances of the PCs. For the same Fisher Iris dataset, this sum is $4.57$, so we can say that PC1, with a variance of $2.05^2=4.20$, explains $92\%$ of the total variance.
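To spell out why the trace argument works, here is a short derivation. The symbols are introduced here for illustration and are not part of the original answer: $C$ is the covariance matrix of the original variables $x_i$, and $C = V \Lambda V^\top$ is its eigendecomposition, with $V$ orthogonal and $\Lambda$ diagonal, holding the PC variances $\lambda_i$.

$$
\sum_{i} \operatorname{Var}(x_i)
= \operatorname{tr}(C)
= \operatorname{tr}(V \Lambda V^\top)
= \operatorname{tr}(\Lambda V^\top V)
= \operatorname{tr}(\Lambda)
= \sum_{i} \lambda_i
$$

The middle step uses the cyclic property of the trace together with $V^\top V = I$. The identity relies on the trace being linear in the variances, which is why it does not carry over to the square roots $\sqrt{\lambda_i}$.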

But the sum of standard deviations is not preserved! The sum of standard deviations of original variables is $3.79$. The sum of standard deviations of PCs is $2.98$. They are not equal! So if you want to say that PC1 with standard deviation $2.05$ explains $x\%$ of the "total standard deviation", what would you take as this total? There is no answer, because it simply does not make sense.
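These totals can be reproduced with a few lines of base R. This is a sketch under one assumption worth stating: it uses `prcomp`, whose $n-1$ divisor appears to match the totals quoted in this answer, whereas `princomp` (used for the summary output earlier) divides by $n$ and therefore gives slightly smaller standard deviations. The variable names are illustrative.

```r
X   <- iris[, 1:4]   # the four numeric iris columns
pca <- prcomp(X)     # PCA with the n - 1 divisor

# Variances: the totals agree (the answer quotes about 4.57)
total_var <- sum(sapply(X, var))
pc_var    <- pca$sdev^2
cat("sum of original variances:", total_var, "\n")
cat("sum of PC variances:      ", sum(pc_var), "\n")

# Standard deviations: the totals differ (the answer quotes about 3.79 and 2.98)
sum_sd_orig <- sum(sapply(X, sd))
sum_sd_pc   <- sum(pca$sdev)
cat("sum of original SDs:", sum_sd_orig, "\n")
cat("sum of PC SDs:      ", sum_sd_pc, "\n")

# Proportion of variance explained by PC1, which has a well-defined denominator
prop1 <- pc_var[1] / total_var
cat("PC1 proportion of variance:", prop1, "\n")
```

The proportion of variance does not depend on the choice of divisor, since the same factor appears in the numerator and the denominator.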

The bottom line is that it is completely fine to look at the standard deviation of each PC and even to compare them with one another, but if you want to talk about "explained" something, then only "explained variance" makes sense.
