PCA Examples – Low Variance Principal Components That Are Useful


Normally in principal component analysis (PCA), the first few PCs are used and the low-variance PCs are dropped, as they do not explain much of the variation in the data.

However, are there examples where the low-variance PCs are useful (i.e., have a use in the context of the data, have an intuitive explanation, etc.) and should not be thrown away?

Best Answer

Here's a cool excerpt from Jolliffe (1982) that I didn't include in my previous answer to the very similar question, "Low variance components in PCA, are they really just noise? Is there any way to test for it?" I find it pretty intuitive.

$\quad$Suppose that it is required to predict the height of the cloud-base, $H$, an important problem at airports. Various climatic variables are measured including surface temperature $T_s$, and surface dewpoint, $T_d$. Here, $T_d$ is the temperature at which the surface air would be saturated with water vapour, and the difference $T_s-T_d$, is a measure of surface humidity. Now $T_s,T_d$ are generally positively correlated, so a principal component analysis of the climatic variables will have a high-variance component which is highly correlated with $T_s+T_d$, and a low-variance component which is similarly correlated with $T_s-T_d$. But $H$ is related to humidity and hence to $T_s-T_d$, i.e. to a low-variance rather than a high-variance component, so a strategy which rejects low-variance components will give poor predictions for $H$.
$\quad$The discussion of this example is necessarily vague because of the unknown effects of any other climatic variables which are also measured and included in the analysis. However, it shows a physically plausible case where a dependent variable will be related to a low-variance component, confirming the three empirical examples from the literature.
$\quad$Furthermore, the cloud-base example has been tested on data from Cardiff (Wales) Airport for the period 1966–73 with one extra climatic variable, sea-surface temperature, also included. Results were essentially as predicted above. The last principal component was approximately $T_s-T_d$, and it accounted for only 0·4 per cent of the total variation. However, in a principal component regression it was easily the most important predictor for $H$. [Emphasis added]

The three examples from the literature referred to in the last sentence of the second paragraph of the excerpt are the three I mentioned in my answer to the linked question.
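
To make the mechanism in the excerpt concrete, here is a minimal sketch on synthetic data. It is not Jolliffe's Cardiff data: the correlation `rho = 0.95`, the unit variances, the noise level, and the toy rule that $H$ depends only on $T_s-T_d$ are all assumptions made for illustration, and the function names are hypothetical. With two unit-variance variables whose correlation is $\rho$, the covariance matrix has eigenvectors proportional to $(1,1)$ and $(1,-1)$ with eigenvalues $1+\rho$ and $1-\rho$, so the difference $T_s-T_d$ lands in the low-variance component.

```python
import numpy as np


def simulate_cloud_base(n=5000, rho=0.95, noise_sd=0.1, seed=0):
    """Synthetic stand-in for the cloud-base example (NOT the Cardiff data)."""
    rng = np.random.default_rng(seed)
    # Assumption: T_s and T_d have unit variance and correlation rho.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
    t_s, t_d = X[:, 0], X[:, 1]
    # Assumption: H depends only on the humidity proxy T_s - T_d, plus noise.
    h = (t_s - t_d) + rng.normal(scale=noise_sd, size=n)
    return X, h


def pca_scores(X):
    """Return eigenvalues (descending), eigenvectors, and PC scores."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]  # eigh returns ascending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals, eigvecs, Xc @ eigvecs


X, h = simulate_cloud_base()
eigvals, eigvecs, scores = pca_scores(X)

# Share of total variance carried by each component.
explained = eigvals / eigvals.sum()

# Correlation of each component's scores with the response H.
corr_with_h = np.array(
    [np.corrcoef(scores[:, k], h)[0, 1] for k in range(scores.shape[1])]
)

print("explained variance ratio:", explained)
print("|corr(PC_k, H)|:        ", np.abs(corr_with_h))
print("last PC loadings:        ", eigvecs[:, -1])
```

Under these assumptions the last component carries only a few per cent of the total variance, yet it is the one that tracks $H$. A principal component regression that keeps only the first component would discard almost all of the predictive signal.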


Reference
Jolliffe, I. T. (1982). Note on the use of principal components in regression. Applied Statistics, 31(3), 300–303. Retrieved from http://automatica.dei.unipd.it/public/Schenato/PSC/2010_2011/gruppo4-Building_termo_identification/IdentificazioneTermodinamica20072008/Biblio/Articoli/PCR%20vecchio%2082.pdf.