Solved – PCA on original variables vs. PCA on residuals

pca, standardization

I have 10 measures of snakes that are strongly correlated (body length, tail length and 8 measures of head size). My dataset contains snakes of different sizes (non-adults were excluded), so smaller (younger) snakes have smaller heads and shorter tails, because these measures depend on body length. In biology there are usually two approaches to running PCA on measures like these. One is to regress all measures on body length and then perform PCA on the residuals plus body length. The other is to perform PCA on the original variables. I don't see why I should regress all measures on body length, because correlated variables are not a problem for PCA. The only sense I can see in this procedure is that it reduces potential multicollinearity.

If I always perform PCA using the correlation matrix and an oblique rotation in SPSS, does it make sense to do it on residuals? How do the results of these two approaches differ? The aim of these PCAs is to reduce dimensionality and then plot the new components…

Can one say that performing PCA on residuals is a kind of "size-standardizing", whereas PCA on the original variables is not a "size-standardized PCA"? Does that make sense?

Do I have to standardize all measures to the same size before performing PCA? I don't think so, but I want to be sure.

Thanks

EDIT: some biological sources on this topic
PCA on residuals:
http://www.pnas.org/content/94/8/3828.full
http://www.ispa.pt/ui/uie/pdf/OliveiraAlmada1995.pdf

PCA on original variables:
http://swfsc.noaa.gov/uploadedFiles/Divisions/PRD/Publications/Jefferson_VW2004(82).pdf
http://www.zool.uzh.ch/static/research/oekologie/literatur/pdf05_01/2001Evolution.pdf

These two approaches are the most common in biology before PCA; a few authors instead size-standardize using the ratio of the dependent variable to body length. I don't know whether this question is unclear or whether nobody knows the answer, so please let me know how I could improve it.

Best Answer

I think that if you do PCA on all the variables, the first PC is very likely to be a "general size" measure, so the idea behind doing PCA on residuals may be to get rid of it. But, as far as I can see, the 2nd, 3rd, etc. PCs of the original variables will be very similar to the 1st, 2nd, etc. PCs of the residuals.
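One way to check this claim is a small simulation. The sketch below is only an illustration under stated assumptions: the data are synthetic (not the asker's snakes), every measure is generated as a linear function of body length plus a single hypothetical "shape" factor and noise, and the PCA is a plain eigendecomposition of the correlation matrix in NumPy with no rotation, so it does not reproduce the oblique rotation used in SPSS. The helper names (`pca_corr`, `size_residuals`, `abs_cosine`) are made up for this example. To keep the comparison simple, the residual PCA is run on the nine residualized measures only, without adding body length back in.

```python
import numpy as np

def pca_corr(X):
    """PCA via eigendecomposition of the correlation matrix.
    Returns eigenvalues in descending order and loadings as columns."""
    R = np.corrcoef(X, rowvar=False)
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

def size_residuals(Y, size):
    """Residuals of each column of Y after OLS regression on size (with intercept)."""
    A = np.column_stack([np.ones_like(size), size])
    beta, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return Y - A @ beta

def abs_cosine(a, b):
    """Absolute cosine similarity; the sign of a PC is arbitrary."""
    return abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Synthetic data (assumption): 300 animals, body length plus 9 dependent measures.
rng = np.random.default_rng(0)
n = 300
body = rng.normal(60.0, 10.0, n)      # body length
shape = rng.normal(0.0, 1.0, n)       # latent shape factor, independent of size
slopes = rng.uniform(0.15, 0.30, 9)   # how strongly each measure scales with body length
shape_load = 0.5 * np.array([1, 1, 1, 1, 1, -1, -1, -1, -1])
noise = rng.normal(0.0, 0.3, (n, 9))
others = body[:, None] * slopes + shape[:, None] * shape_load + noise

# Approach 1: PCA on the original 10 variables.
X = np.column_stack([body, others])
vals_orig, vecs_orig = pca_corr(X)

# Approach 2: PCA on the residuals of the 9 measures regressed on body length.
resid = size_residuals(others, body)
vals_res, vecs_res = pca_corr(resid)

# Is PC1 of the original variables a "general size" axis (all loadings same sign)?
pc1 = vecs_orig[:, 0]
pc1_same_sign = bool(np.all(pc1 > 0) or np.all(pc1 < 0))

# Compare PC2 of the original variables (body length entry dropped)
# with PC1 of the residuals.
similarity = abs_cosine(vecs_orig[1:, 1], vecs_res[:, 0])

print("PC1 loadings all same sign:", pc1_same_sign)
print("|cos| between original PC2 and residual PC1:", round(float(similarity), 3))
```

In this setup, same-signed PC1 loadings are what a "general size" component looks like, and an absolute cosine near 1 means the second component of the original variables and the first component of the residuals point in nearly the same direction. How close the match is on real data will depend on how linear the relationship with body length actually is.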
