Does it make sense to standardize Principal Components after performing Principal Component Analysis?

pca, r, standardization

I am attempting to emulate the following paper by Messer et al., using year 2000 decennial Census data in R to create an index known as the Neighborhood Deprivation Index (NDI): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3261293/#CR73

The Component Extraction and Index Construction section of the paper outlines the steps the authors took to construct the index using Principal Components Analysis.

I am aware that the variables used in PCA are often standardized before performing the method, as PCA is sensitive to the variances of the initial variables. If there are large differences between the ranges of the initial variables, those variables with larger ranges will dominate over those with small ranges, leading to biased results. However, the final step in the process outlined in the paper states:

"The deprivation index was then standardized to have a mean of 0 and standard deviation (SD) of 1 by dividing the index by the square of the eigenvalue."

I read this statement as the resulting principal components being standardized and not that the data used in the creation of the principal components were standardized.

There are a few problems I've faced in trying to emulate their step:

- The paper is a bit vague.
- There is no code attached where this paper is available.
- I couldn't find anything on standardization after the PCs are constructed.

So I have a few questions.

1. Am I right in understanding the last step as standardizing the principal components?
2. Does it make sense to standardize the principal components if you standardized the data before performing PCA?
3. From my understanding, there should be an eigenvalue produced for each variable used in the construction of the PCA. If the last step is stating I should divide the index (the PCs?) by the eigenvalue^2, how am I supposed to do that if there are 8 eigenvalues for this specific scenario?

Best Answer

I mostly agree with your read of the paper.

1. The eigenvalues correspond to variances, so to get a standard deviation of 1 we would want to divide by the square root of the eigenvalue, not its square. I think "square" in the paper is a typo for "square root".
2. I'll defer to the paper to say how helpful it is, but it makes sense to standardize features, no matter how you extract them. Since you standardized the original features to have means of zero, you wind up with principal components that also have means of zero. No mean needs to be subtracted, and dividing by the standard deviation alone gives a mean of zero and a variance of one.
3. Each principal component has its own eigenvalue and eigenvector. For each principal component, you would use the corresponding eigenvalue.
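
To make the three points concrete, here is a minimal sketch in base R. It is not the paper's code: the data are simulated (the names `X`, `shared`, and `ndi` are made up for illustration), and it assumes the index is the vector of first-component scores. `prcomp` returns the component standard deviations in `sdev`, so the eigenvalues are `sdev^2`.

```r
# Simulated stand-in data: 200 "tracts" by 8 variables.
# Hypothetical values for illustration only, not Census data.
set.seed(1)
n <- 200
shared <- rnorm(n)  # common factor so the first component is meaningful
X <- sapply(1:8, function(j) shared + rnorm(n, sd = 0.8))
colnames(X) <- paste0("var", 1:8)

# Standardize the inputs, then run PCA (equivalent to PCA on the correlation matrix).
pca <- prcomp(X, center = TRUE, scale. = TRUE)

eigenvalues <- pca$sdev^2  # one eigenvalue (variance) per component, 8 in total
pc1 <- pca$x[, 1]          # scores on the first component (assumed to be the index)

# Divide by the square ROOT of the matching eigenvalue to get unit SD.
# No mean is subtracted: the scores are already centered.
ndi <- pc1 / sqrt(eigenvalues[1])
print(c(mean = mean(ndi), sd = sd(ndi)))

# Dividing by the squared eigenvalue, as the sentence literally reads,
# does not give unit SD in general.
print(sd(pc1 / eigenvalues[1]^2))
```

The mean of `ndi` should be zero up to floating-point error and its SD should be 1, because `prcomp` and `sd` both use the n - 1 divisor. To standardize a different component, use that component's own eigenvalue, for example `pca$x[, 2] / pca$sdev[2]`.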
