Solved – Raw eigenvectors or standardized eigenvectors for principal component regression

pcaregression

Thanks to @amoeba, I learned that standardized eigenvectors are sometimes calculated, i.e., the eigenvectors are divided by the square roots of their eigenvalues.

Now, when I want to do principal component regression (PCR), I first calculate the components that I subsequently use in regression. There are two procedures that can be used:

  1. I calculate my components as raw components, that is, I do not standardize the eigenvectors.

  2. I calculate my components from the standardized eigenvectors.

Now, I do PCR once with the components in situation 1, and once with the components in situation 2.
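To make the comparison concrete, here is a minimal numpy sketch of the two procedures (my own illustration, not from the question): components are computed once from the raw unit-norm eigenvectors, and once from the standardized eigenvectors (divided by the square roots of the eigenvalues), then each set is used in OLS.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 200, 5, 2
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))
y = rng.standard_normal(n)

# Center X and eigendecompose its covariance matrix
Xc = X - X.mean(axis=0)
evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(evals)[::-1]            # sort eigenpairs descending
evals, evecs = evals[order], evecs[:, order]

# Procedure 1: raw components (scores from unit-norm eigenvectors)
T_raw = Xc @ evecs[:, :k]

# Procedure 2: components from standardized eigenvectors
# (eigenvectors divided by sqrt(eigenvalue) -> unit-variance scores)
T_std = T_raw / np.sqrt(evals[:k])

# OLS regression on each set of components
b_raw, *_ = np.linalg.lstsq(T_raw, y, rcond=None)
b_std, *_ = np.linalg.lstsq(T_std, y, rcond=None)

# The fitted values coincide; only the coefficient scale changes,
# by exactly sqrt(lambda_k) per component
same_fit = np.allclose(T_raw @ b_raw, T_std @ b_std)
same_up_to_scale = np.allclose(b_std, b_raw * np.sqrt(evals[:k]))
print(same_fit, same_up_to_scale)
```

This already hints at an answer to the coefficient question: rescaling a regressor column rescales its coefficient inversely, so the fit itself is unchanged and only the interpretation of the coefficients differs.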

My questions:

  • What are the implications (a) for the coefficients, (b) for the interpretation in the two cases?
  • What are the main differences?
  • I know that the variance of the components will differ, but how does that change the behaviour in regression, or does it at all?
  • Is there any recommendation as to whether I should use one rather than the other?

Best Answer

I'm reviewing this and found a nice paper, Parameter Estimation in Factor Analysis: Maximum Likelihood versus Principal Component, which makes me think that the eigenvectors are indeed standardized, if only for convenient interpretation. The authors operate on a standardized $n \times p$ data matrix, $\mathbf{Z}$, whose covariance (i.e., correlation) matrix has ones on the diagonal, so there are $p$ units of variance (the trace) "to distribute".

For eigenpairs $(\lambda_k, \mathbf{e_k}), k = 1, \ldots, p$, the sum of the eigenvalues is still $p$ and the proportion of variance explained by component $k$ is $\lambda_k / p$. To have the nice interpretation of distributing these $p$ units of variance, you'd clearly have to be working with unit eigenvectors (though the paper doesn't specifically say it).
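A quick numpy check of this bookkeeping (my own sketch, not from the paper): for standardized data the correlation matrix has trace $p$, so its eigenvalues sum to $p$ and $\lambda_k / p$ gives each component's share of the variance.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 4
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))

# Standardize each column: the correlation matrix of Z has ones
# on the diagonal, so its trace is p
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)
evals = np.linalg.eigvalsh(R)

# Eigenvalues "distribute" the p units of variance
print(np.isclose(evals.sum(), p))   # trace equals p
print(np.isclose((evals / p).sum(), 1.0))  # proportions sum to 1
```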

Related Question