Solved – Raw eigenvectors or standardized eigenvectors for principal component regression

pcaregression

Thanks to @amoeba, I learned that standardized eigenvectors are sometimes calculated, i.e., the eigenvectors are divided by the square roots of their eigenvalues.

Now, when I want to do principal component regression (PCR), I first calculate the components that I subsequently use in regression. There are two procedures that can be used:

  1. I calculate my components as raw components, that is, I do not standardize the eigenvectors.

  2. I calculate my components from the standardized eigenvectors.

Now, I do PCR once with the components in situation 1, and once with the components in situation 2.
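To make the comparison concrete, here is a minimal numpy sketch of the two procedures (my own illustration, not from the question): components are computed once from the raw unit-norm eigenvectors, and once from the standardized eigenvectors (divided by the square roots of the eigenvalues), then each set is used in OLS.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 200, 5, 2
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))
y = rng.standard_normal(n)

# Center X and eigendecompose its covariance matrix
Xc = X - X.mean(axis=0)
evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(evals)[::-1]            # sort eigenpairs descending
evals, evecs = evals[order], evecs[:, order]

# Procedure 1: raw components (scores from unit-norm eigenvectors)
T_raw = Xc @ evecs[:, :k]

# Procedure 2: components from standardized eigenvectors
# (eigenvectors divided by sqrt(eigenvalue) -> unit-variance scores)
T_std = T_raw / np.sqrt(evals[:k])

# OLS regression on each set of components
b_raw, *_ = np.linalg.lstsq(T_raw, y, rcond=None)
b_std, *_ = np.linalg.lstsq(T_std, y, rcond=None)

# The fitted values coincide; only the coefficient scale changes,
# by exactly sqrt(lambda_k) per component
same_fit = np.allclose(T_raw @ b_raw, T_std @ b_std)
same_up_to_scale = np.allclose(b_std, b_raw * np.sqrt(evals[:k]))
print(same_fit, same_up_to_scale)
```

This already hints at an answer to the coefficient question: rescaling a regressor column rescales its coefficient inversely, so the fit itself is unchanged and only the interpretation of the coefficients differs.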

My questions:

  • What are the implications (a) for the coefficients, (b) for the interpretation in the two cases?
  • What are the main differences?
  • I know that the variance of the components will differ, but how does that change the behaviour in regression, or does it at all?
  • Is there any recommendation as to whether I should use one rather than the other?

Best Answer

I'm reviewing this and found a nice paper, Parameter Estimation in Factor Analysis: Maximum Likelihood versus Principal Component, which makes me think that the eigenvectors are indeed standardized, if only for convenient interpretation. The authors operate on a standardized $n \times p$ data matrix, $\mathbf{Z}$, whose covariance (i.e., correlation) matrix has ones on the diagonal, so there are $p$ units of variance (the trace) "to distribute".

For eigenpairs $(\lambda_k, \mathbf{e_k}), k = 1, \ldots, p$, the sum of the eigenvalues is still $p$ and the proportion of variance explained by component $k$ is $\lambda_k / p$. To have the nice interpretation of distributing these $p$ units of variance, you'd clearly have to be working with unit eigenvectors (though the paper doesn't specifically say it).
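A quick numpy check of this bookkeeping (my own sketch, not from the paper): for standardized data the correlation matrix has trace $p$, so its eigenvalues sum to $p$ and $\lambda_k / p$ gives each component's share of the variance.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 4
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))

# Standardize each column: the correlation matrix of Z has ones
# on the diagonal, so its trace is p
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)
evals = np.linalg.eigvalsh(R)

# Eigenvalues "distribute" the p units of variance
print(np.isclose(evals.sum(), p))   # trace equals p
print(np.isclose((evals / p).sum(), 1.0))  # proportions sum to 1
```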

Related Question