Solved – Why are PCA eigenvectors orthogonal but correlated

Tags: correlation, orthogonal, pca, r

I've seen some great posts explaining PCA and why, under this approach, the eigenvectors of a (symmetric) correlation matrix are orthogonal. I also understand how to show that such vectors are orthogonal to each other (e.g. taking the cross-product of the matrix of these eigenvectors yields a matrix whose off-diagonal entries are zero).

My first question is, when you look at the correlations of a PCA's eigenvectors, why are the off-diagonal entries of the correlation matrix non-zero (i.e. how can the eigenvectors be correlated if they are orthogonal)?

This question is not directly about PCA, but I put it in this context since that is how I ran into the issue. I am using R and specifically the psych package to run PCA.

If it helps to have an example, this post on StackOverflow has one that is very convenient and closely related (also in R). In that post, the author of the best answer shows that the PCA loadings (eigenvectors) are orthogonal by using Factor Congruence or cross-products; in his example, the matrix L is the PCA loadings matrix. The only thing not shown at that link is that cor(L) produces the output I am asking about, with non-zero correlations between the eigenvectors.
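For concreteness, here is a minimal base-R sketch (my own toy data and variable names, not the psych example from that post) that reproduces what I am describing: the eigenvector matrix passes the orthogonality check, yet cor() applied to the same matrix has non-zero off-diagonal entries.

    # Toy illustration: eigenvectors of a correlation matrix are orthonormal,
    # but cor() applied to the eigenvector matrix is generally not the identity,
    # since cor() centers each column by its mean before computing correlations.
    set.seed(1)
    X <- matrix(rnorm(200 * 4), ncol = 4)   # toy data: 200 observations, 4 variables
    R <- cor(X)                              # correlation matrix of the data
    U <- eigen(R)$vectors                    # columns = eigenvectors (the "loadings", up to scaling)

    round(crossprod(U), 10)   # t(U) %*% U is the identity: columns are orthonormal
    round(cor(U), 3)          # off-diagonal entries are generally non-zero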

I am especially confused about how orthogonal vectors can be correlated after reading this post, which seems to prove that orthogonality is equivalent to lack of correlation: Why are PCA eigenvectors orthogonal and what is the relation to the PCA scores being uncorrelated?

My second question is: when the PCA eigenvectors are used to calculate the PCA scores, the scores themselves are uncorrelated (as I expected). Is there a connection to my first question here, i.e. why are the eigenvectors correlated but the scores not?

Best Answer

Let $X$ be a random vector $X=(x_1,x_2,\cdots,x_d)^T$ with expected value $\mu$ and covariance matrix $\Sigma$. We are looking for ordered vectors $u_i$ that maximize the variance of $u_i^TX$; because we are only interested in the direction of such vectors, we additionally require unit length, $u_i^Tu_i=1$. Essentially we are solving $$\max\limits_{u_i} \operatorname{Var}(u_i^TX)$$ $$\text{s.t.} \quad u_i^T u_i=1.$$ The vectors $u_i$ are actually not random (we are working with the population quantities here; in practice the unknown $\Sigma$ and $\mu$ are replaced by the empirical sample covariance matrix and the sample mean, respectively; @whuber explained this from a different perspective), so $$\operatorname{Var}(u_i^TX)=u_i^T\Sigma u_i.$$

The optimization problem can be solved by using the Lagrange function $$L(u_i,\lambda_i):=u_i^T \Sigma u_i -\lambda_i(u_i^Tu_i-1).$$ From there we get the necessary condition for a constrained extremum $$ \frac{\partial L(u_i,\lambda_i)}{\partial u_i} = 2\Sigma u_i -2\lambda_i u_i=0,$$ which reduces to $$\Sigma u_i =\lambda_i u_i,$$ which is by definition the problem of eigenvalues and eigenvectors. Because $\Sigma$ is a symmetric positive semidefinite matrix, the spectral theorem applies and we can find an orthonormal basis of eigenvectors satisfying $\Sigma=Q\Lambda Q^{-1}=Q\Lambda Q^T$, where the columns of $Q$ are the orthonormal eigenvectors and $\Lambda$ is a diagonal matrix of the (real) eigenvalues.
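As a quick numerical check of this decomposition, here is a small R sketch (toy simulated data, with $\Sigma$ replaced by the sample covariance matrix, as noted above):

    # Check the spectral decomposition Sigma = Q Lambda Q^T numerically,
    # with Sigma estimated by the sample covariance matrix of toy data.
    set.seed(2)
    X     <- matrix(rnorm(500 * 3), ncol = 3)
    Sigma <- cov(X)

    e      <- eigen(Sigma)
    Q      <- e$vectors                       # columns are orthonormal eigenvectors
    Lambda <- diag(e$values)                  # real, non-negative eigenvalues

    max(abs(Sigma - Q %*% Lambda %*% t(Q)))   # ~ 0: Sigma = Q Lambda Q^T
    max(abs(crossprod(Q) - diag(3)))          # ~ 0: Q^T Q = I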

Now we can show that $$\operatorname{cov}(u_i^TX,u_j^TX)=u_i^T\Sigma u_j=\lambda_j u_i^Tu_j=0 \quad \forall j \neq i,$$ and trivially, for $i=j$, $\operatorname{cov}(u_i^TX,u_i^TX)=\lambda_i$. So it is not the eigenvectors that are uncorrelated, but the projections (the PCA scores).
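To make that last point concrete, here is a short R sketch (again toy data, with the sample covariance matrix standing in for $\Sigma$) showing that the projections, i.e. the scores, are uncorrelated, even though cor() of the eigenvector matrix itself need not be the identity:

    # Project centered toy data onto the eigenvectors of its covariance matrix:
    # the resulting scores are uncorrelated, with variances equal to the eigenvalues,
    # even though cor() of the eigenvector matrix is generally not the identity.
    set.seed(3)
    X      <- scale(matrix(rnorm(500 * 3), ncol = 3), center = TRUE, scale = FALSE)
    Sigma  <- cov(X)
    e      <- eigen(Sigma)
    scores <- X %*% e$vectors        # row-wise projections u_i^T x

    round(cov(scores), 10)           # diagonal: off-diagonals ~ 0, diagonal = eigenvalues
    round(cor(e$vectors), 3)         # eigenvectors can still show non-zero "correlations"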