Solved – Does a correlation matrix of two variables always have the same eigenvectors

correlation matrix, pca, svd, total least squares

I perform Principal Component Analysis on two standardized variables by applying an SVD to their correlation matrix. However, the SVD gives me the same eigenvector (weights) irrespective of what the two variables are: it is always [.70710678, .70710678]. I find this strange. Of course, the eigenvalues differ.

My question is: How to interpret this?


PS. I wanted to conduct a total least squares regression on two variables. My statistical programme does not provide TLS, but TLS luckily equals Principal Component Analysis, as far as I know. Hence my question. The question is not about TLS directly, but why I get the same eigenvectors irrespective of which variables I use (as long as they are exactly 2).

Best Answer

Algebraically, the correlation matrix for two variables looks like this: $$\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}.$$ Following the definition of an eigenvector, it is easy to verify that $(1, 1)$ and $(-1, 1)$ are eigenvectors irrespective of $\rho$, with eigenvalues $1+\rho$ and $1-\rho$ respectively. For example:

$$\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\begin{pmatrix}1\\1\end{pmatrix}=(\rho+1)\begin{pmatrix}1\\1\end{pmatrix}.$$
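The second eigenvector can be checked the same way, multiplying row by row:

$$\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\begin{pmatrix}-1\\1\end{pmatrix}=\begin{pmatrix}\rho-1\\1-\rho\end{pmatrix}=(1-\rho)\begin{pmatrix}-1\\1\end{pmatrix}.$$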

Normalizing these two eigenvectors to unit length yields $(\sqrt{2}/2, \sqrt{2}/2)$ and $(-\sqrt{2}/2, \sqrt{2}/2)$, as you observed.
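This is easy to confirm numerically. The snippet below is a minimal sketch using NumPy; the helper name `corr_eig` and the particular values of $\rho$ are only illustrative choices, not anything from the question.

```python
import numpy as np

def corr_eig(rho):
    """Eigen-decomposition of the 2x2 correlation matrix [[1, rho], [rho, 1]]."""
    R = np.array([[1.0, rho], [rho, 1.0]])
    # eigh is meant for symmetric matrices; it returns eigenvalues in
    # ascending order and unit-length eigenvectors as the columns of vecs.
    vals, vecs = np.linalg.eigh(R)
    return vals, vecs

for rho in (0.2, 0.5, 0.9, -0.5):
    vals, vecs = corr_eig(rho)
    # Absolute values are printed because the sign of an eigenvector is arbitrary.
    print(rho, vals, np.abs(vecs).round(8))
```

Up to sign, every entry of the eigenvector matrix should come out as $\sqrt{2}/2 \approx 0.70710678$, while the eigenvalues change with $\rho$. The case $\rho = 0$ is left out on purpose: the matrix is then the identity, both eigenvalues equal 1, and any orthonormal pair is a valid set of eigenvectors.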

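Geometrically, if the variables are standardized, then the scatter plot will always be stretched along the main diagonal (which will be the 1st PC) if $\rho>0$, whatever the value of $\rho$ is. If $\rho<0$, it is stretched along the other diagonal instead, and $(-1, 1)$ becomes the 1st PC because its eigenvalue $1-\rho$ is then the larger one: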

[Figure: Two standardized variables with various correlation coefficients]

Regarding TLS, you might want to check my answer in this thread: How to perform orthogonal regression (total least squares) via PCA? As should be pretty obvious from the figure above, if both your $x$ and $y$ are standardized, then the TLS line is always a diagonal (slope $+1$ if $\rho>0$, slope $-1$ if $\rho<0$). So it hardly makes sense to perform TLS at all! However, if the variables are not standardized, then you should be doing PCA on their covariance matrix (not on their correlation matrix), and the regression line can have any slope.
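To illustrate the difference, here is a minimal sketch that takes the TLS line to be the line through the means along the first principal axis. The helper `tls_slope` and the simulated data (true slope 3, noise scale 0.5, seed 0) are assumptions made up for this example, not part of the question.

```python
import numpy as np

def tls_slope(x, y, standardize=False):
    """Slope of the first principal axis of (x, y).

    With standardize=False this uses the covariance matrix; with
    standardize=True it is equivalent to using the correlation matrix.
    """
    data = np.column_stack([x, y]).astype(float)
    data -= data.mean(axis=0)                 # centre both variables
    if standardize:
        data /= data.std(axis=0, ddof=1)      # unit variance for both variables
    cov = np.cov(data, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    v = vecs[:, -1]                           # eigenvector of the largest eigenvalue
    return v[1] / v[0]                        # rise over run; the sign of v cancels

# Simulated, non-standardized data (illustrative values only).
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 3.0 * x + rng.normal(scale=0.5, size=500)

print("covariance-based slope: ", tls_slope(x, y))
print("correlation-based slope:", tls_slope(x, y, standardize=True))
```

The covariance-based slope should land near the slope used to generate the data, whereas the standardized version should return 1 up to floating-point error, whatever the data are (as long as the correlation is positive). Note that the sketch would divide by zero for a perfectly vertical principal axis.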


For a discussion of the case of three dimensions, see here: https://stats.stackexchange.com/a/19317.