Solved – Why does the direction with the highest eigenvalue have the largest semi-axis


So, in PCA, we decompose the covariance matrix into its eigenvalues and eigenvectors. I understand that an ellipsoid is fully characterized by the eigenvalues and eigenvectors of a positive-definite matrix $A$, and that its equation is $(x-x_c)^T A (x-x_c) = 1$, where $x_c \in \mathbb{R}^n$ denotes the center of the ellipsoid. From what I understand, in PCA we are basically trying to fit an ellipsoid to the data.

I also understand that the $i$-th principal direction of the ellipsoid is given by the direction of the $i$-th eigenvector $v_i$ associated with the eigenvalue $\lambda_i$, sorted so that $\lambda_1 > \ldots > \lambda_n > 0$. And finally, $\lambda_i$ is the inverse square of the associated semi-axis, so $r_i = \frac{1}{\sqrt{\lambda_i}}$ is the semi-axis along the $i$-th principal direction, right? But wouldn't this mean that $r_1$ is the smallest of all the $r_j$? That can't make sense, I think, because then why would we project our data onto that direction rather than any other if we want to reduce the dimension of our data?

I know there has to be something wrong in my reasoning, I'm in no way suggesting PCA is wrong, but I would appreciate some clarification.

Best Answer

I think there are two ellipses that we could consider. First, consider the image of the unit circle under the linear map $x \mapsto Ax$ for a positive-definite $A \in \mathbb R^{n \times n}$. It is a standard result that the quadratic form $f(x) = x^T A x$ is maximized over unit vectors $x$ by the unit eigenvector $v_1$ with the largest eigenvalue $\lambda_1$, and for a symmetric positive-definite $A$ this is also the direction in which $\Vert Ax \Vert$ is largest. So the ellipse formed by the image of the unit circle under this map has its largest semi-axis along $v_1$ with length $\lambda_1$, and so on for the other eigenpairs. In this case the eigenvalues give the lengths of the semi-axes, and the biggest semi-axis goes with the biggest eigenvalue.
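Here is a quick numerical sanity check of that first picture. The particular $2\times 2$ positive-definite matrix below is just an illustrative choice, not anything from the question:

```python
import numpy as np

# A hypothetical positive-definite matrix (illustrative values only).
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

# Eigendecomposition; eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(A)
lam1, v1 = eigvals[-1], eigvecs[:, -1]

# Map points of the unit circle through x -> Ax and measure their lengths.
theta = np.linspace(0, 2 * np.pi, 2000)
circle = np.stack([np.cos(theta), np.sin(theta)])   # shape (2, 2000)
image = A @ circle
lengths = np.linalg.norm(image, axis=0)

print(lengths.max(), lam1)                 # both ~3.618: longest semi-axis = lambda_1
print(circle[:, lengths.argmax()], v1)     # the maximizing direction is (up to sign) v_1
```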

But now consider the contour $f(x) = 1$ for $x \in \mathbb R^n$. Since $A$ is positive-definite, the graph of $f$ is an elliptic paraboloid, so its intersection with a horizontal plane (i.e. a contour of $f$) is an ellipse (an ellipsoid for $n > 2$). In this case the shortest semi-axis is parallel to $v_1$, which makes sense because that is the direction in which $f$ grows the fastest, so we hit 1 the soonest. The largest semi-axis is along $v_n$, since that is the direction in which $f$ grows the slowest. Plugging $v_1$ into $f$ we get $f(v_1) = v_1^T A v_1 = \lambda_1 v_1^T v_1 = \lambda_1$, not 1 as required, so the point on this contour along $v_1$ is $\frac{1}{\sqrt{\lambda_1}}v_1$, and the corresponding semi-axis has length $\frac{1}{\sqrt{\lambda_1}}$. Does this help?
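And a check of that second picture: scaling each unit eigenvector by $1/\sqrt{\lambda_i}$ puts it on the contour $f(x) = 1$, so the semi-axes have lengths $1/\sqrt{\lambda_i}$. A small sketch, reusing the same illustrative matrix as above:

```python
import numpy as np

# Same hypothetical positive-definite matrix as above (illustrative values only).
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eigh(A)   # ascending eigenvalues

# For each eigenpair, scale v_i so it lands on the contour x^T A x = 1
# and check that its length equals 1/sqrt(lambda_i).
for lam, v in zip(eigvals, eigvecs.T):
    x = v / np.sqrt(lam)                           # claimed contour point along v
    print(x @ A @ x, np.linalg.norm(x), 1 / np.sqrt(lam))
    # first value prints ~1.0; the last two values agree
```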

Bringing this back to PCA, let's say that our data consist of $m$ observations drawn iid from $\mathcal N_2(\vec 0, \Sigma)$. Let's draw an ellipse around our data such that every point inside the ellipse has likelihood greater than some cutoff $c$. Its boundary is a contour of the likelihood and can be found from $$ c = \frac{1}{2\pi \vert \Sigma \vert^{1/2}}\exp \left( -\frac12 x^T \Sigma^{-1} x\right) \iff x^T \Sigma^{-1} x = -2\log \left( 2\pi c \vert \Sigma \vert^{1/2}\right), $$ i.e. the ellipse that circles the data is a contour of the quadratic form $g(x) = x^T \Sigma^{-1} x$. Note that $\Sigma^{-1}v = \lambda v \implies \Sigma v = \frac1\lambda v$, so the eigenvectors defining the axes of this likelihood-contour ellipse are the same as those of the covariance matrix $\Sigma$, but with inverted eigenvalues. By the second picture above, the longest semi-axis of this contour lies along the eigenvector of $\Sigma^{-1}$ with the smallest eigenvalue, which is exactly the eigenvector of $\Sigma$ with the largest eigenvalue: the first principal direction, with semi-axis length proportional to $\sqrt{\lambda_1}$, where $\lambda_1$ is the largest eigenvalue of $\Sigma$.
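To see both facts together in the PCA setting, here is a small simulation sketch. The covariance matrix and sample size are arbitrary choices for illustration, not anything from the question:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2x2 covariance matrix (illustrative values only).
Sigma = np.array([[4.0, 1.5],
                  [1.5, 1.0]])

# Sample m iid observations from N_2(0, Sigma).
m = 100_000
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=m)

# PCA via the eigendecomposition of the sample covariance.
S = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)            # ascending order
v1, lam1 = eigvecs[:, -1], eigvals[-1]          # first principal direction

# Variance of the data projected onto v1 vs. onto the last direction:
print(np.var(X @ v1), lam1)                     # largest spread ~ lambda_1
print(np.var(X @ eigvecs[:, 0]), eigvals[0])    # smallest spread ~ lambda_n

# The likelihood-contour ellipse x^T Sigma^{-1} x = const has the same
# eigenvectors, with eigenvalues 1/lambda_i, so its longest semi-axis
# (proportional to sqrt(lambda_1)) also points along v1.
contour_vals, contour_vecs = np.linalg.eigh(np.linalg.inv(S))
print(contour_vals, 1 / eigvals[::-1])          # inverted eigenvalues, reversed order
```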
