I am surprised both by the approach of your textbook (you don't need determinants to introduce the distinction between singular and non-singular matrices, nor to solve linear systems), and by the fact that you qualify this approach as abstract. I would qualify a don't-ask-questions-just-compute attitude as concrete rather than abstract. Maybe you use "abstract" to mean "hard to grasp", but that is not the same thing; for me, the things hardest to grasp are often complicated but very concrete systems (in biochemistry, for instance). In mathematics (and elsewhere, I suppose) it is often asking conceptual questions that leads to abstraction, and I sense that what you would like is a more conceptual, and therefore more abstract, approach.
But abstraction is present in many fields of mathematics, like linear algebra, for a more important reason as well, namely for the sake of economy and generality. Linear algebra arose as a set of common techniques that apply to problems in very diverse areas of mathematics, and only an abstract formulation lets one express them in such a way that they can be applied wherever needed, without having to reformulate them in each concrete situation. It would be motivating to have seen at least one such concrete application area before entering the abstraction of the subject, and I think that would be a sound approach. However, this would involve introducing many details that are, in the end, independent of the methods of linear algebra, and I guess there is often just not the time to go into such preparations.
So, to answer your questions:
Linear algebra is an abstract subject, so it should not be surprising that freshmen feel it is so. But it is not abstract because of determinants, which are just a concrete tool that allows certain things to be expressed more explicitly than without them. Saying a linear map is invertible is a more abstract formulation than saying $\det A\neq0$, where $A$ is a matrix of that linear map in some basis.
Yes, geometric insight helps in understanding linear algebra, and you should have some geometric intuition for notions such as subspaces, span, kernels, images, and eigenvalues. But determinants are somewhat different; while you certainly should have some geometric intuition for determinants in terms of volume when doing calculus, there is not much to gain from this in purely algebraic situations, and in fact I know of no geometric interpretation at all of the determinant of a complex matrix, or of the determinant that defines the characteristic polynomial.
To understand linear algebra better, you should try to go beyond concrete computational questions and try to obtain a more conceptual understanding of what is being done.
As for the mysteries of determinants, you may want a deeper understanding than just that they exist and magically solve certain problems (like determining which square matrices are invertible). For that I would refer to this question.
The example matrix is:
$$
M =
\left(
\begin{matrix}
3 & 2 & 6 \\
2 & 2 & 5 \\
6 & 5 & 4
\end{matrix}
\right)
$$
It has these eigenvalues and eigenvectors:
$$
V =
\left(
\begin{matrix}
0.518736 & 0.647720 & 0.558007 \\
0.462052 & -0.761559 & 0.454463 \\
-0.719320 & -0.022082 & 0.694328
\end{matrix}
\right)
\,\,
\Lambda =
\left(
\begin{matrix}
-3.53862 & 0 & 0 \\
0 & 0.44394 & 0 \\
0 & 0 & 12.09467
\end{matrix}
\right)
$$
so that
$$
V^{-1} M V = \Lambda
$$
The dominant eigenvalue $\lambda_1 = \Lambda_{33}$ is the last one and indeed the iteration given in the example seems to converge towards $v_{\lambda_1} = V e_3 = v_3$.
Note: I used GNU Octave for this calculation:
M = [3,2,6;2,2,5;6,5,4]
[V,D] = eig(M)          % eigenvectors as columns of V, eigenvalues on the diagonal of D
inv(V) * M * V          % reproduces the diagonal matrix Lambda
e1 = eye(3)(:,1)        % first standard basis vector
Further
$$
\frac{M^{20}}{\min(M^{20})} =
\left(
\begin{matrix}
\frac{1}{0.37013} v_{\lambda_1} &
\frac{1}{0.45446} v_{\lambda_1} &
\frac{1}{0.29746} v_{\lambda_1}
\end{matrix}
\right)
$$
as you gave in your update.
function [ret] = mymin(A)
  % Return the smallest element of the matrix A (minimum over all entries).
  s = size(A);
  m = A(1,1);
  for i = 1:s(1)
    for j = 1:s(2)
      if (A(i,j) < m)
        m = A(i,j);
      endif
    endfor
  endfor
  ret = m;
endfunction
M^20/mymin(M^20)
So why do we see a multiple of $v_{\lambda_1}$?
I still believe the calculation of $M^N$, especially the $k$-th column
$$
M^N e_k
$$
is similar to a power iteration for $M$ with start vector $e_k$:
$$
r_N = \frac{M^N e_k}{||M^N e_k||} \to v_{\lambda_1} \quad (\#)
$$
The factor then relates to $||M^N e_k||$. Applying $(\#)$ to the term $(*)$ gives
$$
\frac{M^N e_k}{\min(M^N)} \to
\frac{||M^N e_k||}{\min(M^N)} v_{\lambda_1}
= \frac{1}{\min(M^N) \, / \, ||M^N e_k||} v_{\lambda_1}
$$
Doing the calculation:
$$
\frac{\min(M^{20})}{||M^{20} e_1||_2}
= \frac{(M^{20})_{22}}{||M^{20} e_1||_2}
= \frac{9.2657 \cdot 10^{20}}{2.5034 \cdot 10^{21}}
= 0.37013 \quad (\#\#)
$$
Voilà. Using start vectors $e_2$ and $e_3$ in $(\#\#)$ gives the other two factors.
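For what it is worth, all three factors in $(\#\#)$ can be checked in one go in Octave, reusing $M$ and the mymin function from above (a numerical sanity check only):
P = M^20;
m = mymin(P)                  % minimum element of M^20, here (M^20)_{22}
for k = 1:3
  m / norm(P(:,k))            % prints 0.37013, 0.45446, 0.29746
endfor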
Update:
It was asked for which matrices this behaviour occurs.
IMHO it works for those matrices $M$ for which $(\#)$ converges, and according to the Wikipedia article (and the proof given there) that requires the following (a quick numerical check of all three conditions follows the list):
- $M$ has an eigenvalue that is strictly greater in magnitude than its other eigenvalues. This is the case for the example: $|\lambda_1| > |\Lambda_{11}| > |\Lambda_{22}|$.
- The starting vector $e_k$ has a nonzero component in the direction of an eigenvector associated with the dominant eigenvalue. This is true for the example as well: the eigenvectors, in particular the dominant $v_{\lambda_1}$, have no zero coordinates, so $e_k^T v_{\lambda_1} \ne 0$ for every $k$.
- $M$ should be diagonalizable. This is the case for the example because a symmetric matrix is diagonalizable.
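For the example these conditions can also be checked numerically in Octave; a minimal sketch (using V and D from eig(M) above; the names lambda, mx and d are mine):
lambda = diag(D);             % eigenvalues: -3.53862, 0.44394, 12.09467
[mx, d] = max(abs(lambda))    % d = 3, the dominant eigenvalue is strictly largest in magnitude
V(:, d)                       % no zero component, hence e_k' * V(:,d) != 0 for every k
rank(V)                       % 3, the eigenvectors form a basis, so M is diagonalizable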
Update:
To show that $(*)$ converges iff $(\#)$ converges, it would be helpful to show that
$$
m_1 \min(A) \le ||A e_k|| \le m_2 \min(A)
$$
The first constant is $m_1 = 1$: the minimal element of $A$ is at most each entry of the $k$-th column, and each entry is in turn at most that column's norm. The second constant does not exist in general, since $||A e_k||$ is nonnegative while the minimal matrix element might be negative.
So $(\#) \le (*)$ and "$\Rightarrow$" holds.
The other direction might still be true, but I don't have proof for it.
Update:
If $M$ fulfills the above three conditions, the power iteration $(\#)$ converges and we have:
$$
||M^N e_k|| \approx |w_{dk}| \, \left|\lambda_1\right|^N \quad (\#\#\#)
$$
and
$$
M^N e_k \approx w_{dk} \, \lambda_1^N v_{\lambda_1} \quad (\$)
$$
where $V^{-1} = (w_{ij})$ is the inverse of the eigenvector matrix and $d$ is the column of the dominant eigenvector $v_{\lambda_1}$ in $V$.
This follows from the proof in the Wikipedia article on power iteration (see above).
It is roughly:
$$
\begin{align}
M^N e_k
&= M^N (c_1 v_1 + \cdots + c_n v_n) \\
&= (c_1 \lambda_1^N v_1 + \cdots + c_n \lambda_n^N v_n) \\
&= c_d \lambda_d^N v_d + \lambda_d^N\sum_{j\ne d} c_j
\underbrace{\left(\frac{\lambda_j}{\lambda_d}\right)^N}_{\to 0} v_j \\
&\to c_d \lambda_d^N v_d
\end{align}
$$
$$
e_k = c_1 v_1 + \cdots + c_n v_n =
V c \iff
c = V^{-1} e_k \iff c_i = w_{ik}
$$
$$
||M^N e_k|| \to ||c_d \lambda_d^N v_{\lambda_d}|| = |w_{dk}| |\lambda_d|^N
$$
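The convergence in $(\#)$ is easy to observe numerically as well; here is a small power-iteration sketch in Octave (my own illustration, with M and V as above and start vector $e_1$):
r = eye(3)(:,1);              % start vector e_1
for n = 1:20
  r = M * r;
  r = r / norm(r);            % normalise in every step to avoid overflow
endfor
[r, V(:,3)]                   % the iterate agrees with the dominant eigenvector v_3 (up to sign)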
Equation $(\$)$ gives
$$
M^N \approx
\left(
\begin{matrix}
w_{d1} \lambda_1^N v_{\lambda_{1}} &
w_{d2} \lambda_1^N v_{\lambda_{1}} &
w_{d3} \lambda_1^N v_{\lambda_{1}}
\end{matrix}
\right)
$$
This allows us to calculate $\min(M^N)$ directly (the factor $\lambda_1^N$ can be pulled out of the minimum because $\lambda_1 > 0$ here):
$$
\min(M^N)
= \min_{i,j} w_{dj} \lambda_1^N v_{id}
= \lambda_1^N \min_{i,j} v_{id} w_{dj} \quad (\$\$)
$$
which implies that
$$
\frac{M^N}{\min(M^N)} \to \\
\left(
\begin{matrix}
\frac{1}{(\min_{i,j} v_{id} w_{dj}) \, / w_{d1}} v_{\lambda_{1}} &
\frac{1}{(\min_{i,j} v_{id} w_{dj}) \, / w_{d2}} v_{\lambda_{1}} &
\frac{1}{(\min_{i,j} v_{id} w_{dj}) \, / w_{d3}} v_{\lambda_{1}}
\end{matrix}
\right) \quad (\$\$\$)
$$
So "$\Leftarrow$" holds as well and we have
that $(*)$ converges iff $(\#)$ converges.
For the example matrix this gives:
The dominant eigenvector resides at column $d = 3$ in $V$. The inverse of $V$ is:
$$
V^{-1} =
\left(
\begin{matrix}
0.518736 & 0.462052 & -0.719320 \\
0.647720 & -0.761559 & -0.022082 \\
0.558007 & 0.454463 & 0.694328
\end{matrix}
\right)
$$
Note that $V^{-1} = V^T$ here, i.e. $V$ is orthogonal; this is because $M$ is real and symmetric. Then we have
$$
(v_{i3}) =
\left(
\begin{matrix}
0.55801 \\
0.45446 \\
0.69433
\end{matrix}
\right)
\quad
(w_{3j}) =
\left(
\begin{matrix}
0.55801 & 0.45446 & 0.69433
\end{matrix}
\right)
$$
and for all combinations
$$
(v_{i3}) \, (w_{3j})
=
\left(
\begin{matrix}
0.31137 & 0.25359 & 0.38744 \\
0.25359 & 0.20654 & 0.31555 \\
0.38744 & 0.31555 & 0.48209
\end{matrix}
\right)
$$
The minimum is $0.20654$. This gives
$$
\left(
\frac{\min_{ij} v_{i3} \, w_{3j}}{w_{dk}}
\right)
=
\left(
\begin{matrix}
0.37013 & 0.45446 & 0.29746
\end{matrix}
\right)
$$
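The same factor vector can be obtained directly in Octave as a check of $(\$\$\$)$ (W, outer and fac are my names; V and mymin as above):
W = inv(V);                   % here simply V', since V is orthogonal
outer = V(:,3) * W(3,:);      % the outer product (v_{i3}) (w_{3j})
fac = mymin(outer) ./ W(3,:)  % prints 0.37013  0.45446  0.29746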
Update:
An interesting property of $(\$\$)$ is that the minimization does not depend on the iteration step $N$. In other words, independently of $N$, the minimum is always attained at the same element position of the matrix, here the one at row $2$, column $2$.
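This can be checked numerically too (a small Octave loop of my own): the position of the minimal element of $M^N$ should stay at row 2, column 2 for every moderate $N$.
for N = 5:5:20
  P = M^N;
  [mn, idx] = min(P(:));      % value and linear index of the smallest element of M^N
  [i, j] = ind2sub(size(P), idx);
  [N, i, j]                   % should give i = 2, j = 2 for every N
endfor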
Best Answer
Personally, I feel that intuition isn't something which is easily explained. Intuition in mathematics is synonymous with experience and you gain intuition by working numerous examples. With my disclaimer out of the way, let me try to present a very informal way of looking at eigenvalues and eigenvectors.
First, let us forget about principal component analysis for a little bit and ask ourselves exactly what eigenvectors and eigenvalues are. A typical introduction to spectral theory presents eigenvectors as vectors which are fixed in direction under a given linear transformation. The scaling factor of these eigenvectors is then called the eigenvalue. Under such a definition, I imagine that many students regard this as a minor curiosity, convince themselves that it must be a useful concept and then move on. It is not immediately clear, at least to me, why this should serve as such a central subject in linear algebra.
Eigenpairs are a lot like the roots of a polynomial. It is difficult to describe why the concept of a root is useful, not because there are few applications but because there are too many. If you tell me all the roots of a polynomial, then mentally I have an image of how the polynomial must look. For example, all monic cubics with three real roots look more or less the same. So one of the most central facts about the roots of a polynomial is that they ground the polynomial. A root literally roots the polynomial, limiting its shape.
Eigenvectors are much the same. If you have a line or plane which is invariant, then there is only so much you can do to the surrounding space without breaking that constraint. So in a sense, eigenvectors are not important because they themselves are fixed, but rather because they limit the behavior of the linear transformation. Each eigenvector is like a skewer which helps to hold the linear transformation in place.
Very (very, very) roughly then, the eigenvalues of a linear mapping are a measure of the distortion induced by the transformation, and the eigenvectors tell you how that distortion is oriented. It is precisely this rough picture which makes PCA very useful.
Suppose you have a set of data which is distributed as an ellipsoid oriented in $3$-space. If this ellipsoid is very flat in some direction, then in a sense we can recover much of the information that we want even if we ignore the thickness of the ellipsoid. This is what PCA aims to do. The eigenvectors tell you how the ellipsoid is oriented and the eigenvalues tell you where it is distorted (where it is flat). If you choose to ignore the "thickness" of the ellipsoid, then you are effectively collapsing the component along the eigenvector in that direction; you are projecting the ellipsoid onto the best directions from which to look at it. To quote wiki: