Understanding eigenvalues/eigenvectors of matrices, without considering linear transformations

eigenvalues-eigenvectors, matrices

I am struggling to DEEPLY understand eigenvectors/eigenvalues, mainly because almost everywhere they are introduced in the context of transformations, where we use matrices as functions and eigenvalues and eigenvectors, similar to the roots of polynomials, help us better understand the behavior of the function/transformation. However, my question is: what if we don't want to transform anything and the matrix is just a bookkeeping tool that hosts some information, like an image or a dataset? Then what's special about the eigenvectors/eigenvalues of such matrices to deserve the name "Eigen (special)"?

The answer to this question gets close when it tries to explain PCA, but it doesn't explain how eigenvectors determine the orientation and eigenvalues determine the distortion of an ellipse!

I think matrices (whether as functions or as bookkeeping tools) have some secrets carved into their eigenvectors/eigenvalues, and one only truly understands why those are Eigen (special) if he/she understands both WHY and WHAT secrets are hidden in them. I would really appreciate it if anyone could answer these questions.

Best Answer

If you do not interpret a matrix $A\in\mathbb{C}^{n\times n}$ as a representation of a linear map $\varphi:\mathbb{C}^n\ni x\mapsto Ax\in\mathbb{C}^n$, there is not much to be said about eigenvalues and eigenvectors, which define invariants of the linear map $\varphi$. Indeed, when applying the transformation to a vector in the linear span of an eigenvector, the vector stays in that span (its direction is preserved up to the phase of the eigenvalue), whereas its norm is scaled by a factor equal to the modulus of the associated eigenvalue; that is, $Av_i=\lambda_iv_i$ and $||Av_i||=|\lambda_i|\cdot||v_i||$ for any eigenvalue/eigenvector pair $(\lambda_i,v_i)$.
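As a quick numerical sanity check, here is a minimal NumPy sketch (the random matrix and its size are arbitrary choices of mine, not part of the argument) verifying both identities for every eigenpair:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))          # generic square matrix
eigvals, eigvecs = np.linalg.eig(A)      # columns of eigvecs are eigenvectors

for lam, v in zip(eigvals, eigvecs.T):
    # A v = lambda v
    assert np.allclose(A @ v, lam * v)
    # ||A v|| = |lambda| * ||v||
    assert np.isclose(np.linalg.norm(A @ v), abs(lam) * np.linalg.norm(v))
```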

Eigenvectors do not really capture the internal structure of a matrix and may not be appropriate for dealing with matrices that store information rather than represent a linear map.

Singular values do a much better job of representing the internal structure of a matrix. Any matrix $B\in\mathbb{C}^{n\times m}$ can be represented as

$$B=U\Sigma V^*$$

where $U\in\mathbb{C}^{n\times r}$ and $V\in\mathbb{C}^{m\times r}$ have orthonormal columns (i.e. $U^*U=V^*V=I_r$) and $\Sigma\in\mathbb{R}^{r\times r}$, $r=\mathrm{rank}(B)$, is a diagonal matrix with positive diagonal elements. This is called the (reduced) Singular Value Decomposition (SVD), and the diagonal elements of $\Sigma$ are called the singular values. In the standard (i.e. non-reduced) singular value decomposition, some singular values can be equal to zero and $\Sigma$ is of dimension $\min\{n,m\}$.

In fact, the singular values are the square-roots of the eigenvalues of $B^*B$ where $B^*$ denotes the transpose conjugate of $B$. In the reduced SVD, only the positive singular values are considered.
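Both facts are easy to check numerically; the following is a small NumPy sketch (the matrix is a random example of my own) that computes the reduced SVD and compares the singular values with the square-roots of the eigenvalues of $B^*B$:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 3))                    # generic full-rank 5x3 matrix

U, s, Vh = np.linalg.svd(B, full_matrices=False)   # reduced SVD: B = U diag(s) V*
assert np.allclose(U @ np.diag(s) @ Vh, B)

eig_BstarB = np.linalg.eigvalsh(B.conj().T @ B)    # eigenvalues of B* B (ascending)
assert np.allclose(np.sqrt(eig_BstarB[::-1]), s)   # their square roots = singular values
```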

When viewed as a mapping $x\mapsto Bx$, this decomposition shows that the overall map can be subdivided into three submaps $$x\stackrel{V^*}{\mapsto} y \stackrel{\Sigma}{\mapsto} z \stackrel{U}{\mapsto} Bx.$$

  • The first map is a rotation of the vector, which preserves its norm (since $V$ is unitary in the full SVD), and adapts its dimension to that of the image space of $B$.
  • The second map scales each component of the vector by a positive factor.
  • The third and last map is again a norm-preserving rotation, which adapts the dimension of the vector to fit the co-domain of the map.
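The three stages can be traced explicitly; here is a minimal NumPy sketch (sizes are arbitrary, and $V^*$ happens to be square in this example, so the first stage is exactly norm-preserving):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((5, 3))
U, s, Vh = np.linalg.svd(B, full_matrices=False)   # Vh is 3x3 (square) here

x = rng.standard_normal(3)
y = Vh @ x          # first map: rotation, preserves the norm of x
z = s * y           # second map: componentwise scaling by the singular values
out = U @ z         # third map: norm-preserving rotation into the co-domain

assert np.isclose(np.linalg.norm(y), np.linalg.norm(x))
assert np.allclose(out, B @ x)
```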

So far, this is again an interpretation in terms of $B$ being seen as the representation of a linear transformation. However, expanding the expression $B=U\Sigma V^*$ yields

$$B=\sum_{i=1}^r\sigma_iu_iv_i^*$$

where $u_i$ and $v_i$ are the $i$-th columns of $U$ and $V$, respectively. The terms $\sigma_i$ are the singular values.

From that decomposition, one can see that the matrix itself can be represented as a positively weighted sum of rank-one matrices, where the weights are exactly the singular values. In this regard, the importance or contribution of a rank-one matrix can be measured by the magnitude of the corresponding singular value.
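As a sketch of this reading of the SVD (again with an arbitrary random matrix of my own), one can rebuild $B$ term by term from the rank-one matrices $\sigma_iu_iv_i^*$:

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 6))
U, s, Vh = np.linalg.svd(B, full_matrices=False)

# sum of rank-one matrices sigma_i * u_i v_i^*
B_rebuilt = sum(s[i] * np.outer(U[:, i], Vh[i, :]) for i in range(len(s)))
assert np.allclose(B_rebuilt, B)
```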

This is what principal component analysis is doing: you look at the singular values and keep the rank-one matrices that explain most of the data, namely those for which the singular values are the largest. An interesting way to see this is to look at the following problem:

$$\min_{\mathrm{rank}(X)=r}||X-B||_2$$

where $\mathrm{rank}(B)>r$, which aims at finding the best rank-$r$ approximation of the matrix $B$ in the spectral-norm sense. The spectral norm is defined here as $||X-B||_2=\max_i\sigma_i(X-B)$, that is, the maximal singular value of the difference $X-B$. So, in other words, we want to find an $X$ of rank $r$ that minimizes the maximal singular value of $X-B$.

We can decompose $B$ as $B=\sum_{i=1}^{\mathrm{rank}(B)}\sigma_iu_iv_i^*$ and assume that $\sigma_1\ge\sigma_2\ge\ldots$.

  • So, if $X$ is of rank one, one can see that the best solution is to pick $X=\sigma_1u_1v_1^*$, and in this case we would have $||X-B||_2=\sigma_2$.
  • If $X$ is now of rank 2, then our best choice is to pick $X=\sigma_1u_1v_1^*+\sigma_2u_2v_2^*$, and in this case we would have $||X-B||_2=\sigma_3$.
  • $\qquad\vdots$
  • Finally, if $X$ is of rank $r$, then our best choice is to pick $X=\sum_{i=1}^r\sigma_iu_iv_i^*$, and in this case we would have $||X-B||_2=\sigma_{r+1}$.
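This truncation rule is easy to verify numerically; the sketch below (sizes and the target rank are arbitrary choices) keeps the $r$ largest singular values and checks that the spectral-norm error is exactly $\sigma_{r+1}$:

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((6, 5))
U, s, Vh = np.linalg.svd(B, full_matrices=False)   # s sorted in decreasing order

r = 2
X = U[:, :r] @ np.diag(s[:r]) @ Vh[:r, :]          # truncated SVD: best rank-r approximation
spectral_error = np.linalg.norm(X - B, 2)          # largest singular value of X - B
assert np.isclose(spectral_error, s[r])            # equals sigma_{r+1} (s[r] with 0-based indexing)
```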

There is a class of matrices for which the eigenvalues give some information about the internal structure of the matrix in a way that is analogous to singular values. This class is that of Hermitian positive semidefinite matrices $A$, which satisfy $A=A^*$, where $A^*$ denotes the transpose conjugate of $A$, and for which all the eigenvalues are nonnegative. For real matrices, this definition reduces to symmetric positive semidefinite matrices.

It is known that Hermitian matrices are diagonalizable. Moreover, the basis in which the matrix is diagonal can be chosen to consist of vectors that are orthonormal to each other. In other words, the change-of-basis matrix $P$ is such that $P^*P=I$ or, equivalently, $P^{-1}=P^*$.

So we have that $$A=P^{-1}DP=P^{*}DP$$ where $D$ is the diagonal matrix containing the (real and nonnegative) eigenvalues of $A$ on the diagonal.

The singular values of $A=A^*$ are the square-roots of the eigenvalues of $A^*A=A^2$. Since the eigenvalues of $A^2$ are the $\lambda_i^2$, the singular values are the $\lambda_i$ themselves and we have in this case that $\Sigma=D$. So, this means that the eigenvalues and the singular values coincide with each other in this case. As a result, the eigenvalues convey the same level of information here as the singular values.
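A small check of this coincidence (the positive semidefinite matrix below is constructed as $MM^\top$, my own choice):

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((4, 4))
A = M @ M.T                                     # symmetric positive semidefinite

eigvals = np.linalg.eigvalsh(A)[::-1]           # real, nonnegative, sorted decreasing
singvals = np.linalg.svd(A, compute_uv=False)   # sorted decreasing
assert np.allclose(eigvals, singvals)           # eigenvalues = singular values
```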

One can then consider the following eigenvalue decomposition

$$A=\sum_{i=1}^n\lambda_iu_iu_i^*$$ where $u_i$ is the $i$-th column of $P^*$ (a unit-norm eigenvector of $A$), and this decomposition coincides in this case with the SVD.

The same analysis can be carried out for general Hermitian matrices. In this case, the singular values would just be the absolute values of the eigenvalues.
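For instance, with an indefinite symmetric matrix (a toy example of my own), the singular values come out as the sorted absolute values of the eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(6)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                               # symmetric, possibly indefinite

eigvals = np.linalg.eigvalsh(A)
singvals = np.linalg.svd(A, compute_uv=False)
assert np.allclose(np.sort(np.abs(eigvals))[::-1], singvals)
```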

So, one can see that in this case the eigenvalues and eigenvectors actually do contain structural information about the matrix. This is due to the fact that the eigenvalues here are a suitable concept for defining a norm for the matrix, as they coincide (up to absolute value) with the singular values.

It is important to stress that this is not the case for general square matrices for which eigenvalues (or their absolute values) are a very poor concept for quantifying or representing the contents of a matrix. For instance, the eigenvalues of the matrix

$$\begin{bmatrix}0 & 1\\0 & 0\end{bmatrix}$$

are both equal to 0 whereas the singular values are 1 and 0. It is clear that in this case an eigenvalue decomposition would not make any sense as we would not be able to make the distinction between the above matrix and the zero matrix.
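This can be checked directly on that matrix:

```python
import numpy as np

N = np.array([[0.0, 1.0],
              [0.0, 0.0]])
print(np.linalg.eigvals(N))                 # [0. 0.]   both eigenvalues are zero
print(np.linalg.svd(N, compute_uv=False))   # [1. 0.]   singular values are 1 and 0
```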
