Understanding eigenvalues/eigenvectors of matrices, without considering linear transformations

eigenvalues-eigenvectors, matrices

I am struggling to DEEPLY understand eigenvectors/eigenvalues, mainly because almost everywhere they are introduced in the context of transformations, where we use matrices as functions and eigenvalues and eigenvectors, similar to the roots of polynomials, help us better understand the behavior of the function/transformation. However, my question is: what if we don't want to transform anything and the matrix is just a bookkeeping tool that hosts some information, like an image or a dataset? Then what's special about the eigenvectors/eigenvalues of such matrices to deserve the name "Eigen (special)"?

The answer to this question gets close when it tries to explain PCA, but it doesn't explain how eigenvectors determine the orientation and eigenvalues determine the distortion of an ellipse!

I think matrices (whether as functions or as bookkeeping tools) have some secrets carved into their eigenvectors/eigenvalues, and one only truly understands why those are Eigen (special) if he/she understands both WHY and WHAT secrets are hidden in them. I would really appreciate it if anyone could answer these questions.

Best Answer

If you do not interpret a matrix $A\in\mathbb{C}^{n\times n}$ as a representation of a linear map $\varphi:\mathbb{C}^n\ni x\mapsto Ax\in\mathbb{C}^n$, there is not much to be said about eigenvalues and eigenvectors, which define invariants of the linear map $\varphi$. Indeed, when applying the transformation to a vector in the linear span of an eigenvector, the vector stays in that span (its direction is preserved up to the phase of the eigenvalue), whereas its norm is scaled by a factor equal to the modulus of the associated eigenvalue; that is, $Av_i=\lambda_iv_i$ and $||Av_i||=|\lambda_i|\cdot||v_i||$ for any eigenvalue/eigenvector pair $(\lambda_i,v_i)$.
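As a quick numerical sanity check, here is a minimal NumPy sketch (the random matrix and its size are arbitrary choices of mine, not part of the argument) verifying both identities for every eigenpair:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))          # generic square matrix
eigvals, eigvecs = np.linalg.eig(A)      # columns of eigvecs are eigenvectors

for lam, v in zip(eigvals, eigvecs.T):
    # A v = lambda v
    assert np.allclose(A @ v, lam * v)
    # ||A v|| = |lambda| * ||v||
    assert np.isclose(np.linalg.norm(A @ v), abs(lam) * np.linalg.norm(v))
```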

Eigenvectors do not really capture the internal structure of a matrix and may not be appropriate for dealing with matrices that store information rather than represent a linear map.

Singular values do a much better job of representing the internal structure of a matrix. Any matrix $B\in\mathbb{C}^{n\times m}$ can be represented as

$$B=U\Sigma V^*$$

where $U\in\mathbb{C}^{n\times r}$ and $V\in\mathbb{C}^{m\times r}$ have orthonormal columns (i.e. $U^*U=V^*V=I_r$) and $\Sigma\in\mathbb{R}^{r\times r}$, $r=\mathrm{rank}(B)$, is a diagonal matrix with positive diagonal elements. This is called the (reduced) Singular Value Decomposition (SVD), and the diagonal elements of $\Sigma$ are called the singular values. In the standard (i.e. non-reduced) singular value decomposition, some singular values can be equal to zero and $\Sigma$ is of dimension $\min\{n,m\}$.

In fact, the singular values are the square-roots of the eigenvalues of $B^*B$ where $B^*$ denotes the transpose conjugate of $B$. In the reduced SVD, only the positive singular values are considered.
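Both facts are easy to check numerically; the following is a small NumPy sketch (the matrix is a random example of my own) that computes the reduced SVD and compares the singular values with the square-roots of the eigenvalues of $B^*B$:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 3))                    # generic full-rank 5x3 matrix

U, s, Vh = np.linalg.svd(B, full_matrices=False)   # reduced SVD: B = U diag(s) V*
assert np.allclose(U @ np.diag(s) @ Vh, B)

eig_BstarB = np.linalg.eigvalsh(B.conj().T @ B)    # eigenvalues of B* B (ascending)
assert np.allclose(np.sqrt(eig_BstarB[::-1]), s)   # their square roots = singular values
```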

When viewed as a mapping $x\mapsto Bx$, this decomposition shows that the overall map can be subdivided into three submaps $$x\stackrel{V^*}{\mapsto} y \stackrel{\Sigma}{\mapsto} z \stackrel{U}{\mapsto} Bx.$$

  • The first map is a rotation of the vector, which preserves its norm (since $V$ is unitary in the full SVD), and adapts its dimension to that of the image space of $B$.
  • The second map scales each component of the vector by a positive factor.
  • The third and last map is again a norm-preserving rotation, which adapts the dimension of the vector to fit the co-domain of the map.
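The three stages can be traced explicitly; here is a minimal NumPy sketch (sizes are arbitrary, and $V^*$ happens to be square in this example, so the first stage is exactly norm-preserving):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((5, 3))
U, s, Vh = np.linalg.svd(B, full_matrices=False)   # Vh is 3x3 (square) here

x = rng.standard_normal(3)
y = Vh @ x          # first map: rotation, preserves the norm of x
z = s * y           # second map: componentwise scaling by the singular values
out = U @ z         # third map: norm-preserving rotation into the co-domain

assert np.isclose(np.linalg.norm(y), np.linalg.norm(x))
assert np.allclose(out, B @ x)
```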

So far, this is again an interpretation in terms of $B$ being seen as the representation of a linear transformation. However, expanding the expression $B=U\Sigma V^*$ yields

$$B=\sum_{i=1}^r\sigma_iu_iv_i^*$$

where $u_i$ and $v_i$ are the $i$-th columns of $U$ and $V$, respectively. The terms $\sigma_i$ are the singular values.

From that decomposition, one can see that the matrix itself can be represented as a positively weighted sum of rank-one matrices, where the weights are exactly the singular values. In this regard, the importance or contribution of a rank-one matrix can be measured by the magnitude of the corresponding singular value.
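As a sketch of this reading of the SVD (again with an arbitrary random matrix of my own), one can rebuild $B$ term by term from the rank-one matrices $\sigma_iu_iv_i^*$:

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 6))
U, s, Vh = np.linalg.svd(B, full_matrices=False)

# sum of rank-one matrices sigma_i * u_i v_i^*
B_rebuilt = sum(s[i] * np.outer(U[:, i], Vh[i, :]) for i in range(len(s)))
assert np.allclose(B_rebuilt, B)
```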

This is what principal component analysis is doing: you look at the singular values and keep the rank-one matrices that explain most of the data, namely those for which the singular values are the largest. An interesting way to see this is to look at the following problem:

$$\min_{\mathrm{rank}(X)=r}||X-B||_2$$

where $\mathrm{rank}(B)>r$, which aims at finding the best rank-$r$ approximation of the matrix $B$ in the spectral-norm sense. The spectral norm is defined here as $||X-B||_2=\max_i\sigma_i(X-B)$, that is, the maximal singular value of the difference $X-B$. So, in other words, we want to find an $X$ of rank $r$ that minimizes the maximal singular value of $X-B$.

We can decompose $B$ as $B=\sum_{i=1}^{\mathrm{rank}(B)}\sigma_iu_iv_i^*$ and assume that $\sigma_1\ge\sigma_2\ge\ldots$.

  • So, if $X$ is of rank one, one can see that the best solution is to pick $X=\sigma_1u_1v_1^*$, and in this case we would have $||X-B||_2=\sigma_2$.
  • If $X$ is now of rank 2, then our best choice is to pick $X=\sigma_1u_1v_1^*+\sigma_2u_2v_2^*$, and in this case we would have $||X-B||_2=\sigma_3$.
  • $\qquad\vdots$
  • Finally, if $X$ is of rank $r$, then our best choice is to pick $X=\sum_{i=1}^r\sigma_iu_iv_i^*$, and in this case we would have $||X-B||_2=\sigma_{r+1}$.
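This truncation rule is easy to verify numerically; the sketch below (sizes and the target rank are arbitrary choices) keeps the $r$ largest singular values and checks that the spectral-norm error is exactly $\sigma_{r+1}$:

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((6, 5))
U, s, Vh = np.linalg.svd(B, full_matrices=False)   # s sorted in decreasing order

r = 2
X = U[:, :r] @ np.diag(s[:r]) @ Vh[:r, :]          # truncated SVD: best rank-r approximation
spectral_error = np.linalg.norm(X - B, 2)          # largest singular value of X - B
assert np.isclose(spectral_error, s[r])            # equals sigma_{r+1} (s[r] with 0-based indexing)
```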

There is a class of matrices for which the eigenvalues give some information about the internal structure of the matrix in a way that is analogous to singular values. This class is that of Hermitian positive semidefinite matrices $A$, which satisfy $A=A^*$, where $A^*$ denotes the transpose conjugate of $A$, and for which all the eigenvalues are nonnegative. For real matrices, this definition reduces to symmetric positive semidefinite matrices.

It is known that Hermitian matrices are diagonalizable. Moreover, the basis in which the matrix is diagonal can be chosen to consist of vectors that are orthonormal to each other. In other words, the change-of-basis matrix $P$ is such that $P^*P=I$ or, equivalently, $P^{-1}=P^*$.

So we have that $$A=P^{-1}DP=P^{*}DP$$ where $D$ is the diagonal matrix containing the (real and nonnegative) eigenvalues of $A$ on the diagonal.

The singular values of $A=A^*$ are the square-roots of the eigenvalues of $A^*A=A^2$. Since the eigenvalues of $A^2$ are the $\lambda_i^2$, the singular values are the $\lambda_i$ themselves and we have in this case that $\Sigma=D$. So, this means that the eigenvalues and the singular values coincide with each other in this case. As a result, the eigenvalues convey the same level of information here as the singular values.
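A small check of this coincidence (the positive semidefinite matrix below is constructed as $MM^\top$, my own choice):

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((4, 4))
A = M @ M.T                                     # symmetric positive semidefinite

eigvals = np.linalg.eigvalsh(A)[::-1]           # real, nonnegative, sorted decreasing
singvals = np.linalg.svd(A, compute_uv=False)   # sorted decreasing
assert np.allclose(eigvals, singvals)           # eigenvalues = singular values
```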

One can then consider the following eigenvalue decomposition

$$A=\sum_{i=1}^n\lambda_iu_iu_i^*$$ where $u_i$ is the $i$-th column of $P^*$ (a unit-norm eigenvector of $A$), and this decomposition coincides in this case with the SVD.

The same analysis can be carried out for general Hermitian matrices. In this case, the singular values would just be the absolute values of the eigenvalues.
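For instance, with an indefinite symmetric matrix (a toy example of my own), the singular values come out as the sorted absolute values of the eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(6)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                               # symmetric, possibly indefinite

eigvals = np.linalg.eigvalsh(A)
singvals = np.linalg.svd(A, compute_uv=False)
assert np.allclose(np.sort(np.abs(eigvals))[::-1], singvals)
```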

So, one can see that in this case the eigenvalues and eigenvectors actually do contain structural information about the matrix. This is due to the fact that the eigenvalues here are a suitable concept for defining a norm for the matrix, as they coincide (up to absolute value) with the singular values.

It is important to stress that this is not the case for general square matrices for which eigenvalues (or their absolute values) are a very poor concept for quantifying or representing the contents of a matrix. For instance, the eigenvalues of the matrix

$$\begin{bmatrix}0 & 1\\0 & 0\end{bmatrix}$$

are both equal to 0 whereas the singular values are 1 and 0. It is clear that in this case an eigenvalue decomposition would not make any sense as we would not be able to make the distinction between the above matrix and the zero matrix.
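This can be checked directly on that matrix:

```python
import numpy as np

N = np.array([[0.0, 1.0],
              [0.0, 0.0]])
print(np.linalg.eigvals(N))                 # [0. 0.]   both eigenvalues are zero
print(np.linalg.svd(N, compute_uv=False))   # [1. 0.]   singular values are 1 and 0
```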
