Computing the smallest eigenvalue of a positive definite matrix $\bf{A}$ without using $\bf{A^{-1}}$

Tags: eigenvalues-eigenvectors, linear-algebra, numerical-linear-algebra, numerical-methods

I want to numerically compute the smallest eigenvalue of an $n \times n$ positive definite matrix $\bf{A}$.

Why I want to avoid working with $\bf{A^{-1}}$

I know that I can apply power iteration to $\bf{A^{-1}}$, but I wish to avoid working with this inverse matrix.

In my application, $\bf{A}$ is implicitly expressed as $\bf{A} = \bf{K'K}$ with $\bf{K} = \bf{L^{-1}R}$, where

  • $\bf{L}$ is an $m \times m$ lower triangular band matrix;
  • $\bf{R}$ is an $m \times n$ band matrix of full column rank $(m > n)$.

While $\bf{L}$ and $\bf{R}$ are sparse, $\bf{K}$ and $\bf{A}$ are fully dense. Given that both $m$ and $n$ can be large, I want to explicitly form neither $\bf{K}$ nor $\bf{A}$.

Unfortunately $m \neq n$, so $\bf{R}$ is not square and there is no convenient factored form $\bf{A^{-1}} = \bf{R^{-1}LL'R^{-1\prime}}$.

What I have tried

With $\bf{A}$ structured as above, it is computationally efficient to compute $\bf{Av}$ for any vector $\bf{v}$ using sparse linear algebra routines (a sketch of this product appears right after the algorithm below). It is then easy to compute $\bf{A}$'s largest eigenvalue $\mu$ using power iteration:

  1. $\bf{v_0} = (\frac{1}{n}, \ldots, \frac{1}{n})$;
  2. $\bf{u_0} = \bf{Av_0}$;
  3. $\lambda_0 = \bf{v_0'u_0}$;
  4. for $k = 1, 2, \ldots$ till convergence of $\{\lambda_k\}$
    1. $\bf{v_k} = \bf{u_{k - 1}} / \|\bf{u_{k - 1}}\|$;
    2. $\bf{u_k} = \bf{Av_k}$;
    3. $\lambda_k = \bf{v_k'u_k}$;
  5. return $\lambda_k$.
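
Here is a minimal sketch of how such a product $\bf{Av} = \bf{R'(L')^{-1}L^{-1}Rv}$ can be computed with sparse triangular solves (the variable names and the use of the Matrix package are illustrative placeholders, not my actual code):

library(Matrix)

## A v = K'K v = R' L^{-T} L^{-1} R v, without ever forming K or A.
## L: m x m sparse lower triangular band matrix; R: m x n sparse band matrix.
Av <- function(L, R, v) {
  w <- R %*% v                    # R v                 (sparse product)
  w <- solve(L, w)                # L^{-1} R v          (sparse forward solve)
  w <- solve(t(L), w)             # L^{-T} L^{-1} R v   (sparse backward solve)
  as.numeric(crossprod(R, w))     # R' L^{-T} L^{-1} R v = A v
}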

To find the smallest eigenvalue, I apply this algorithm to $\bf{B} = \bf{A} - \mu\bf{I}$, inspired by this thread: https://math.stackexchange.com/a/271876/407465. Here is an R program implementing this method.

PowerIter <- function (A, mu = 0) {
  n <- nrow(A)
  ## spectral shift
  A <- A - diag(mu, n)
  ## power iteration
  v.old <- rep.int(1 / n, n)
  u.old <- A %*% v.old
  d.old <- sum(v.old * u.old)  ## initial estimate v.old' A v.old
  k <- 0L
  repeat {
    v.new <- u.old * (1 / sqrt(sum(u.old ^ 2)))
    u.new <- A %*% v.new
    d.new <- sum(v.new * u.new)
    if (abs(d.new - d.old) < abs(d.old) * 1e-8) break  ## test relative error
    d.old <- d.new
    u.old <- u.new
    k <- k + 1L
  }
  cat("convergence after", k, "iterations.\n")
  d.new
}

What goes wrong?

I found that this algorithm (with the spectral shift) does not seem numerically stable. I constructed two toy examples: one succeeds, the other is problematic.

A successful example

\begin{equation}
\bf{A} = \begin{pmatrix}
2 & -1 & 0\\
-1 & 2 & -1\\
0 & -1 & 2
\end{pmatrix}
\end{equation}

With $|\bf{A} - \lambda\bf{I}| = (2 - \lambda)[(2 - \lambda)^2 - 2]$, the three eigenvalues are $2 + \sqrt{2} \approx 3.4142136$, $2$ and $2 - \sqrt{2} \approx 0.5857864$.

The algorithm is a success for this matrix. Here is the output of an R session.

A <- matrix(c(2, -1, 0, -1, 2, -1, 0, -1, 2), nrow = 3)

d.max <- PowerIter(A)
## convergence after 7 iterations.
## [1] 3.414214

d.min <- PowerIter(A, d.max) + d.max
## convergence after 1 iterations.
## [1] 0.5857864

A problematic example

Now let's construct $\bf{A} = \bf{Q'DQ}$, where $\bf{D} = \textrm{diag}(d_1, d_2, d_3)$ holds the eigenvalues and $\bf{Q}$ is the rotation matrix

\begin{equation}
\bf{Q} = \begin{pmatrix}
0.36 & 0.48 & -0.8\\
-0.8 & 0.6 & 0\\
0.48 & 0.64 & 0.6
\end{pmatrix}
\end{equation}

I fixed $d_1 = 1$, $d_2 = 0.01$ and tried three choices for $d_3$: $10^{-3}$, $10^{-5}$ and $10^{-7}$. Here is the output of an R session.

test <- function (d) {
  Q <- matrix(c(0.36, -0.8, 0.48, 0.48, 0.6, 0.64, -0.8, 0, 0.6), nrow = 3)
  A <- crossprod(Q, d * Q)  ## A = Q' D Q with D = diag(d)
  d.max <- PowerIter(A)
  d.min <- PowerIter(A, d.max) + d.max
  c(min = d.min, max = d.max)
}

test(c(1, 1e-2, 1e-3))
## convergence after 3 iterations.
## convergence after 298 iterations.
##         min         max 
## 0.001000543 1.000000000 

test(c(1, 1e-2, 1e-5))
## convergence after 3 iterations.
## convergence after 279 iterations.
##         min         max 
## 1.04883e-05 1.00000e+00 

test(c(1, 1e-2, 1e-7))
## convergence after 3 iterations.
## convergence after 279 iterations.
##          min          max 
## 5.860836e-07 1.000000e+00 

Computation of the smallest eigenvalue is slow and becomes increasingly inaccurate as $\bf{A}$ gets less well conditioned (even though it is still far from being ill-conditioned!).

My questions

  1. Is there a mathematical justification for this observation?
  2. How can I modify this algorithm for better numerical stability?
  3. If the algorithm cannot be improved, can someone suggest another efficient algorithm?
  4. If no better algorithm exists, is there a lower bound for the smallest eigenvalue? The Gershgorin circle theorem is no good here, as its bound is too loose.

Best Answer

We are given the matrix $A$ where $$A = B^T B$$ and $B = L^{-1} R$ and $R$ is a tall matrix of full column rank. There is little difficulty associated with solving linear systems of the form $$Ax = f.$$ If $B = US$ is a $QR$ factorization of $B$, then $S$ is nonsingular because $B$ has full rank. It follows that $$A = B^T B = S^T U^T U S = S^T S.$$ The problem of solving $Ax=f$ reduces to solving $$S^T y = f, \quad S x = y.$$ In short, we have the ability to do the inverse iteration, i.e., apply the power method to $A^{-1}$.
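
A minimal R sketch of this factored solve (assuming, purely for illustration, that $B$ is available explicitly and can be passed to qr(); the function name solveA is made up):

## Solve A x = f with A = B'B = S'S, where B (tall, full column rank) has the
## QR factorization B = U S. The permutation bookkeeping covers the column
## pivoting that base R's qr() may apply.
solveA <- function(B, f) {
  qrB <- qr(B)
  S   <- qr.R(qrB)                  # upper triangular factor (sign flips are harmless)
  piv <- qrB$pivot                  # column permutation used by qr()
  y <- forwardsolve(t(S), f[piv])   # S' y = P' f   (lower triangular solve)
  z <- backsolve(S, y)              # S  z = y      (upper triangular solve)
  x <- numeric(length(z))
  x[piv] <- z                       # undo the permutation: x = P z
  x
}

Any factorization that yields a nonsingular triangular $S$ with $A = S^T S$ serves the same purpose.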

Is there a mathematical justification for this observation?

The convergence of the power method is controlled by the ratio of the second largest to the largest eigenvalue in absolute value. Suppose that $A$ is symmetric with eigenvalues $\lambda_j$ that have been ordered such that $$ |\lambda_1| > |\lambda_2| \geq \cdots \geq |\lambda_{n-1}| \geq |\lambda_n|.$$ Let $v_j$ denote an eigenvector of $A$ with respect to $\lambda_j$ and let $x = \sum_{j=1}^n r_j v_j$ denote any starting vector. Then $$ A^k x = \sum_{j=1}^n r_j \lambda_j^k v_j = \lambda_1^k \sum_{j=1}^n r_j \left(\frac{\lambda_j}{\lambda_1} \right)^k v_j = \lambda_1^k \left( r_1 v_1 + \sum_{j=2}^n r_j \left(\frac{\lambda_j}{\lambda_1} \right)^k v_j \right).$$ We see that if $r_1 \not = 0$ and if $|\lambda_1| \gg |\lambda_2|$, then $A^k x$ will converge rapidly towards an eigenvector of $A$ corresponding to the eigenvalue $\lambda_1$.

In the cited experiments the matrix $A$ is replaced with $A - \lambda_{\max} I$. In the "bad" cases, $A$ is symmetric positive definite with eigenvalues $1$, $10^{-2}$ and $10^{-k}$ for $k \in \{3,5,7\}$. The power method functions well for $A$ itself because $10^{-2} \ll 1$. The shift ruins everything: the shifted matrix $A - I$ has eigenvalues $0$, $-0.99$ and $-1 + 10^{-k}$, and convergence is slow because the two largest magnitudes, $1 - 10^{-k}$ and $0.99$, are nearly equal.
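
To put a rough number on this (a back-of-the-envelope estimate, not a precise count): the error in the iterate contracts per step by roughly the ratio of the two largest magnitudes, so gaining even a single extra decimal digit takes on the order of two hundred iterations, $$ \left|\frac{\lambda_2}{\lambda_1}\right| = \frac{0.99}{1 - 10^{-k}} \approx 0.99, \qquad (0.99)^{K} \le 10^{-1} \iff K \ge \frac{\ln 10}{-\ln 0.99} \approx 229. $$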

How can I modify this algorithm for better numerical stability?

The algorithm suffers from subtractive cancellation in the final operation that is supposed to correct for the initial shift. If we are limited to working precision, then there is nothing that can be done to recover from that.
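
As a hedged illustration with numbers like those of the worst test case (the figures below are illustrative, not taken from the experiments): suppose $\lambda_{\min} = 10^{-7}$, $\lambda_{\max} = 1$, and the shifted iteration delivers the dominant eigenvalue of $A - I$ with a relative error $\epsilon$ on the order of the stopping tolerance $10^{-8}$. Adding the shift back gives $$ -(1 - 10^{-7})(1 + \epsilon) + 1 = 10^{-7} - (1 - 10^{-7})\,\epsilon \approx 10^{-7} \pm 10^{-8}, $$ that is, a roughly $10\%$ relative error in the recovered $\lambda_{\min}$ even when the shifted eigenvalue itself meets the tolerance; with slow convergence on top of this, the observed error can be far larger.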

If the algorithm cannot be improved, can someone suggest another efficient algorithm?

If we want faster convergence, then there are at least three distinct choices:

  1. The inverse iteration, i.e., the power method applied to $A^{-1}$.
  2. The inverse subspace iteration with Ritz acceleration, which is a generalization of the power method. There is a good discussion of this algorithm in the standard textbook "Matrix Computations" by Gene Golub and Charles Van Loan, where it is called orthogonal iteration.
  3. The trace minimization algorithm (TraceMin) by Sameh and Wisniewski; see their original paper.

In all three cases, we do not have to explicitly form the matrix $A^{-1}$, but we need the ability to solve linear systems of the form $Ax=f$.
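
As an illustration of option 1, here is a minimal sketch in the same style as the question's PowerIter, assuming a one-argument function solveAf(v) that returns $A^{-1}v$ (for instance, a closure around the solveA sketch above); the names and the tolerance are placeholders:

## Inverse iteration: the power method applied to A^{-1}, where A^{-1} is never
## formed; each step only requires one linear solve with A.
InverseIter <- function(solveAf, n, tol = 1e-8) {
  v <- rep.int(1 / n, n)
  d.old <- Inf
  repeat {
    v <- v / sqrt(sum(v ^ 2))       # normalise the current iterate
    u <- solveAf(v)                 # u = A^{-1} v, via a linear solve
    d.new <- sum(v * u)             # Rayleigh quotient v' A^{-1} v
    if (abs(d.new - d.old) < abs(d.old) * tol) break
    d.old <- d.new
    v <- u
  }
  1 / d.new                         # largest eigenvalue of A^{-1} = smallest of A
}

Once a factor of $B$ is available, this could be called as, e.g., InverseIter(function(f) solveA(B, f), ncol(B)).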

If no better algorithm exists, is there a lower bound for the smallest eigenvalue?

This is by far the hardest of the four questions. There is good convergence theory for the algorithms cited above, and you can get estimates from this theory and your intermediate results. It is an open problem to determine a good lower bound for the smallest eigenvalue of a general symmetric positive definite matrix using a reasonable amount of time, storage and energy. If it could be done, the practical implications would be enormous.