I want to numerically compute the smallest eigenvalue of an $n \times n$ positive definite matrix $\bf{A}$.
Why I want to avoid working with $\bf{A^{-1}}$
I know that I can apply power iteration to $\bf{A^{-1}}$, but I wish to avoid working with this inverse matrix.
In my application, $\bf{A}$ is implicitly expressed as $\bf{A} = \bf{K'K}$ with $\bf{K} = \bf{L^{-1}R}$, where
- $\bf{L}$ is an $m \times m$ lower triangular band matrix;
- $\bf{R}$ is an $m \times n$ band matrix of full column rank $(m > n)$.
While $\bf{L}$ and $\bf{R}$ are sparse, $\bf{K}$ and $\bf{A}$ are fully dense. Given that both $m$ and $n$ can be large, I want to explicitly form neither $\bf{K}$ nor $\bf{A}$.
An unfortunate complication is that $m \neq n$, so $\bf{R}$ is not square and there is no convenient factored form $\bf{A^{-1}} = \bf{R^{-1}LL'R^{-1\prime}}$.
What I have tried
With $\bf{A}$ structured as above, it is computationally efficient to compute $\bf{Av}$ for any vector $\bf{v}$ using sparse linear algebra routines. It is easy to compute $\bf{A}$'s largest eigenvalue $\mu$ using power iteration:
- $\bf{v_0} = (\frac{1}{n}, \ldots, \frac{1}{n})$;
- $\bf{u_0} = \bf{Av_0}$;
- $\lambda_0 = \bf{v_0'u_0}$;
- for $k = 1, 2, \ldots$ till convergence of $\{\lambda_k\}$
- $\bf{v_k} = \bf{u_{k-1}} / \|\bf{u_{k-1}}\|$;
- $\bf{u_k} = \bf{Av_k}$;
- $\lambda_k = \bf{v_k'u_k}$;
- return $\lambda_k$.
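For concreteness, the implicit product $\bf{Av} = \bf{R'L^{-1\prime}L^{-1}Rv}$ can be sketched with two sparse triangular solves via the Matrix package. The band matrices below are small toy stand-ins, not the ones from my application:

```r
## Implicit A v = R' L^{-T} L^{-1} R v using sparse solves; L and R are
## toy band matrices chosen only for illustration.
library(Matrix)
m <- 6; n <- 3
L <- bandSparse(m, k = c(0, -1),
                diagonals = list(rep(2, m), rep(-1, m - 1)))  # lower bidiagonal
R <- bandSparse(m, n, k = c(0, -1),
                diagonals = list(rep(1, n), rep(1, n)))       # tall, full column rank
Av <- function(v) {
  w <- solve(L, R %*% v)                    # w = L^{-1} R v
  as.vector(crossprod(R, solve(t(L), w)))   # R' L^{-T} w
}
## sanity check against the dense computation (never done in practice)
K <- solve(as.matrix(L)) %*% as.matrix(R)
max(abs(Av(rep(1, n)) - crossprod(K) %*% rep(1, n)))
```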
To find the smallest eigenvalue, I apply this algorithm to $\bf{B} = \bf{A} - \mu\bf{I}$, inspired by this thread: https://math.stackexchange.com/a/271876/407465. Here is an R program implementing this method.
PowerIter <- function (A, mu = 0) {
  n <- nrow(A)
  ## spectral shift
  A <- A - diag(mu, n)
  ## power iteration
  v.old <- rep.int(1 / n, n)
  u.old <- A %*% v.old
  d.old <- sum(v.old * u.old)
  k <- 0L
  repeat {
    v.new <- u.old * (1 / sqrt(sum(u.old ^ 2)))
    u.new <- A %*% v.new
    d.new <- sum(v.new * u.new)
    if (abs(d.new - d.old) < abs(d.old) * 1e-8) break  ## test relative error
    d.old <- d.new
    u.old <- u.new
    k <- k + 1L
  }
  cat("convergence after", k, "iterations.\n")
  d.new
}
What goes wrong?
I found that this suggested algorithm (via spectral shift) does not seem numerically stable. I constructed two toy examples: one succeeds, the other is problematic.
A successful example
\begin{equation}
\bf{A} = \begin{pmatrix}
2 & -1 & 0\\
-1 & 2 & -1\\
0 & -1 & 2
\end{pmatrix}
\end{equation}
With $|\bf{A} - \lambda\bf{I}| = (2 - \lambda)[(2 - \lambda)^2 - 2]$, the three eigenvalues are $2 + \sqrt{2} \approx 3.4142136$, $2$ and $2 - \sqrt{2} \approx 0.5857864$.
The algorithm is a success for this matrix. Here is the output of an R session.
A <- matrix(c(2, -1, 0, -1, 2, -1, 0, -1, 2), nrow = 3)
d.max <- PowerIter(A)
## convergence after 7 iterations.
## [1] 3.414214
d.min <- PowerIter(A, d.max) + d.max
## convergence after 1 iterations.
## [1] 0.5857864
A problematic example
Now let's construct $\bf{A} = \bf{Q'DQ}$, where $\bf{D} = \textrm{diag}(d_1, d_2, d_3)$ holds the eigenvalues and $\bf{Q}$ is a rotation matrix
\begin{equation}
\bf{Q} = \begin{pmatrix}
0.36 & 0.48 & -0.8\\
-0.8 & 0.6 & 0\\
0.48 & 0.64 & 0.6
\end{pmatrix}
\end{equation}
I fixed $d_1 = 1$, $d_2 = 0.01$ and tried three choices for $d_3$: $10^{-3}$, $10^{-5}$ and $10^{-7}$. Here is the output of an R session.
test <- function (d) {
  Q <- matrix(c(0.36, -0.8, 0.48, 0.48, 0.6, 0.64, -0.8, 0, 0.6), nrow = 3)
  A <- crossprod(Q, d * Q)
  d.max <- PowerIter(A)
  d.min <- PowerIter(A, d.max) + d.max
  c(min = d.min, max = d.max)
}
test(c(1, 1e-2, 1e-3))
## convergence after 3 iterations.
## convergence after 298 iterations.
## min max
## 0.001000543 1.000000000
test(c(1, 1e-2, 1e-5))
## convergence after 3 iterations.
## convergence after 279 iterations.
## min max
## 1.04883e-05 1.00000e+00
test(c(1, 1e-2, 1e-7))
## convergence after 3 iterations.
## convergence after 279 iterations.
## min max
## 5.860836e-07 1.000000e+00
Computation of the smallest eigenvalue is slow and becomes increasingly inaccurate as $\bf{A}$ gets less well conditioned (but it is still far from being ill-conditioned!).
My questions
- Is there a mathematical justification for such observation?
- How can I modify this algorithm for better numerical stability?
- If the algorithm cannot be improved, can someone suggest another efficient algorithm?
- If no better algorithm exists, is there a useful lower bound for the smallest eigenvalue? The Gershgorin circle theorem does not help here, as its bound is far too loose.
Best Answer
We are given the matrix $A$ where $$A = B^T B$$ and $B = L^{-1} R$, where $R$ is a tall matrix of full column rank. There is little difficulty associated with solving linear systems of the form $$Ax = f.$$ If $B = US$ is a QR factorization of $B$, with $U$ having orthonormal columns and $S$ upper triangular, then $S$ is nonsingular because $B$ has full column rank. It follows that $$A = B^T B = S^T U^T U S = S^T S.$$ The problem of solving $Ax=f$ thus reduces to solving two triangular systems: $$S^T y = f, \quad S x = y.$$ In short, we have the ability to do inverse iteration, i.e., to apply the power method to $A^{-1}$.
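A minimal R sketch of this inverse iteration, using a small random dense $B$ purely for illustration (in the real application one would exploit the sparse structure of $L$ and $R$); `qr.R` supplies the triangular factor $S$, and each solve with $A$ is two calls to `backsolve`:

```r
## Inverse iteration: solve A x = f through the QR factor of B, never
## forming A or A^{-1}. B is a random tall toy matrix.
set.seed(1)
B <- matrix(rnorm(15), nrow = 5, ncol = 3)  # full column rank (generically)
S <- qr.R(qr(B))                            # A = B'B = S'S
solveA <- function(f) {
  y <- backsolve(S, f, transpose = TRUE)    # S' y = f
  backsolve(S, y)                           # S x = y
}
## power method on A^{-1}: its dominant eigenvalue is 1 / lambda_min(A)
v <- rep.int(1 / 3, 3)
d.old <- Inf
repeat {
  u <- solveA(v)
  v <- u / sqrt(sum(u ^ 2))
  d.new <- 1 / sum(v * solveA(v))           # Rayleigh quotient -> lambda_min(A)
  if (abs(d.new - d.old) < 1e-10 * abs(d.new)) break
  d.old <- d.new
}
d.new
```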
The convergence of the power method is controlled by the ratio of the largest to the second largest eigenvalue. Suppose that $A$ is symmetric with eigenvalues $\lambda_j$ that have been ordered such that $$ |\lambda_1| > |\lambda_2| \geq \cdots \geq |\lambda_{n-1}| \geq |\lambda_n|.$$ Let $v_j$ denote an eigenvector of $A$ with respect to $\lambda_j$ and let $x = \sum_{j=1}^n r_j v_j$ denote any starting vector. Then $$ A^k x = \sum_{j=1}^n r_j \lambda_j^k v_j = \lambda_1^k \sum_{j=1}^n r_j \left(\frac{\lambda_j}{\lambda_1} \right)^k v_j = \lambda_1^k \left( r_1 v_1 + \sum_{j=2}^n r_j \left(\frac{\lambda_j}{\lambda_1} \right)^k v_j \right).$$ We see that if $r_1 \not = 0$ and if $|\lambda_1| \gg |\lambda_2|$, then $A^k x$ will converge rapidly towards an eigenvector of $A$ corresponding to the eigenvalue $\lambda_1$.
In the cited experiments the matrix $A$ is replaced with $A - \lambda_\max I$. In the "bad" case, $A$ is symmetric positive definite with eigenvalues $1$, $10^{-2}$ and $10^{-k}$ for $k \in \{3,5,7\}$. The power method functions well for the matrix $A$ itself because $$10^{-2} \ll 1.$$ The shift ruins everything: since $\lambda_\max = 1$, the shifted matrix $A-I$ has eigenvalues $0$, $-0.99$ and $-1+10^{-k}$. Convergence is slow because the contraction factor $0.99/(1-10^{-k}) \approx 0.99$ is close to $1$.
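The two contraction factors are easy to check numerically; a small illustration for the $k = 5$ case (the other cases are analogous):

```r
## Per-iteration contraction factor |lambda_2 / lambda_1| before and
## after the shift, for the k = 5 case.
ev <- c(1, 1e-2, 1e-5)                              # eigenvalues of A
ratio.A <- ev[2] / ev[1]                            # 0.01: fast convergence
ev.B <- sort(abs(ev - max(ev)), decreasing = TRUE)  # |eigenvalues| of A - I
ratio.B <- ev.B[2] / ev.B[1]                        # ~0.99: painfully slow
c(ratio.A, ratio.B)
```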
The algorithm suffers from subtractive cancellation in the final operation that is supposed to correct for the initial shift. If we are limited to working precision, then there is nothing that can be done to recover from that.
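The loss can be seen in working precision with a two-line experiment, taking $\mu = 1$ and a hypothetical smallest eigenvalue of $10^{-7}$:

```r
## Recovering lambda_min as (lambda_min - mu) + mu in double precision:
## the subtraction rounds at the scale of mu, so a small eigenvalue
## loses its leading digits.
mu  <- 1
lam <- 1e-7
recovered <- (lam - mu) + mu  # the correction step of the shifted algorithm
abs(recovered - lam) / lam    # relative error far above machine epsilon
```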
If we want faster convergence, then there are at least three distinct choices: inverse iteration (the power method applied to $A^{-1}$), Rayleigh quotient iteration, and the Lanczos method applied to $A^{-1}$.
In all three cases, we do not have to explicitly form the matrix $A^{-1}$, but we need the ability to solve linear systems of the form $Ax=f$.
This is by far the hardest of the four questions. There is good convergence theory for the algorithms cited above, and you can get estimates from this theory and your intermediate results. It is an open problem to determine a good lower bound for the smallest eigenvalue of a general symmetric positive definite matrix using a reasonable amount of time, storage and energy. If it could be done, the practical implications would be enormous.