Look up "inverse power iteration"; it finds the eigenvalue with the smallest absolute value.
The sequence $v, A^{-1}v, A^{-2}v, \dots$ converges linearly towards the eigenvector of the smallest eigenvalue, at a rate given by the ratio of the two smallest eigenvalues in absolute value. The starting vector $v$ should be random enough not to be orthogonal to the sought eigenspace.
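A minimal sketch of this iteration in Python (the function name, the tolerance, and the use of a dense solve are illustrative choices, not part of the original post):

```python
import numpy as np

def inverse_iteration(A, num_iters=100, tol=1e-10, rng=None):
    """Approximate the eigenpair of A with the smallest-magnitude eigenvalue."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    v = rng.standard_normal(n)          # random start, almost surely not orthogonal
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(num_iters):
        w = np.linalg.solve(A, v)       # one step of the power method applied to A^{-1}
        w /= np.linalg.norm(w)
        lam_new = w @ A @ w             # Rayleigh quotient estimate of the eigenvalue
        if abs(lam_new - lam) < tol * abs(lam_new):
            return lam_new, w
        lam, v = lam_new, w
    return lam, v
```

In practice one would factor $A$ once (e.g., a Cholesky factorization, or the $S^T S$ factorization discussed below) and reuse the factor in every step instead of calling a dense solve repeatedly.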
We are given the matrix $A$ where
$$A = B^T B$$
and $B = L^{-1} R$ and $R$ is a tall matrix of full column rank. There is little difficulty associated with solving linear systems of the form
$$Ax = f.$$
If $B = US$ is a $QR$ factorization of $B$, then $S$ is nonsingular because $B$ has full rank. It follows that $$A = B^T B = S^T U^T U S = S^T S.$$ The problem of solving $Ax=f$ reduces to solving $$S^T y = f, \quad S x = y.$$
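In code, the two triangular solves might look roughly like this (a sketch assuming `B` is available as a dense array; `solve_with_qr` is a made-up name):

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

def solve_with_qr(B, f):
    """Solve (B^T B) x = f using the QR factorization B = U S."""
    # 'economic' mode returns U with orthonormal columns and S upper triangular.
    U, S = qr(B, mode='economic')
    y = solve_triangular(S, f, trans='T')   # S^T y = f  (forward substitution)
    x = solve_triangular(S, y)              # S x = y    (back substitution)
    return x
```

This avoids ever forming $A = B^T B$ explicitly.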
In short, we have the ability to do the inverse iteration, i.e., apply the power method to $A^{-1}$.
Is there a mathematical justification for such observation?
The convergence of the power method is controlled by the ratio of the largest to the second largest eigenvalue. Suppose that $A$ is symmetric with eigenvalues $\lambda_j$ that have been ordered such that
$$ |\lambda_1| > |\lambda_2| \geq \cdots \geq |\lambda_{n-1}| \geq |\lambda_n|.$$
Let $v_j$ denote an eigenvector of $A$ with respect to $\lambda_j$ and let $x = \sum_{j=1}^n r_j v_j$ denote any starting vector. Then
$$ A^k x = \sum_{j=1}^n r_j \lambda_j^k v_j = \lambda_1^k \sum_{j=1}^n r_j \left(\frac{\lambda_j}{\lambda_1} \right)^k v_j = \lambda_1^k \left( r_1 v_1 + \sum_{j=2}^n r_j \left(\frac{\lambda_j}{\lambda_1} \right)^k v_j \right).$$
We see that if $r_1 \neq 0$ and if $|\lambda_1| \gg |\lambda_2|$, then the direction of $A^k x$ converges rapidly towards an eigenvector of $A$ corresponding to the eigenvalue $\lambda_1$.
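As a quick illustration (a toy experiment, not taken from the text; the function `power_method` and the test matrix are illustrative), the deviation from the dominant eigenvector shrinks by roughly the factor $|\lambda_2/\lambda_1|$ per step:

```python
import numpy as np

def power_method(A, num_iters=50, rng=None):
    """Track how far the power-method iterate is from the dominant eigenvector."""
    rng = np.random.default_rng(rng)
    x = rng.standard_normal(A.shape[0])
    _, vecs = np.linalg.eigh(A)
    v1 = vecs[:, -1]                          # eigenvector of the largest eigenvalue
    errors = []
    for _ in range(num_iters):
        x = A @ x
        x /= np.linalg.norm(x)
        errors.append(1.0 - abs(v1 @ x))      # deviation from the dominant direction
    return errors

# Eigenvalues 1 and 1e-2: the deviation drops by roughly 1e-2 per iteration.
A = np.diag([1.0, 1e-2, 1e-3])
print(power_method(A, num_iters=5))
```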
In the cited experiments the matrix $A$ is replaced with $A - \lambda_{\max} I$. In the "bad" case, $A$ is symmetric positive definite with eigenvalues $1$, $10^{-2}$, and $10^{-k}$ for $k \in \{3,5,7\}$. The power method works well for the matrix $A$ itself because $$10^{-2} \ll 1.$$ The shift ruins everything, because the shifted matrix $A - I$ has eigenvalues $0$, $-0.99$, and $-1 + 10^{-k}$. Convergence is slow because the ratio of the two largest eigenvalues in magnitude, $0.99/(1 - 10^{-k}) \approx 0.99$, is close to $1$.
How can I modify this algorithm for better numerical stability?
The algorithm suffers from subtractive cancellation in the final operation that is supposed to correct for the initial shift. If we are limited to working precision, then there is nothing that can be done to recover from that.
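As a rough illustration with made-up numbers (not the cited experiment): suppose $\lambda_{\max} = 1$ and $\lambda_{\min} = 10^{-7}$. The shifted iteration at best returns $\mu \approx \lambda_{\min} - \lambda_{\max} = -1 + 10^{-7}$ with an absolute error on the order of the unit roundoff, about $10^{-16}$. Recovering $\lambda_{\min} = \mu + \lambda_{\max} \approx 10^{-7}$ keeps that absolute error, which is now a relative error of roughly $10^{-9}$; about seven significant digits are lost in the subtraction.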
If the algorithm cannot be improved, can someone suggest another efficient algorithm?
If we want faster convergence, then there are at least three distinct choices:
- The inverse iteration, i.e., the power method applied to $A^{-1}$.
- The inverse subspace iteration with Ritz acceleration, which is a generalization of the power method. There is a good discussion of this algorithm in the standard textbook "Matrix Computations" by Gene Golub and Charles Van Loan, who call it orthogonal iteration; a minimal sketch appears below.
- The trace minimization algorithm (TraceMin) by Sameh and Wisniewski, described in their original paper.
In all three cases, we do not have to explicitly form the matrix $A^{-1}$, but we need the ability to solve linear systems of the form $Ax=f$.
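For the second option, a minimal sketch of inverse orthogonal iteration with a Ritz step might look like this (assuming a dense symmetric positive definite `A`; the block size `p`, the iteration count, and the LU factorization are illustrative choices):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve, qr, eigh

def inverse_orthogonal_iteration(A, p=2, num_iters=50, rng=None):
    """Approximate the p smallest eigenvalues of a symmetric positive definite A."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    lu = lu_factor(A)                        # factor A once, reuse it in every solve
    Q, _ = qr(rng.standard_normal((n, p)), mode='economic')
    for _ in range(num_iters):
        Z = lu_solve(lu, Q)                  # apply A^{-1} to the current block
        Q, _ = qr(Z, mode='economic')        # re-orthonormalize the block
        theta, V = eigh(Q.T @ A @ Q)         # Ritz step on the small projected matrix
        Q = Q @ V                            # rotate the basis towards the Ritz vectors
    return theta, Q
```

Each iteration reuses the single factorization of $A$, and the Ritz step returns estimates of the $p$ smallest eigenvalues rather than just the smallest one.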
If no better algorithm exists, any lower bound for the smallest eigenvalue?
This is by far the hardest of the four questions. There is good convergence theory for the algorithms cited above, and you can get estimates from this theory and your intermediate results. It is an open problem to determine a good lower bound for the smallest eigenvalue of a general symmetric positive definite matrix using a reasonable amount of time, storage and energy. If it could be done, then the practical implications would be enormous.
Best Answer
I assume that $M(\lambda)$ is nonsingular except at certain values $\lambda = \lambda_i$, where $\det M(\lambda_i) = 0$ and $\dim \ker M(\lambda_i) = 1$. I also assume that all roots of $\det M(\lambda) = 0$ are simple.
Solving $\det M(\lambda) = 0$ directly is hard, since $\det M(\lambda)$ would be a polynomial of degree $4n$, where $n$ is the number of rows of $M$.
The following numerical approach may be used to find $\lambda_i$ and the kernel simultaneously:
Note that computing the kernel is rather simple if you already have the decomposition.
$$ M(\lambda_i) = LDL^\top\\ L = \begin{pmatrix} 1 \\ l_{2,1} & 1\\ \vdots & \ddots & 1\\ l_{r,1} & \cdots & l_{r,r-1} & 1\\ \ast & & \cdots & & \ast\\ \vdots & & & & & \ddots\\ \ast & & & \cdots & & & \ast \end{pmatrix}\\ D = \begin{pmatrix} d_1 \\ & d_2\\ && \ddots\\ &&& d_r \\ &&&& \ast \\ &&&&& \ddots\\ &&&&&& \ast \end{pmatrix} $$ Here it is assumed that the decomposition breaks down at row $r$, with $|d_r| < \epsilon$. Values that are not computed are marked with $\ast$.
Since $L$ is nonsingular and $d_r \approx 0$, it suffices to make $L^\top x$ proportional to $\mathbf e_r$, so solving $$ LDL^\top x = 0 $$ reduces to $$ \begin{pmatrix} 1 & l_{2,1} & \cdots & l_{r,1} & \ast & \cdots & \ast\\ & 1 & \ddots & \vdots & & & \\ && 1 & l_{r,r-1} & \vdots & & \\ &&& 1 & & & \vdots\\ &&&& \ast\\ &&&&& \ddots\\ &&&&&& \ast\\ \end{pmatrix} \begin{pmatrix} x_1\\ \vdots\\ x_r\\ x_{r+1}\\ \vdots\\ x_n \end{pmatrix} = \begin{pmatrix} 0\\ \vdots\\ 1\\ 0\\ \vdots\\ 0 \end{pmatrix} = \mathbf e_r $$ Setting $x_r = 1$, $x_{r+1} = \cdots = x_n = 0$ and solving the system by back substitution gives the desired kernel vector. Note that in the tridiagonal case the matrix $L$ is bidiagonal, so the factorization and solution steps both have $O(n)$ complexity.
Here is a Jupyter notebook with a proof-of-concept implementation.
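Independently of that notebook, a minimal Python sketch of the kernel computation described above might look as follows (assuming a dense symmetric $M(\lambda_i)$, no pivoting, and an illustrative breakdown tolerance `eps`; the function name is made up):

```python
import numpy as np

def kernel_via_ldl(M, eps=1e-10):
    """LDL^T without pivoting; stop at the first tiny pivot and return a kernel vector."""
    n = M.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        d[j] = M[j, j] - L[j, :j] ** 2 @ d[:j]
        if abs(d[j]) < eps:                        # breakdown at row r = j
            r = j
            # Solve L^T x = e_r on the leading block by back substitution.
            x = np.zeros(n)
            x[r] = 1.0
            for i in range(r - 1, -1, -1):
                x[i] = -(L[i + 1:r + 1, i] @ x[i + 1:r + 1])
            return x
        L[j + 1:, j] = (M[j + 1:, j] - L[j + 1:, :j] @ (d[:j] * L[j, :j])) / d[j]
    return None                                    # no breakdown: M is numerically nonsingular
```

A tridiagonal specialization would replace the dense slices with single entries, which is where the $O(n)$ cost mentioned above comes from.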