Power of a Matrix, Largest Eigenvalue, and Convergence Acceleration

Tags: eigenvalues, linear-algebra, matrices, real-analysis

I want $S^k$, with $S=I-\Lambda^{-1}M$, to tend to zero quite fast as $k\rightarrow \infty$, as this is what drives the convergence of a fixed-point algorithm. Here $X$ is an $n \times m$ matrix, $M=X^TX$ is a fixed $m\times m$ symmetric positive semidefinite matrix, $I$ is the $m\times m$ identity matrix, and $\Lambda$ is an $m\times m$ diagonal matrix. I am trying to find a good $\Lambda$ that achieves this goal, yet one that is simple and easy to compute.

I know the convergence speed is driven by the largest eigenvalue of $S$, which must be less than $1$ in absolute value. Let $\lambda_i$ be the $i$-th diagonal element of $\Lambda$. If $\Lambda$ is chosen so that the elements of $S$ are, in some sense, as close to zero as they can be, then one would expect fast convergence, and this does work in practice. For $\lambda_i$, I chose the diagonal element of $M$ in the $i$-th row, divided by the sum of the squares of the elements of $M$ in the $i$-th row. I am moderately happy with the results (at least for matrices up to $m=6$), but I am wondering whether it is possible to get better $\lambda_i$'s that would, on average, further speed up convergence to zero. I also want to keep the $\lambda_i$'s as simple as possible. All elements of all the matrices are real numbers.

My question: Does my choice of $\Lambda$ always lead to $S^k\rightarrow 0$ ($k\rightarrow\infty$), and is there a better choice (yet as simple as possible) that will make convergence faster?

Best Answer

  1. As Iosif Pinelis pointed out, you presumably meant $$ \lambda_i^{-1} = \frac{e_i^T M e_i}{\lVert M e_i \rVert_2^2}, $$ where $e_i$ is the $i$th standard basis vector. This is indeed a reasonable choice, and it can be motivated as the unique minimizer of the Frobenius norm $$ \lVert I - \Lambda^{-1} M \rVert_F $$ under the constraint that $\Lambda$ be diagonal. You can find literature on preconditioners obtained by minimizing this sort of Frobenius norm subject to a sparsity constraint under the keywords "sparse approximate inverse" (SAI). The specific choice above, where the preconditioner is constrained to be diagonal, was considered in [1] under the name "SPAI-0". In the context of [1], this preconditioner was recommended as a component (the smoother) of a more complicated preconditioner (multigrid), precisely because for their application the spectral radius of $S$ was expected to still be close to 1, implying slow convergence. (A numerical sketch of this choice appears after this list.)

  2. More generally, it is a standard result in iterative methods for solving linear systems that the optimal damping parameter $\omega$ minimizing the spectral radius of $$ I - \omega B A $$ for symmetric positive definite $A$ and $B$ is given by $$ \omega = \frac{2}{\lambda_{\text{min}} + \lambda_{\text{max}}}, $$ where $\lambda_{\text{min}}$ and $\lambda_{\text{max}}$ are the smallest and largest eigenvalues of $BA$, which are necessarily real and positive. This choice gives the spectral radius $$ \rho(I - \omega B A) = \frac{\kappa - 1}{\kappa + 1} < 1, \quad\text{where } \kappa = \frac{\lambda_{\text{max}}}{\lambda_{\text{min}}} . $$ Relating this back to your problem: any SPD choice of $\Lambda$, not just a diagonal one, can be made to converge with a suitable scaling, provided $M$ is nonsingular. This generalizes part 4 of Iosif's answer, which corresponds to the choice $B=I$ above. Note that this argument reduces the problem to finding an optimal preconditioner $B$ ($\Lambda^{-1}$ in your case), in the sense of minimizing the condition number $\kappa$. (See the second sketch after this list for a numerical check.)

  3. The $2 \times 2$ case is easy to analyze in full detail. Consider the matrix $$ M = \begin{bmatrix} 1 & \tfrac{\kappa-1}{\kappa+1} \\ \tfrac{\kappa-1}{\kappa+1} & 1 \end{bmatrix}, $$ which has eigenvalues $2/(\kappa+1)$ and $2\kappa/(\kappa+1)$, and condition number $\kappa$. One can show that the spectral radius of $S$ is minimized in this case by $\Lambda = I$ (assuming diagonal $\Lambda$, of course), and the minimum spectral radius is, again, $$ \rho(S) = \frac{\kappa - 1}{\kappa + 1} . $$ This shows, as Iosif pointed out, that convergence can be arbitrarily slow for poorly conditioned $M$, even for the optimal choice of diagonal preconditioner, already in the $2 \times 2$ case. It also shows that the nonconvergence occurring when $M$ is singular is approached continuously, in the sense that $\rho(S) \to 1$ as $\kappa \to \infty$. Note that your choice of $\Lambda$, the one minimizing $\lVert I - \Lambda^{-1} M \rVert_F$, is suboptimal here, though it still gives $\rho(S) < 1$: in the worst case it yields a $\rho(S)$ about 21% higher than optimal for this particular $M$. (The third sketch after this list verifies these numbers.)
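
For concreteness, here is a minimal numerical sketch of the diagonal choice from point 1; the sizes and the random $X$ below are illustrative assumptions, not taken from the question:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative instance of the setup (sizes and X are assumptions, not from
# the question): X is n x m, M = X^T X is m x m symmetric positive definite.
n, m = 50, 6
X = rng.standard_normal((n, m))
M = X.T @ X

# SPAI-0 / Frobenius-norm-minimizing diagonal preconditioner:
# 1/lambda_i = (e_i^T M e_i) / ||M e_i||_2^2 = M_ii / (sum of squares of row i),
# using that M is symmetric, so column i and row i coincide.
inv_lambda = np.diag(M) / np.sum(M**2, axis=1)

S = np.eye(m) - np.diag(inv_lambda) @ M
print("rho(S) =", max(abs(np.linalg.eigvals(S))))  # < 1 here, so S^k -> 0
```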
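
Similarly, a quick sketch checking the damping result of point 2, taking $B = I$ so that $BA = A$ (the random SPD test matrix $A$ is again an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed random SPD test matrix A; with B = I, the eigenvalues of BA = A
# are real and positive.
m = 6
A = rng.standard_normal((m, m))
A = A @ A.T + m * np.eye(m)

lam = np.linalg.eigvalsh(A)                # sorted ascending, all > 0
lam_min, lam_max = lam[0], lam[-1]
omega = 2.0 / (lam_min + lam_max)          # the optimal damping parameter
kappa = lam_max / lam_min

rho = max(abs(np.linalg.eigvals(np.eye(m) - omega * A)))
print(rho, (kappa - 1) / (kappa + 1))      # the two values agree
```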
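
Finally, a check of the $2 \times 2$ example from point 3. The 21% worst case appears to occur at $\kappa = 1 + \sqrt{2}$ (my computation, so treat the exact location as an assumption); there the optimal spectral radius is $\sqrt{2}-1 \approx 0.414$, while the Frobenius-minimizing choice gives exactly $0.5$:

```python
import numpy as np

def rho(S):
    """Spectral radius of S."""
    return max(abs(np.linalg.eigvals(S)))

kappa = 1 + np.sqrt(2)              # assumed worst-case kappa for this family
a = (kappa - 1) / (kappa + 1)       # equals sqrt(2) - 1
M = np.array([[1.0, a], [a, 1.0]])  # eigenvalues 1 +- a, condition number kappa

# Optimal diagonal choice, Lambda = I:
print(rho(np.eye(2) - M))           # = a = (kappa-1)/(kappa+1) ~ 0.4142

# Frobenius-norm-minimizing choice, 1/lambda_i = 1/(1 + a^2):
inv_lam = np.diag(M) / np.sum(M**2, axis=1)
print(rho(np.eye(2) - np.diag(inv_lam) @ M))  # = 0.5, about 21% higher
```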

[1] Bröker, Oliver; Grote, Marcus J., Sparse approximate inverse smoothers for geometric and algebraic multigrid, Appl. Numer. Math. 41, No. 1, 61-80 (2002). ZBL0995.65129.
