[Math] Convergence of Conjugate Gradient Method for Positive Semi-Definite Matrix

linear algebra, numerical linear algebra, numerical methods, systems of equations

Let $A\in\mathbb{R}^{N\times N}$ be a positive semi-definite matrix. Given $b\in\mbox{Col}\left(A\right)$, we want to solve the equation system $Ax=b$. To fix some notation, we define $\left<u,v\right>_{A}:=u^{\top}Av$, and given a starting guess $x^{0}$ we define $d^{0}=r^{0}=b-Ax^{0}$. The iterative conjugate gradient method then follows these update equations:
$$\alpha^{n}:=\frac{\left(d^{n}\right)^{\top}r^{n}}{\left\Vert d^{n}\right\Vert _{A}}$$
$$x^{n+1}=x^{n}+\alpha^{n}d^{n}$$
$$r^{n+1}:=b-Ax^{n+1}$$
$$d^{n+1}=r^{n+1}-\frac{\left<d^{n},r^{n+1}\right>_{A}}{\left\Vert d^{n}\right\Vert _{A}}d^{n}$$
Now, I have shown that, given $A$ is positive semi-definite, $\left\Vert d^{n}\right\Vert _{A}=0$ iff $d^{n}=0$, and I have shown that $d^{n}\in\mbox{Col}\left(A\right)$ for all $n$; combining these two facts shows that $\alpha^{n}$ is well defined and the method can be applied to positive semi-definite matrices. Now I want to show that, for the error term $e^{n}:=\hat{x}-x^{n}$ (where $\hat{x}$ is a solution of $Ax=b$) and given that $e^{n}\notin\ker\left(A\right)$, we have $\left\Vert e^{n}\right\Vert _{A}\to0$ as $n\to\infty$. It would also suffice to show that $\frac{\left\Vert e^{n+1}\right\Vert _{A}}{\left\Vert e^{n}\right\Vert _{A}}<1$. I was advised to look at the representation of the residual, the error and the current iterate in an eigenbasis of $A$, but I haven't managed to get anywhere; help would be most appreciated.
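For concreteness, here is a minimal numerical sketch of the setting (assuming $A$ is symmetric positive semi-definite and reading the denominator of $\alpha^{n}$ as $\left(d^{n}\right)^{\top}Ad^{n}$; all names are purely illustrative):

```python
import numpy as np

# Illustrative toy setup: a singular, symmetric positive semi-definite A
# with b chosen in Col(A), so that Ax = b is consistent.
rng = np.random.default_rng(0)
N, rank = 6, 4
B = rng.standard_normal((N, rank))
A = B @ B.T                       # symmetric PSD with rank 4 < N
x_hat = rng.standard_normal(N)
b = A @ x_hat                     # guarantees b in Col(A)

# First step of the method: d^0 = r^0 = b - A x^0.
x0 = np.zeros(N)
r0 = b - A @ x0
d0 = r0.copy()

# The denominator of alpha^0 is (d^0)^T A d^0.  Since d^0 = b lies in
# Col(A) and A is PSD, this is strictly positive unless d^0 = 0.
denom = d0 @ A @ d0
print("d0^T A d0 =", denom)       # > 0, so alpha^0 is well defined
alpha0 = (d0 @ r0) / denom
print("alpha^0   =", alpha0)
```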

Proof of monotonic error decrease:
First observe that (here I write $e_{n}:=x_{n}-\hat{x}$, the negative of the error defined above, which has the same $A$-norm): $$e_{n+1} = x_{n+1} - \hat{x} = x_{n} + \alpha_{n}d_{n} - \hat{x} = e_{n} + \alpha_{n} d_{n}$$
Thus:
$$\left\Vert e_{n+1}\right\Vert _{A}^{2}=\left(e_{n}+\alpha_{n}d_{n}\right)^{\top}A\left(e_{n}+\alpha_{n}d_{n}\right)$$
$$=e_{n}^{\top}Ae_{n}+\alpha_{n}e_{n}^{\top}Ad_{n}+\alpha_{n}\overbrace{d_{n}^{\top}Ae_{n}}^{=e_{n}^{\top}Ad_{n}}+\alpha_{n}^{2}d_{n}^{\top}Ad_{n}$$
$$=\left\Vert e_{n}\right\Vert _{A}^{2}+2\left(\frac{r_{n}^{\top}d_{n}}{d_{n}^{\top}Ad_{n}}\right)e_{n}^{\top}Ad_{n}+\left(\frac{r_{n}^{\top}d_{n}}{d_{n}^{\top}Ad_{n}}\right)^{2}d_{n}^{\top}Ad_{n}$$
$$=\left\Vert e_{n}\right\Vert _{A}^{2}+2\left(\frac{r_{n}^{\top}d_{n}}{d_{n}^{\top}Ad_{n}}\right)e_{n}^{\top}Ad_{n}+\frac{\left(r_{n}^{\top}d_{n}\right)^{2}}{d_{n}^{\top}Ad_{n}}$$
$$=\left\Vert e_{n}\right\Vert _{A}^{2}+2\left(\frac{\left(-Ae_{n}\right)^{\top}d_{n}}{d_{n}^{\top}Ad_{n}}\right)e_{n}^{\top}Ad_{n}+\frac{\left(\left(-Ae_{n}\right)^{\top}d_{n}\right)^{2}}{d_{n}^{\top}Ad_{n}}$$
$$=\left\Vert e_{n}\right\Vert _{A}^{2}-2\left(\frac{e_{n}^{\top}Ad_{n}}{d_{n}^{\top}Ad_{n}}\right)\overbrace{d_{n}^{\top}Ae_{n}}^{=e_{n}^{\top}Ad_{n}}+\frac{\left(e_{n}^{\top}Ad_{n}\right)^{2}}{d_{n}^{\top}Ad_{n}}$$
$$=\left\Vert e_{n}\right\Vert _{A}^{2}-2\frac{\left(e_{n}^{\top}Ad_{n}\right)^{2}}{d_{n}^{\top}Ad_{n}}+\frac{\left(e_{n}^{\top}Ad_{n}\right)^{2}}{d_{n}^{\top}Ad_{n}}=\left\Vert e_{n}\right\Vert _{A}^{2}-\frac{\left(e_{n}^{\top}Ad_{n}\right)^{2}}{\left\Vert d_{n}\right\Vert _{A}^{2}}<\left\Vert e_{n}\right\Vert _{A}^{2}$$
I used the easily shown equality $r_{n}=-Ae_{n}$, and the last inequality holds as long as $e_{n}^{\top}Ad_{n}=-r_{n}^{\top}d_{n}\neq0$.
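As a quick sanity check of this identity, here is a small numerical sketch on a singular symmetric positive semi-definite example (the test matrix and all names are purely illustrative):

```python
import numpy as np

# Numerically verify the one-step identity derived above:
#   ||e_{n+1}||_A^2 = ||e_n||_A^2 - (e_n^T A d_n)^2 / (d_n^T A d_n)
rng = np.random.default_rng(1)
N, rank = 8, 5
B = rng.standard_normal((N, rank))
A = B @ B.T                       # singular symmetric PSD matrix
x_hat = rng.standard_normal(N)
b = A @ x_hat

x = np.zeros(N)
r = b - A @ x
d = r.copy()
for n in range(rank):
    e = x - x_hat                 # sign convention of the proof: e_n = x_n - x_hat
    Ad = A @ d
    alpha = (d @ r) / (d @ Ad)
    x_new = x + alpha * d
    e_new = x_new - x_hat
    lhs = e_new @ A @ e_new
    rhs = e @ A @ e - (e @ Ad) ** 2 / (d @ Ad)
    print(f"n={n}:  lhs = {lhs:.6e},  rhs = {rhs:.6e}")
    r = b - A @ x_new
    d = r - (d @ A @ r) / (d @ Ad) * d
    x = x_new
```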

Best Answer

I was kind of curious about an answer working with an eigenbasis of $A$, because I can only think of treating the CG method as a Krylov method and I can't see the direct connection there. So, if you end up with such a solution, I'd be happy if you could share it here.

In return, I can at least offer the version that I already know. It is basically the approach for strictly positive definite matrices, but under the given conditions it seems to work for semi-definite ones as well (which actually surprises me a little, but I can't see why it wouldn't work); please correct me if I'm wrong.

Since you referred in the comments to a post using symmetric positive semi-definite matrices, I assume that the matrix given here is symmetric, too. This may not be strictly necessary either, but then you would definitely have to pay more attention to how you actually use the $A$-scalar product.

You also already pointed out that the method is well defined because $b\in \mbox{Col}(A)$, and that $r^n, d^n \in \mbox{Col}(A)$ for all $n\in\mathbb{N}$.

Small Corrections

In your post you defined the $A$-scalar product $\langle u, v\rangle_A = u^\top A v$. From that, one would usually define $\Vert u \Vert_A = \sqrt{\langle u, u\rangle_A}$. Using this definition (and writing $\langle \cdot, \cdot\rangle_I$ for the standard inner product), the update rules become $$ \alpha^n = \frac{\langle d^n, r^n\rangle_I}{\Vert d^n \Vert^2_A} $$ and $$ d^{n+1} = r^{n+1} - \frac{\langle d^n, r^{n+1}\rangle_A}{\Vert d^n\Vert_A^2} d^n\text{.} $$
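For reference, a minimal sketch of one iteration with these corrected rules might look as follows (assuming a symmetric positive semi-definite `A`; the function name is just illustrative):

```python
import numpy as np

def cg_step(A, b, x, r, d):
    """One CG iteration with the corrected update rules."""
    Ad = A @ d
    alpha = (d @ r) / (d @ Ad)         # <d, r>_I / ||d||_A^2
    x_new = x + alpha * d
    r_new = b - A @ x_new              # r^{n+1} = b - A x^{n+1}
    beta = (d @ A @ r_new) / (d @ Ad)  # <d, r^{n+1}>_A / ||d||_A^2
    d_new = r_new - beta * d
    return x_new, r_new, d_new
```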

$A$-orthogonality of search directions

If you already know that the search directions will be $A$-orthogonal, you can skip this part. I decided to elaborate on it a little because it is the important part (for Krylov methods in general).

We define $$ K_n(A,r^0) = \mbox{span}\{ A^k r^0 \mid k \in \{0,\dots,n-1\}\} $$ for all $n\in\mathbb{N}$. We claim:

For all $n\in\mathbb{N}$ we have $$ K_n(A,r^0) = \mbox{span}\{d^j \mid j \in \{0,\dots,n-1\} \} = \mbox{span}\{r^j \mid j \in \{0,\dots,n-1\} \} $$ and pairwise $A$-orthogonality of the directions $d^j$ for $j\in\{0,\dots,n\}$, as long as $r^j \neq 0$ for all $j\in\{0,\dots,n\}$.

The proof is done via induction, where the induction statement is clearly fulfilled for $n=0$. So we assume the statement holds for some $n\in \mathbb{N}$ and try to show it for $n+1$. First of all, by the definition of $r^n$ (using the definition of $x^n$) we have for all $j\in\{0,\dots,n\}$ $$ r^n = r^j - \sum_{i=j}^{n-1} \alpha^i Ad^i. $$ Consequently, this yields for all $j\in\{0,\dots,n-1\}$ $$ \begin{array}{rl} \langle d^j, r^{n} \rangle_I & = \left\langle d^j, r^j - \textstyle\sum_{i=j}^{n-1} \alpha^i Ad^i \right\rangle_I \\ & = \langle d^j, r^j - \alpha^j A d^j \rangle_I \\ & = \left \langle d^j, r^j - \frac{\langle d^j, r^j\rangle_I}{\langle d^j, d^j\rangle_A} Ad^j \right \rangle_I \\ & = 0 \end{array} $$ where we used $\langle d^j, d^i\rangle_A = 0$ for $i,j \in \{0,\dots, n-1\}$ and $i\neq j$ by the induction assumption. In particular, we have $r^{n} \perp K_{n}(A,r^0)$ since $K_{n}(A,r^0) = \mbox{span}\{d^0,\dots,d^{n-1}\}$ (also by the induction assumption).

In addition, we know $K_{n-1}(A,r^0) = \mbox{span}\{r^0,\dots,A^{n-2}r^0\}$ and thus $$ AK_{n-1}(A,r^0) = \mbox{span}\{Ar^0,\dots,A^{n-1}r^0\} \subseteq \mbox{span}\{r^0,\dots,A^{n-1}r^0\} = K_n(A,r^0)\text{,} $$ which implies $Ad^j \in K_n(A,r^0)$ for all $j\in \{0,\dots,n-2\}$. Together with $r^{n} \perp K_{n}(A,r^0)$ this leads to $$ \langle r^n, d^j \rangle_A = (r^n)^\top Ad^j = 0 $$ for all $j\in\{0,\dots,n-2\}$. In particular, for all $j\in \{0,\dots,n-2\}$, the definition of $d^n$, the pairwise $A$-orthogonality of $\{d^i\}_{i\in\{0,\dots,n-1\}}$ (by the induction assumption) and the above equality give $$ \langle d^n, d^j\rangle_A = \left\langle r^n - \frac{\langle d^{n-1}, r^n\rangle_A}{\Vert d^{n-1}\Vert_A^2} d^{n-1}, d^j\right\rangle_A = 0\text{.} $$ Furthermore, we of course have $\langle d^n, d^{n-1}\rangle_A=0$, too, which is obvious from the definition of $d^n$ as above. In total, $\{d^i\}_{i\in\{0,\dots,n\}}$ is pairwise $A$-orthogonal.

From the definitions of $r^n$ and $d^n$ it follows that $\mbox{span}\{d^j \mid j \in \{0,\dots,n\} \} = \mbox{span}\{r^j \mid j \in \{0,\dots,n\} \}$ (is this obvious enough?). Also, the definition of $r^n$ shows $r^n \in K_{n+1}(A,r^0)$.

The pairwise $A$-orthogonality and $\{d^i\}_{i\in\{0,\dots,n\}} \subseteq \mbox{Col}(A)$ imply linear independence of $\{d^i\}_{i\in\{0,\dots,n\}}$ (this step is important! If we couldn't assure $d^i \in \mbox{Col}(A)$, this would no longer hold). For $d^n \neq 0$ (which follows from $r^n \neq 0$) we consequently have $K_n(A,r^0) \subsetneq \mbox{span}\{d^0,\dots,d^n\}$, and thus $K_{n+1}(A,r^0) = \mbox{span}\{d^0,\dots,d^n\}$ because $d^n \in K_{n+1}(A,r^0)$ holds (which follows from the definition $d^n = r^n - \beta^n d^{n-1}$ together with $r^n \in K_{n+1}(A,r^0)$ and $d^{n-1} \in K_n(A,r^0)$). All in all, this leads us to $$ K_{n+1}(A,r^0) = \mbox{span}\{d^j \mid j \in \{0,\dots,n\} \} = \mbox{span}\{r^j \mid j \in \{0,\dots,n\} \} $$ so that the claim is proved.
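If it helps, the pairwise $A$-orthogonality can also be checked numerically on a small singular example (a sketch only; the test matrix and names are illustrative):

```python
import numpy as np

# Check pairwise A-orthogonality of the search directions on a
# singular symmetric PSD example.
rng = np.random.default_rng(2)
N, rank = 7, 4
B = rng.standard_normal((N, rank))
A = B @ B.T
b = A @ rng.standard_normal(N)        # b in Col(A)

x = np.zeros(N)
r = b - A @ x
d = r.copy()
dirs = []
for _ in range(rank):
    dirs.append(d)
    Ad = A @ d
    alpha = (d @ r) / (d @ Ad)
    x = x + alpha * d
    r = b - A @ x
    d = r - (d @ A @ r) / (d @ Ad) * d

# Gram matrix in the A-scalar product; off-diagonal entries should vanish.
G = np.array([[di @ A @ dj for dj in dirs] for di in dirs])
off_diag = G - np.diag(np.diag(G))
print("max |<d^i, d^j>_A| for i != j:", np.abs(off_diag).max())
```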

Convergence

This part basically uses the same arguments as in the proof above. So just for the case that part was skipped or is partly wrong...

Let $n\in\mathbb{N}$ and let $\hat{x}$ fulfill $A\hat{x} = b$. Then we have $A(\hat{x}-x^n) = b-Ax^n = r^n$. Using the $A$-orthogonality of the $d^j$, we have for all $j\in \{0,\dots,n-1\}$ (just as in the proof above) $$ \langle r^n, d^j \rangle_I = \langle r^j - \textstyle \sum_{i=j}^{n-1} \alpha^i Ad^i, d^j \rangle_I = \langle r^j - \alpha^j Ad^j, d^j \rangle_I = \left\langle r^j - \frac{\langle r^j, d^j\rangle_I}{\langle d^j, d^j\rangle_A} A d^j, d^j \right\rangle_I = 0. $$ Because the $d^j$ are $A$-orthogonal, non-zero (else we have already converged) and a subset of $\mbox{Col}(A)$, the $d^j$ are linearly independent. This implies that for $n = \mbox{dim}(\mbox{Col}(A))$ we have $\mbox{span}\{d^0,\dots,d^{n-1}\} = \mbox{Col}(A)$. From the above property we then get $r^n \perp \mbox{span}\{d^0,\dots,d^{n-1}\} = \mbox{Col}(A)$, but by construction we also have $r^n \in \mbox{Col}(A)$. This implies $r^n = 0$. So we have $0 = r^n = b - Ax^n$ and thus $$ \Vert \hat{x}-x^n \Vert_A^2 = \langle \hat{x} - x^n, \hat{x}-x^n \rangle_A = (\hat{x}-x^n)^\top A(\hat{x}-x^n) = (\hat{x}-x^n)^\top r^n = 0\text{.} $$ In particular, we have $\Vert \hat{x}-x^n \Vert_A^2 \to 0 $ for $n\to \infty$; in fact, the error vanishes after at most $\mbox{dim}(\mbox{Col}(A))$ steps.
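A small numerical sketch of this finite-termination behaviour on a rank-deficient example (again, purely illustrative names and data):

```python
import numpy as np

# On a singular symmetric PSD system with b in Col(A), the residual
# vanishes after at most dim(Col(A)) steps and the A-norm error reaches 0.
rng = np.random.default_rng(3)
N, rank = 9, 5
B = rng.standard_normal((N, rank))
A = B @ B.T                          # dim(Col(A)) = 5
x_hat = rng.standard_normal(N)
b = A @ x_hat

x = np.zeros(N)
r = b - A @ x
d = r.copy()
for n in range(1, rank + 1):
    Ad = A @ d
    alpha = (d @ r) / (d @ Ad)
    x = x + alpha * d
    r = b - A @ x
    e = x_hat - x
    print(f"n={n}: ||r^n||_2 = {np.linalg.norm(r):.2e}, "
          f"||e^n||_A^2 = {e @ A @ e:.2e}")
    d = r - (d @ A @ r) / (d @ Ad) * d
```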
