Solving Linear Equations Using Iteration

linear algebra, matrix equations, numerical linear algebra

I was studying policy evaluation in Markov Decision Processes and came across a way of solving linear equations. Basically, given a set of linear equations in $n$ variables, one makes random guesses for the $n$ variables. Then, in the next iteration, one updates the guess for each variable by evaluating its right-hand side at the values from the previous iteration. To illustrate the method, consider the following linear equations:
$$x = 1+0.5y$$
$$y = x+3$$
Guess $x = 3, y = 2$. In the next iteration, $x$ is set to $2$ (evaluate the RHS of the first equation at the current guess) and similarly $y$ is set to $6$. This sequence of steps converges to the solution $(5, 8)$ (you can verify this).
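For concreteness, here is a small Python sketch of that update (my own illustration, not part of the original question):

```python
# Simultaneous update: both right-hand sides are evaluated at the old (x, y).
x, y = 3.0, 2.0   # initial guesses
for _ in range(50):
    x, y = 1 + 0.5 * y, x + 3
print(x, y)       # approaches (5.0, 8.0)
```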

Such a method does seem to work: I solved a similar problem using iteration, where a sequence was given with $a_0 = 1$ and $a_{12} = 12$, with the condition that each element is one more than the average of its adjacent elements. I had to find the middle element, for which I solved the resulting system of linear equations by iteration.
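A similar sketch for that problem, assuming the conditions were $a_i = 1 + (a_{i-1} + a_{i+1})/2$ for the interior indices, with $a_0 = 1$ and $a_{12} = 12$ held fixed (my reading of the description; the boundary handling is an assumption):

```python
import numpy as np

a = np.zeros(13)
a[0], a[12] = 1.0, 12.0                  # fixed end values (assumed boundary conditions)
for _ in range(20000):                   # plain simultaneous sweep; slow but simple
    a[1:12] = 1 + (a[0:11] + a[2:13]) / 2
print(a[6])                              # the middle element
```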

Can someone give some rigorous arguments regarding the convergence conditions, and a proof of correctness for such a method? I tried an analysis using some matrix algebra:

Given a matrix equation $Ax = B$, one can formulate this as minimising the function $(B-Ax)^T(B-Ax)$. So one can do gradient descent on $x$, where the gradient with respect to $x$ is proportional to $A^T(Ax-B)$. So the update rule, with some step size $\alpha$, becomes $$ x_{n} = x_{n-1} + \alpha A^T(B-Ax_{n-1}) $$

Such a descent will converge to a local minimum, which, since the objective function is convex, is also a global minimum (though not necessarily a unique one).
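As a quick sanity check, here is a sketch of that descent on the two-variable system above (the step size $\alpha$ and its value are my additions; a unit step need not converge if $A^TA$ has large eigenvalues):

```python
import numpy as np

A = np.array([[1.0, -0.5],
              [-1.0, 1.0]])   # x - 0.5 y = 1  and  -x + y = 3
B = np.array([1.0, 3.0])

x = np.array([3.0, 2.0])      # initial guess
alpha = 0.1                   # small step size so the descent contracts
for _ in range(5000):
    x = x + alpha * A.T @ (B - A @ x)
print(x)                      # approaches (5, 8)
```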

Now if one uses the iteration method, one can define the problem the same way, as minimising the above convex function. However, the update rule this time would be $$ x_{n} = x_{n-1} + D^{-1}(B-Ax_{n-1}) $$ where $D$ is the square matrix whose off-diagonal coefficients are all $0$ and whose diagonal coefficients are those of $A$.

This is because, for each equation, one takes all the other variables to the RHS, so the system becomes $Dx_{n} = B - Nx_{n-1}$, where $N$ is the off-diagonal coefficient matrix ($D + N = A$). Substituting $N = A - D$, one gets the above update rule.
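Numerically, that diagonal update looks like the sketch below (again my own illustration; the final line computes the spectral radius of the iteration matrix $I - D^{-1}A$, the standard quantity governing whether such an iteration contracts, which is not something stated in the question itself):

```python
import numpy as np

A = np.array([[1.0, -0.5],
              [-1.0, 1.0]])        # same system as above
B = np.array([1.0, 3.0])
D_inv = np.diag(1.0 / np.diag(A))  # inverse of the diagonal part D

x = np.array([3.0, 2.0])
for _ in range(100):
    x = x + D_inv @ (B - A @ x)
print(x)                           # approaches (5, 8)

# Spectral radius of the iteration matrix; the update contracts when this is < 1.
print(max(abs(np.linalg.eigvals(np.eye(2) - D_inv @ A))))
```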

This diagonal update is not necessarily a step in the direction of the gradient. How should I proceed further?

Best Answer

It looks like you've come up with a specific variation of iterative refinement. Iterative refinement allows you to improve a prospective solution to a linear system of equations by using an algorithm that solves linear systems approximately.

If your equation is

$$ Ax=b, $$

and you have some initial guess $x_0$, then with iterative refinement you do the following:

$$ x_1=x_0+f(A,b-Ax_0) $$

where $f(A,v)$ is some method for approximating the operation $A^{-1}v$. If $f$ performs that operation exactly then you solve your linear system of equations in one step.
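In code, the loop is something like the sketch below, where `approx_solve` stands in for $f$ (the function name and iteration count are my own choices):

```python
import numpy as np

def iterative_refinement(A, b, x0, approx_solve, n_iter=50):
    """Repeatedly correct x using an approximate solve against the current residual."""
    x = x0.copy()
    for _ in range(n_iter):
        r = b - A @ x               # current residual
        x = x + approx_solve(A, r)  # approx_solve(A, r) approximates A^{-1} r
    return x

# With an exact solve, one correction step already gives the answer.
A = np.array([[1.0, -0.5], [-1.0, 1.0]])
b = np.array([1.0, 3.0])
x = iterative_refinement(A, b, np.array([3.0, 2.0]),
                         lambda M, v: np.linalg.solve(M, v), n_iter=1)
print(x)  # (5, 8)
```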

In your case you're doing $f(A,v)=D^{-1}v$, and so you're basically implicitly assuming that $A\approx D$.
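In the sketch above, that choice is just `approx_solve = lambda M, v: v / np.diag(M)`, which reproduces the diagonal update from the question.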

Iterative refinement is the procedure used for "restarting" Krylov-subspace linear solvers such as GMRES. These methods are based on approximating multiplication by $A^{-1}$ with a matrix polynomial in $A$. This is also true of conjugate gradients, actually; you just happen to be able to derive conjugate gradients as a gradient-based method because CG assumes that $A$ is symmetric positive definite and can therefore be used to define an inner product.
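For example, restarted GMRES is available in SciPy (assuming SciPy is installed; the `restart` parameter is the library's name for the Krylov subspace size between restarts, while the small test system here is my own):

```python
import numpy as np
from scipy.sparse.linalg import gmres

A = np.array([[1.0, -0.5], [-1.0, 1.0]])
b = np.array([1.0, 3.0])

# Build a small Krylov subspace, take the best correction from it, then restart.
x, info = gmres(A, b, x0=np.array([3.0, 2.0]), restart=2, maxiter=100)
print(x, info)  # x approaches (5, 8); info == 0 indicates convergence
```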

I don't know many of the details about the convergence properties of iterative refinement; I think it depends a lot on your matrix and on the method you use for approximating multiplication by its inverse. The Wikipedia page might point you in the right direction to start with:

https://en.wikipedia.org/wiki/Iterative_refinement