Conjugate gradient method, why is the coefficient matrix not stored

convex-optimization, numerical-methods, optimization

I've been learning about the conjugate gradient method, but I have a hard time understanding the advantages it has in terms of memory usage. If we use the method to solve the linear system $Ax = b$ (or, equivalently, to minimize the quadratic form defined by $A$ and $b$), apparently $A$ does not have to be stored. But when I look at the Hestenes-Stiefel algorithm I see formulas like:

$\lambda_{i} = r_{i-1}^{T}r_{i-1}/p_{i}^{T}Ap_{i}$ and

$r_{i} = r_{i-1} - \lambda_{i}Ap_{i}$

which clearly contain $A$. So how does this method avoid storing $A$? Is there some sort of approximation or decomposition of $A$ going on? I can't find a concise answer to this question but would like to understand.

Thank you very much in advance for your explanations,

Joshua

Best Answer

Sometimes when solving this sort of problem numerically, we are given a black-box subroutine for the operator $A$. It takes $x$ as input and gives us the matrix-vector product $Ax$. This type of function is often called a "mat-vec" for short. We know that the underlying process is linear, but we don't necessarily know the entries of $A$.
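To make that concrete, here is a minimal Python sketch of such a matrix-free mat-vec (the function name and the choice of operator are illustrative, not from the question): it applies the 1-D discrete Laplacian to a vector without ever forming the matrix.

```python
import numpy as np

def laplacian_matvec(x):
    """Apply the 1-D discrete Laplacian A (tridiagonal, 2 on the diagonal,
    -1 on the off-diagonals) to x, without storing A itself."""
    y = 2.0 * x
    y[:-1] -= x[1:]   # superdiagonal contribution: -x[i+1]
    y[1:]  -= x[:-1]  # subdiagonal contribution:  -x[i-1]
    return y
```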

Sure, if we really needed to, we could feed basis vectors to the mat-vec function to reconstruct $A$ explicitly, one column at a time. However, this requires extra memory and computation, since we would have to build and store $A$ before even running the algorithm. Also, depending on the application, applying the operator directly may be much cheaper than an explicit matrix-vector multiplication (e.g. a discrete Fourier transform costs $O(n \log n)$ via the FFT, versus $O(n^2)$ for the dense product).
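As a sketch of what that reconstruction would look like (again, the names are hypothetical): feed each standard basis vector to the black box and collect the resulting columns. The point is that this costs $n$ mat-vec calls and $O(n^2)$ storage, which is exactly what a matrix-free method avoids.

```python
import numpy as np

def reconstruct_matrix(matvec, n):
    """Recover the dense n-by-n matrix behind a black-box mat-vec by
    applying it to each standard basis vector (column j of A equals A e_j)."""
    A = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = 1.0
        A[:, j] = matvec(e)
    return A
```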

This is one of the big advantages of Krylov methods like conjugate gradient: they only require the ability to compute the matrix-vector product $Ax$, and in particular they never need access to the individual entries of $A$.
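Here is a minimal matrix-free conjugate gradient sketch in Python to illustrate the point (for symmetric positive-definite $A$; function names and tolerances are illustrative, not a reference implementation). Notice that $A$ enters only through the `matvec` callback, so the matrix is never stored.

```python
import numpy as np

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A, given only a
    function matvec(v) that returns A v. A itself is never stored."""
    n = b.shape[0]
    max_iter = max_iter or n
    x = np.zeros(n)
    r = b - matvec(x)          # initial residual
    p = r.copy()               # initial search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)         # the only place A enters the iteration
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Usage example: a diagonal operator applied matrix-free.
n = 5
d = np.arange(1.0, n + 1)                        # A = diag(1, 2, ..., 5)
x = conjugate_gradient(lambda v: d * v, np.ones(n))
print(x)                                         # approx [1, 1/2, 1/3, 1/4, 1/5]
```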
