[Math] The benefit of LU decomposition over explicitly computing the inverse

linear algebra, matrices, matrix decomposition, numerical linear algebra, numerical methods

I'm going to teach a linear algebra course in the fall, and I want to motivate the topic of matrix factorizations such as the LU decomposition. A natural question one can ask is: why care about this when one already knows how to compute $A^{-1}$ and can use it to solve $Ax=b$ just by matrix-vector multiplication? The standard answer is that in practice one should (almost) never actually invert a matrix, because you will incur less roundoff error by solving the factored system $LUx = b$ by substitution than by performing the multiplication $x=A^{-1}b$.

However, I recently came across the paper "How accurate is inv(A)*b?" by Druinsky and Toledo (2012), where they argue that this received wisdom is misleading, and that a solution $x_{\text{inv}}=Vb$ obtained using an approximate inverse $V\approx A^{-1}$ is typically just as close to the true $x$ as a solution $x_{\text{LU}}$ obtained by backsubstitution. So I am no longer as sure about numerical accuracy as a motivation for the LU decomposition as I used to be.

I did a few numerical experiments with random ill-conditioned matrices in Matlab, based on Druinsky and Toledo's code in Sec. 6 of their paper. It seems like the forward errors $\|x_{\text{inv}}-x\|$ and $\|x_{\text{LU}}-x\|$ were indeed usually quite similar, but the backward error $\|Ax_{\text{inv}}-b\|$ could be much bigger than $\|Ax_{\text{LU}}-b\|$. I also worked out a tiny example by hand:
$$\begin{align}
A &= \begin{bmatrix}1.00&2.01 \\ 1.01 & 2.03\end{bmatrix} \\
&= \underbrace{\begin{bmatrix}1 & 0 \\ 1.01 & 1\end{bmatrix}}_{L}\underbrace{\begin{bmatrix}1.00 & 2.01 \\ 0 & -1.00\times10^{-4}\end{bmatrix}}_{U} \\
&= {\underbrace{\begin{bmatrix}-2.03\times10^4 & 2.01\times10^4 \\ 1.01\times10^4 & -1.00\times10^4 \end{bmatrix}}_{A^{-1}}}^{-1},
\\
b &= \begin{bmatrix}1.01 \\ 1.02\end{bmatrix}.
\end{align}$$
The exact solution is $x=[-1, 1]^T$. Computing $A^{-1}b$ with three decimal digits of precision in the intermediate calculations yields the result $x_{\text{inv}} = [0, 0]^T$ due to catastrophic cancellation. The same precision in LU backsubstitution gives $x_{\text{LU}} = [1.01, 0]^T$. Both solutions clearly differ from the true solution $x$ by an error on the order of $10^0$, but $Ax_{\text{LU}}$ matches $b$ to numerical precision while $Ax_{\text{inv}}$ is totally off.
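For reference, here is a minimal Matlab sketch of the kind of comparison I ran (this is not Druinsky and Toledo's actual code from Sec. 6; the matrix size, the target condition number, and the use of gallery('randsvd',...) are my own choices):

    % Compare forward and backward errors of inv(A)*b versus an LU solve
    % on a random ill-conditioned matrix.
    n = 100;                              % problem size (arbitrary)
    kappa = 1e10;                         % target condition number (arbitrary)
    A = gallery('randsvd', n, kappa);     % random matrix with cond(A) ~ kappa
    x = randn(n, 1);                      % reference solution
    b = A * x;

    x_inv = inv(A) * b;                   % solve via the explicit inverse
    [L, U, P] = lu(A);                    % LU with partial pivoting: P*A = L*U
    x_lu = U \ (L \ (P * b));             % forward and back substitution

    fprintf('forward error:  inv %.2e   LU %.2e\n', ...
            norm(x_inv - x, inf), norm(x_lu - x, inf));
    fprintf('backward error: inv %.2e   LU %.2e\n', ...
            norm(A*x_inv - b, inf), norm(A*x_lu - b, inf));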

My questions are:

  1. Is the explicit $2\times2$ example above representative of what happens in practice, or is it a meaningless example computed with too little precision that just coincidentally happens to match the other numerical experiments?

  2. Are the numerical observations generally true? To make this precise, let $A=LU$ and define $V = U^{-1}*L^{-1}$, $x_{\text{inv}} = V*b$, and $x_{\text{LU}}= U^{-1}*(L^{-1}*b)$, where $*$ denotes multiplication with numerical roundoff error. Is it usually the case that $\|Ax_{\text{inv}}-b\|>\|Ax_{\text{LU}}-b\|$?

  3. What are some examples of practical applications where forward error is more important than backward error, and vice versa?

Best Answer

Let $B * y$ denote matrix multiplication performed in finite precision arithmetic. Let $x_{LU}$ be the solution of $A x = b$ obtained by the LU method, and $x_{inv}$ the solution computed as $x_{inv} = \text{inv}(A) * b$. Here $\text{inv}(A)$ denotes the inverse of $A$ computed from the LU decomposition $A \approx \hat{L}\hat{U}$ by solving $A x_j = e_j$ for each column $x_j$ of the inverse. Note that $\text{inv}(A)$ is different from $\hat{U}^{-1} * \hat{L}^{-1}$ but is much simpler to analyse.

The forward error for $x_{LU}$ is $$\tag{1}\|x - x_{LU}\|_\infty \leq c_n u \||A^{-1}| |\hat{L}| |\hat{U}| |x_{LU}| \|_\infty \leq c_n u \||A^{-1}| |\hat{L}| |\hat{U}|\|_\infty \times \|x_{LU} \|_\infty$$

where $c_n$ denotes a constant of order $n$ and $u$ is the machine precision. A similar forward error bound for $x_{inv}$ is $$\tag{2} \|x - x_{inv}\|_{\infty} \leq c_n u \||A^{-1}| |\hat{L}| |\hat{U}| |\text{inv}(A)| |b|\|_{\infty} \leq c_n u \||A^{-1}| |\hat{L}| |\hat{U}|\|_\infty \times \||\text{inv}(A)| |b|\|_{\infty}$$

Comparing (1) and (2) and noting that for a sufficiently well conditioned matrix $A$, $$\tag{3} \|x_{LU}\|_\infty\approx \|x\|_\infty = \|A^{-1}b\|_\infty \leq \||A^{-1}||b|\|_\infty\approx \||\text{inv}(A)| |b|\|_\infty,$$ we see that as long as $\|x\|_\infty\approx \||A^{-1}||b|\|_\infty$, then $x_{LU}$ and $x_{inv}$ have similar forward errors. Notice that this condition holds for a generic right-hand side $b$.

Therefore the forward error bounds (1) and (2) always favor $x_{LU}$ over $x_{inv}$, but usually the forward error for $x_{inv}$ is not much worse.
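A quick way to get a feel for how (1) and (2) compare in practice is to evaluate the quantities on their right-hand sides. Here is an illustrative Matlab sketch (the test matrix and its condition number are arbitrary choices, the common constant $c_n u$ is dropped, and the computed inverse stands in for the exact $A^{-1}$ inside the bound):

    % Evaluate the factors in forward error bounds (1) and (2):
    % the shared factor || |A^{-1}||L||U| ||_inf and the two scale factors
    % ||x_LU||_inf versus || |inv(A)||b| ||_inf.
    n = 100;  kappa = 1e10;               % arbitrary test parameters
    A = gallery('randsvd', n, kappa);
    b = randn(n, 1);
    [L, U, P] = lu(A);                    % P*A = L*U
    x_lu = U \ (L \ (P * b));
    V = inv(A);                           % computed inverse, stand-in for A^{-1}
    shared = norm(abs(V*P') * abs(L) * abs(U), inf);   % |(PA)^{-1}||L||U|, accounts for pivoting
    fprintf('shared factor          %.2e\n', shared);
    fprintf('scale in (1): x_LU     %.2e\n', norm(x_lu, inf));
    fprintf('scale in (2): inv(A)*b %.2e\n', norm(abs(V)*abs(b), inf));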

Error bound (1) can be found in Higham's "Accuracy and Stability of Numerical Algorithms". Error bound (2) can be derived from the forward error bound for $\text{inv}(A)$, $$|\text{inv}(A) - A^{-1}| \leq c_n u |A^{-1}| |\hat{L}| |\hat{U}| |\text{inv}(A)|,$$ together with the observation that the error introduced by the matrix multiplication $\text{inv}(A) * b$ is always smaller than the error associated with computing $\text{inv}(A)$ itself.

Now suppose that $\bar{x}_{inv} = A^{-1} * b$, i.e. the inverse $A^{-1}$ is computed exactly and the only source of error is the matrix-vector multiplication. Then the residual $b - A\bar{x}_{inv}$ satisfies $$\tag{4} |b - A \bar{x}_{inv}| \leq c_n u |A| |A^{-1}| |b|,$$ while for $x_{LU}$ we have $$\tag{5}|b - A x_{LU}| \leq c_n u |\hat{L}| |\hat{U}| |x_{LU}|.$$ Since it is usually true that $\||\hat{L}| |\hat{U}|\|_\infty \approx \|A\|_\infty$, $\bar{x}_{inv}$ and $x_{LU}$ can have comparable residual norms only when $\|x_{LU}\|_\infty \approx \||A^{-1}| |b|\|_\infty$. On the other hand, from (3) we have (for sufficiently well conditioned $A$) that $\|x_{LU}\|_\infty \leq \||A^{-1}| |b|\|_\infty$, so the residual for $x_{LU}$ is usually smaller than the residual for $\bar{x}_{inv}$ even for a well conditioned matrix $A$. If $A$ has a large condition number, then $\text{inv}(A)$ is inaccurate, which adds a further source of error on top of (4), so we can expect much larger residuals for $x_{inv}$.
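The residual bounds (4) and (5) can be probed the same way; here is a short sketch along the same lines (again with arbitrary test parameters):

    % Compare residual norms of x_inv and x_LU, and the sizes of
    % |L||U| versus |A| that drive bounds (5) and (4).
    n = 100;  kappa = 1e10;               % arbitrary test parameters
    A = gallery('randsvd', n, kappa);
    b = randn(n, 1);
    [L, U, P] = lu(A);                    % P*A = L*U
    x_lu  = U \ (L \ (P * b));
    x_inv = inv(A) * b;
    fprintf('residual:  LU %.2e   inv %.2e\n', ...
            norm(b - A*x_lu, inf), norm(b - A*x_inv, inf));
    fprintf('|| |L||U| ||_inf %.2e   ||A||_inf %.2e\n', ...
            norm(abs(L)*abs(U), inf), norm(A, inf));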

Therefore:

  1. The $2\times 2$ example presented above is generic: usually $x_{inv}$ and $x_{LU}$ have similar forward errors (even for an ill conditioned matrix $A$) unless $\|x\|_\infty \ll \||A^{-1}| |b|\|_\infty$.

  2. The residual for $x_{inv}$ is usually bigger than for $x_{LU}$ unless $A$ is well conditioned and $\|x\|_\infty \approx \||A^{-1}| |b|\|_\infty$.

I am not aware of any practical applications where minimizing the backward error is the ultimate goal. However, the backward error and the forward error are closely related via the inequality $$\text{forward error} \leq \text{condition number} \times \text{backward error}.$$

Since the condition number does not depend on the algorithm used to solve a given problem, one may hope that choosing an algorithm with a smaller backward error will lead to a lower forward error as well.
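For completeness, one common normwise form of this inequality, where $\hat{x}$ denotes any computed solution and the relative residual plays the role of the backward error, follows from $x - \hat{x} = A^{-1}(b - A\hat{x})$ together with $\|b\| \leq \|A\|\,\|x\|$:
$$\frac{\|x - \hat{x}\|}{\|x\|} \;\leq\; \frac{\|A^{-1}\|\,\|b - A\hat{x}\|}{\|b\|/\|A\|} \;=\; \underbrace{\|A\|\,\|A^{-1}\|}_{\kappa(A)}\,\frac{\|b - A\hat{x}\|}{\|b\|}.$$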