Is there a stable algorithm for every well-conditioned problem

algorithmscondition numberfloating pointnumerical linear algebranumerical methods

Reading these notes on condition numbers and stability, the summary states:

  • If the problem is well-conditioned then there is a stable way to solve it.
  • If the problem is ill-conditioned then there is no reliable way to solve it in a stable
    way.

The second statement is easy to prove and is roughly done so in this answer, which establishes that if the input to a function is not exact (and therefore includes a rounding error) then that error will be amplified according to it's condition number.

The first statement seems quite surprising, however, although believable. It is well known (including in the referenced notes) that there exist well-conditioned problems for which the obvious algorithm is unstable, but which can be made stable through particular algebraic tricks. Can it be proven that this is always possible, and if so does this imply that it may be possible to solve for a stable algorithm given any well-conditioned mathematical expression? One would surely think that if such a thing were possible many programmers and numerical analysts would be out of a job!

To better define the problem, assume that overflow and underflow do not occur (which is relatively easy to ensure for well-conditioned problems) and that the condition number of the problem is approximately 1.

I did find this question asked previously, however it seems that the title is not as clear and it was significantly reworded after posting. In the comments, user Carl Christian states "In principle we can approximate a smooth function using piecewise polynomials, but only if we can compute accurate approximations at any necessary point, a statement which is not very satisfying either" which seems like a promising lead. If there exists a stable algorithm solving any Taylor polynomial, then perhaps that is sufficient?

Best Answer

The second statement is false as I shall demonstrate now. Consider the problem of computing the function $f : \mathbb{R} \times \mathbb{R} \rightarrow \mathbb{R}$ given by $$f(x,y) = x - y.$$ I will assume that we are using floating point numbers that respect the IEEE standard. I will assume that $x-y$ is in the representable range. Then there exists a $\delta$ such that the computed value $\hat{z}$ of the real number $z = x - y$ satisfies $$\hat{z} = (x-y)(1+\delta)$$ and $|\delta| \leq u$ where $u$ is the unit roundoff. We observe that $$\hat{z} = f(\bar{x}, \bar{y}),$$ where $$ \bar{x} = x(1+\delta), \quad \bar{y} = y(1+\delta).$$ This shows that the computed value of $z$ is the exact value of $f$ evaluated at a point $(\bar{x},\bar{y})$ which satisfies $$|x-\bar{x}| \leq u |x|, \quad |y-\bar{y}| \leq u |y|.$$ This shows that the naive algorithm for computing $z$ is componentwise relative backwards stable. This the highest and most demanding form of stability.

I shall now show that problem of computing $z$ is ill-conditioned in the relative sense when $x \approx y$ and well-conditioned in the relative sense when $x$ and $y$ are not close. Specifically, suppose that we desire to compute $z=x-y$, but we only have good approximations $$\hat{x} = x(1+\alpha), \quad \hat{y} = y(1+\beta)$$ of $x$ and $y$, then we cannot hope to do better than compute $\hat{x} - \hat{y}$. The relative error between the number we want and the best approximation that we can hope to compute satisfies

$$\left|\frac{(x-y) - (\hat{x} - \hat{y})}{(x-y)} \right| \leq \max\{|\alpha|,|\beta|\}\frac{|x| + |y|}{|x-y|}$$

Small values of $|\alpha|$ and $|\beta|$ reflect the fact that $\hat{x}$ and $\hat{y}$ are good approximations of $x$ and $y$, but the right-hand side will be large when $x \approx y$. In this case we have no guarantee that the relative error will be small, and in general it will be large. On the other hand if $|x| \ge 2|y|$ or $|y| \ge 2|x|$, then the triangle inequalities implies that $$ \frac{|x| + |y|}{|x-y|} \leq 3$$ and we have a guarantee that the relative error will be at most 3 times the largest relative error associated with either $x$ or $y$.

In summary, the naive algorithm for computing $f(x,y) = x - y$ is always backwards stable in the componentwise relative sense, but the problem is ill-conditioned for some input and well-conditioned for most input.

In general, there is no connection between the condition of a function $g$ and the stability of the algorithm that is used to evaluate $g$. This is a good thing! It dramatically simplifies the analysis of the software that implements the algorithm. First we analyze the function $f$. If $f$ is always well-conditioned, then a stable algorithm should return a good approximation of $y = f(x)$ when supplied with a good approximation $\hat{x} \approx x$. If there are points $x$ where $f$ is ill-conditioned, then even a stable algorithm cannot hope to return an accurate approximation of $y = f(x)$, because we are unable to provide the machine with the exact value of $x$. In general, we cannot hope to do better than $\hat{x} = \text{fl}(x)$, the floating-point representation of $x$.

I shall now demonstrate that the first statement is false. Let $f : \mathbb{R}^{n} \times \mathbb{R}^n \rightarrow \mathbb{R}^{n \times n}$ denote the outer product, i.e., $$f(x,y) = xy^T.$$

I shall now demonstrate that $f$ is well-conditioned, but that no algorithm for computing $f$ can be backwards stable.

We begin by exploring the sensitivity of $f$ to small perturbations of the input. Let $\epsilon>0$ be a small fixed number and let $\Delta x$ and $\Delta y$ be perturbations of $x$ and $y$ such that $$ |\Delta x| \leq \epsilon |x|, \quad |\Delta y| \leq \epsilon |y|$$ These inequalities should be understood in the componentwise sense, i.e., $$ |(\Delta x)_i | \leq \epsilon |x_i|$$ and similarly for $y$. Then $$ f(x + \Delta x, y + \Delta y) = xy^T + x(\Delta y)^T + (\Delta x)y^T + (\Delta x)(\Delta y)^T.$$ It follows that $$ f(x + \Delta x, y + \Delta y) - f(x,y) = x(\Delta y)^T + (\Delta x)y^T + (\Delta x)(\Delta y)^T$$ We conclude that $$ |f(x + \Delta x, y + \Delta y) - f(x,y)| \leq 2 \epsilon |x||y|^T + \epsilon^2|x||y|^T = (2 \epsilon + \epsilon^2) |xy|^T = (2 \epsilon + \epsilon^2)|f(x,y)|$$ Again, this inequality should be understood in the componentwise sense. In short, for every single component of $f$, then error caused by perturbing the input argument is small relative to the corresponding component of $f$. We conclude that $f$ is well-conditioned with respect to perturbations that are small in the componentwise relative sense.

Now suppose that $\hat{z}$ is the computed value of the matrix $z = f(x,y) = xy^T$. In general, we cannot have $\hat{z} = f(\bar{x},\bar{y})$! Why is that? Well, if $x, y \in \mathbb{R}^n$, then we have only $2n$ degrees of freedom, but there are $n^2$ equations to satisfy. We have to find $\Delta x$ and $\Delta y$ such that $$\hat{z} = (x+\Delta x)(y + \Delta y)^T$$ and in general, this is not possible. In particular, there is no hope of $\Delta x$ and $\Delta y$ that are small relative to $x$ and $y$. We conclude that no algorithm for computing $f$ can be backwards stable in the componentwise relative sense.

I recommend that you set the note aside. These are the two best references that I know

  1. The paper "Mixed, componentwise and structured condition numbers" by Israel Gohberg and Israel Koltracht. SIAM Journal of Matrix Analysis and Applications, 1993, Vol. 14, No. 3 : pp. 688-704 https://doi.org/10.1137/0614049. The treatment of condition numbers is very systematic.
  2. The textbook "Accuracy and stability of numerical algorithms" by Nichlas J. Higham. SIAM 2002. https://doi.org/10.1137/1.9780898718027. This is a good book on many different topics. The treatment of the conditioning of linear systems is excellent.

They are not easy but they are good. Many of the answers that I have provided here are derived from these texts.

EDIT: I am not sure that it is possible to repair the statements in the note. I am sure that it the following is safe to teach:

If the problem is well-conditioned, then a stable algorithm will produce an accurate result. If the problem is ill-conditioned, then we cannot expect that a stable algorithm will produce an accurate result.

I can also phase it like this: If the problem is well-conditioned, then it is not theoretically impossible to write a computer program that returns accurate results. If the problem is ill-conditioned, then we cannot expect that any computer program will return accurate results.

Related Question