Cubics – Numerically Stable Scheme for the 3 Real Roots of a Cubic

Tags: cubics, floating point

The textbook solution is well known, but I will write it down succinctly to fix the notation. The solution of

$$-x^3+3px+2q=0,$$

when $p^3-q^2>0$, consists of the 3 real roots given by

$$x = 2\sqrt{p}\cos\left(\theta+\frac{2n\pi}{3}\right)$$

for $n=0,1,2$, with

$$\theta = \frac{1}{3}\arctan\frac{\sqrt{p^3-q^2}}{q},$$

with the obvious limits when $q\to0^\pm$.
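For reference, the angle comes from substituting $x=2\sqrt{p}\cos\theta$ and using the triple-angle identity $\cos3\theta=4\cos^3\theta-3\cos\theta$, which (taking $\sin3\theta\ge0$) gives

$$\cos3\theta = \frac{q}{p^{3/2}},\qquad \sin3\theta = \frac{\sqrt{p^3-q^2}}{p^{3/2}},\qquad \tan3\theta = \frac{\sqrt{p^3-q^2}}{q}.$$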

The problem is that the formula for $\theta$ is not numerically robust: the subtraction under the square root can suffer catastrophic cancellation when $p^3$ and $q^2$ are close in magnitude. So my question is: can this be modified into a stable scheme?

Best Answer

The ten-operation sequence in the pseudo-code below is inspired by the computation of $ab - cd$ in the following paper:

Claude-Pierre Jeannerod, Nicolas Louvet, and Jean-Michel Muller, "Further Analysis of Kahan's Algorithm for the Accurate Computation of $2 \times 2$ Determinants", Mathematics of Computation, Vol. 82, No. 284, Oct. 2013, pp. 2245-2264 (online)

The algorithm makes use of the fused multiply-add, or fma, operation to compute the difference of two products robustly in just four operations:

// w = rounded product c*d; e = w - c*d exactly (the rounding error of w)
w := d * c
e := fma (-d, c, w)
// f = a*b - w, rounded; then r = f + e ≈ (a*b - w) + (w - c*d) = a*b - c*d
f := fma (a, b, -w)
r := f + e
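In C, assuming a correctly rounded fma() from <math.h>, a sketch of the same four operations could look as follows (the function name diff_of_products is mine, not from the paper):

#include <math.h>

/* a*b - c*d via Kahan's algorithm; error bound 1.5 ulp per the paper above,
   provided no overflow or underflow occurs in the intermediate steps. */
static double diff_of_products (double a, double b, double c, double d)
{
    double w = d * c;            /* rounded product c*d */
    double e = fma (-d, c, w);   /* w - c*d exactly (rounding error of w) */
    double f = fma (a, b, -w);   /* a*b - w, rounded */
    return f + e;                /* (a*b - w) + (w - c*d) ≈ a*b - c*d */
}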

That algorithm is directly applicable to the computation of the discriminant $b^2-4ac$ when computing the real solutions of a quadratic equation. The use of fma allows for efficient implementations of error-free transformations, which result in a head-tail (or double-native) representation of operands that in essence provides twice the native precision. The above algorithm provides an error bound of 1.5 ulp, provided no overflow or underflow occurs in the intermediate computations.
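As a usage sketch (my own illustration, not part of the original answer), the quadratic discriminant can then be formed as shown below; multiplying by 4 is exact in binary floating point, barring overflow:

/* discriminant of a*x^2 + b*x + c, robust when b*b and 4*a*c nearly cancel;
   reuses the diff_of_products() sketch above */
double disc = diff_of_products (b, b, 4.0 * a, c);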

For the computation of $p^{3} - q^{2}$, the above algorithm needs to be modified to incorporate the cubing, which one would prefer not to compute explicitly as a triple-native-precision operand. The sequence below is the most efficient version I managed to create quickly; the worst-case error does not appear to exceed 17 ulps, as determined by reasonably extensive testing with test cases in which $p^{3}$ is very close in magnitude to $q^{2}$. Higher-accuracy versions at potentially reduced performance are undoubtedly possible.

In the code annotations w:e, s:t, u:v designate head-tail pairs of native precision numbers that represent the exact product of two native-precision operands when summed.

When this pseudo-code is transformed into actual programming code processed by a compiler, it is essential that the compiler is configured to implement the sequence of operations as written. Many compilers will re-associate floating-point expressions at high optimization levels, which may even be the default. The strictest floating-point evaluation mode available should be selected. With the Intel compilers, this is /fp:strict or -fp-model=strict depending on operating system platform.

// w:e = -q*q exactly
w := -q * q
e := fma (-q, q, -w)
// s:t =  p*p exactly
s := p * p
t := fma (p, p, -s)
// (s:t)*p + w:e = (s*p + w) + t*p + e ≈ f + u:v + e
f := fma (s, p, w)
u := t * p
v := fma (t, p, -u)
// sum all terms into final result
r := ((e + f) + u) + v
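For completeness, a direct C rendering of the ten-operation sequence (a sketch with identifier names of my choosing, again assuming a correctly rounded fma() from <math.h>):

#include <math.h>

/* p*p*p - q*q, following the sequence above; worst-case error observed in
   the testing described above was about 17 ulps. */
static double cubic_discriminant (double p, double q)
{
    double w = -q * q;           /* w:e = -q*q exactly */
    double e = fma (-q, q, -w);
    double s = p * p;            /* s:t = p*p exactly */
    double t = fma (p, p, -s);
    double f = fma (s, p, w);    /* s*p + w, rounded */
    double u = t * p;            /* u:v = t*p exactly */
    double v = fma (t, p, -u);
    return ((e + f) + u) + v;    /* sum all terms into the final result */
}

The angle could then be formed as, e.g., theta = atan2 (sqrt (cubic_discriminant (p, q)), q) / 3.0, which also covers the limits $q\to0^\pm$ mentioned in the question.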