[Math] efficient and accurate approximation of error function

approximationreference-requestspecial functions

I am looking for the numerical approximation of error function, which must be efficient and accurate. Thanks in advance
$$\mathrm{erf}(z)=\frac2{\sqrt\pi}\int_0^z e^{-t^2} \,\mathrm dt$$

Best Answer

The oldest approximations to the error function $\mathrm{erf}$ encountered in existing software come from:

Cecil Hastings, Jr., Approximations for Digital Computers, Princeton, NJ: Princeton University Press 1955, pp. 167-169, pp. 185-187

Hastings gives six minimax approximations designed for absolute error bounds, the most accurate of which bounds the error by $1.5 \times {10}^{-7}$. Four of the approximations were later incorporated into:

Milton Abramowitz & Irene A. Stegun (eds.), Handbook of Mathematical Functions, Washington, D.C.: National Bureau of Standards 1964, sections 7.1.25 - 7.1.28 (online)

The first approximations designed with a relative error bound were published by William J. Cody:

William J. Cody, "Rational Chebyshev approximations for the error function." Mathematics of Computation, Vol. 23, No. 107, July 1969, pp. 631-637 (online)

Cody used these rational minimax approximations as the basis for a Fortran implementation of the error function $\mathrm{erf}$, complementary error function $\mathrm{erfc}$ and the scaled complementary error function $\mathrm{erfcx}$ which he contributed to SPECFUN (online) during the 1980s, and improved until 1990. This work is described in:

William J. Cody, "Performance Evaluation of Programs for the Error and Complementary Error Functions." ACM Transactions on Mathematical Software, Vol. 16, No. 1, March 1990, pp. 29-37

In testing with 500 million random test vectors I found no errors larger than 6.03 ulps in Cody's SPECFUN implementation of $\mathrm{erf}$.

Around the same time Stephen Moshier released the Cephes library (online), which he maintained well into the 2000s. This library is written in C and contains an implementation of $\mathrm{erf}$ that is based on rational approximations as well. The library was documented in:

Stephen Lloyd Baluk Moshier, Methods and Programs for Mathematical Functions, Ellis Horwood Limited, Chichester, 1989

Testing with 500 million random test vectors I found no errors in excess of 2.95 ulps in Moshier's Cephes implementation of $\mathrm{erf}$.

In 1993 Sun Microsystems, Inc released the Freely Distributable Math Library, or FDLIBM, which is based on the work of K. C. Ng. This library, written in C, aimed to provide a faithfully-rounded implementation of $\mathrm{erf}$ (online), based on rational minimax approximations. A faithfully-rounded function has an error bound of 1 ulp or less. The result returned is either the floating-point number closest to the mathematical result, or its immediate neighbor.

Based on my tests with 500 million random test vectors, the FDLIBM implementation falls slightly short of its accuracy goal, in that errors in excess of 1 ulp do occur but no errors larger than 1.01 ulp were observed.

Using modern tools and processor architectures, it is not too hard to create one's own competitive implementation of $\mathrm{erf}$. Rational approximations have largely fallen out of favor because of the high computational cost of division compared with additions and multiplications. Polynomial minimax approximations are desired, and can readily be generated with tools such as Maplesoft's Maple and Wolfram's Mathematica, both of which have dedicated functions for this: the numapprox command in Maple and the MiniMaxApproximation command in Mathematica.

It should be noted that these generate minimax approximations without taking into account that the coefficients will be rounded to target precision, e.g. IEEE-754 double precision. The Sollya tool with its fpminimax command is unique in offering that functionality, it typically generates superior approximations.

To my knowledge, all these tools use variants of the Remez algorithm which may be augmented by additional heuristics. It is quite possible to write one's own implementation of this algorithm, this basically requires a solver for systems of linear equations and an arbitrary-precision library. I use Richard Brent's MP library for the latter, but would not recommended it for new software since the design is outdated.

Looking at the graph of $\mathrm{erf}$ we find that it is symmetric about the origin, so approximations can be restricted to the positive half-plane. The graph further suggest two basic approximation intervals. For smaller arguments, the function closely follows its tangent through the origin, while for larger arguments it approaches $y = 1$ in the shape of exponential decay. The crossover between the two regions is roughly around $x = 1$. For simplicity, we might choose that as the actual crossover point.

The series expansion for $\mathrm{erf}$ suggests a polynomial approximation of the form $x p({x}^{2})$ for $x \in [0, 1]$, while a quick numerical check shows that for IEEE-754 double precision, erf(x) == 1.0 holds for $x > 5.921875$, so we can use an approximation $1 - \exp (x q(x))$ for $x \in (1, 5.921875]$; $p$ and $q$ are polynomials. Many software environments offer a function expm1 for computing $\exp(x)-1$ which can be used to achieve a modest improvement in accuracy over a verbatim implementation.

The fused-multiply add operation (FMA) enables the evaluation of the polynomials with improved accuracy, by reducing rounding error and providing protection against subtractive cancellation, as the intermediate product is not rounded. An FMA operation is provided by most modern processor architectures, and is exposed in ISO C/C++ via the standard math function fma(). Use of the Horner scheme is beneficial for the accuracy of polynomial evaluation; for increased performance by means of increased instruction level parallelism on modern processor architectures the use of some form of second-order Horner scheme can be helpful.

Below is exemplary C code for a IEEE-754 double-precision implementation that follows the design guidelines sketched above. The accuracy of this implementation depends somewhat on the accuracy of the exponential functions. With faithfully-rounded implementations of exp() and expm1() I observed no errors in excess of 1.054 ulp using 1 billion random test vectors when USE_EXPM1=0, and no errors in excess of 1.020 ulp when USE_EXPM1=1. With some tweaking, a faithfully-rounded approximation should be possible, as I demonstrated for equivalent single-precision code.

double my_erf (double a)
{
    double r, s, t, u;
    
    t = fabs (a);
    s = a * a;
    if (t >= 1.0) {
        // max ulp error = 0.97749 (USE_EXPM1 = 1); 1.05364 (USE_EXPM1 = 0)
        r = fma (-5.6271698391213282e-18, t, 4.8565951797366214e-16); // -0x1.9f363ba3b515dp-58, 0x1.17f6b1d68f44bp-51
        u = fma (-1.9912968283386570e-14, t, 5.1614612434698227e-13); // -0x1.66b85b7fbd01ap-46, 0x1.22907eebc22e0p-41
        r = fma (r, s, u);
        u = fma (-9.4934693745934645e-12, t, 1.3183034417605052e-10); // -0x1.4e0591fd97592p-37, 0x1.21e5e2d8544d1p-33
        r = fma (r, s, u);
        u = fma (-1.4354030030292210e-09, t, 1.2558925114413972e-08); // -0x1.8a8f81b7e0e84p-30, 0x1.af85793b93d2fp-27
        r = fma (r, s, u);
        u = fma (-8.9719702096303798e-08, t, 5.2832013824348913e-07); // -0x1.8157db0edbfa8p-24, 0x1.1ba3c453738fdp-21
        r = fma (r, s, u);
        u = fma (-2.5730580226082933e-06, t, 1.0322052949676148e-05); // -0x1.595999b7e922dp-19, 0x1.5a59c27b3b856p-17
        r = fma (r, s, u);
        u = fma (-3.3555264836700767e-05, t, 8.4667486930266041e-05); // -0x1.197b61ee37123p-15, 0x1.631f0597f62b8p-14
        r = fma (r, s, u);
        u = fma (-1.4570926486271945e-04, t, 7.1877160107954648e-05); // -0x1.319310dfb8583p-13, 0x1.2d798353da894p-14
        r = fma (r, s, u);
        u = fma ( 4.9486959714661590e-04, t,-1.6221099717135270e-03); //  0x1.037445e25d3e5p-11,-0x1.a939f51db8c06p-10
        r = fma (r, s, u);
        u = fma ( 1.6425707149019379e-04, t, 1.9148914196620660e-02); //  0x1.5878d80188695p-13, 0x1.39bc5e0e9e09ap-6
        r = fma (r, s, u);
        r = fma (r, t, -1.0277918343487560e-1); // -0x1.a4fbc8f8ff7dap-4
        r = fma (r, t, -6.3661844223699315e-1); // -0x1.45f2da3ae06f8p-1
        r = fma (r, t, -1.2837929411398119e-1); // -0x1.06ebb92d9ffa8p-3
        r = fma (r, t, -t);
#if USE_EXPM1
        r = expm1 (r);
#else // USE_EXPM1
        r = 1.0 - exp (r);
#endif // USE_EXPM1
        r = copysign (r, a);
    } else {
        // max ulp error = 1.01912
        r =            -7.7794684889591997e-10; // -0x1.abae491c44131p-31
        r = fma (r, s,  1.3710980398024347e-8); //  0x1.d71b0f1b10071p-27
        r = fma (r, s, -1.6206313758492398e-7); // -0x1.5c0726f04dbc7p-23 
        r = fma (r, s,  1.6447131571278227e-6); //  0x1.b97fd3d9927cap-20 
        r = fma (r, s, -1.4924712302009488e-5); // -0x1.f4ca4d6f3e232p-17
        r = fma (r, s,  1.2055293576900605e-4); //  0x1.f9a2baa8fedc2p-14
        r = fma (r, s, -8.5483259293144627e-4); // -0x1.c02db03dd71bbp-11
        r = fma (r, s,  5.2239776061185055e-3); //  0x1.565bccf92b31ep-8
        r = fma (r, s, -2.6866170643111514e-2); // -0x1.b82ce311fa94bp-6
        r = fma (r, s,  1.1283791670944182e-1); //  0x1.ce2f21a040d14p-4
        r = fma (r, s, -3.7612638903183515e-1); // -0x1.812746b0379bcp-2
        r = fma (r, s,  1.2837916709551256e-1); //  0x1.06eba8214db68p-3
        r = fma (r, a, a);
    }
    return r;
}
```
Related Question