Approximating Error Function erf(x) Using Hyperbolic Tangent Function

approximation, error-function, hyperbolic-functions, probability, recreational-mathematics

Approximating the error function $\text{erf}(x)$ with a hyperbolic tangent function $\text{tanh}\left(\dfrac{4x}{4-x^2}\right)$

I was plotting some functions and I found that the function
$$f(x) = \begin{cases} -1,\quad x\leq -2\\ 1, \quad x\geq 2 \\ \text{tanh}\left(\dfrac{4x}{4-x^2}\right),\, -2<x<2\end{cases}$$
"looks" very similar to the graph of the error function as shown in Wolfram Alpha:

Wolfram Alpha plot

However, the Wikipedia page for the error function does not list this approximation, so I guess that, despite the similarity of the plots, $f(x)$ is considered a "bad approximation":

Why is it considered a poor approximation?

Also, a simpler hyperbolic tangent function could fit even better as an approximation:
$$g(x) = \text{tanh}\left(\dfrac{11}{9}x\right)$$

But Wikipedia lists no relation between the error function and the hyperbolic tangent, so

Why are hyperbolic tangents considered bad approximations for the error function?

Here are the plots in Desmos:

Desmos' graph


Added later (after some answers)

After two interesting answers, I got the idea of testing the series expansion of $\tanh^{-1}(\text{erf}(x))$ shown in Wolfram Alpha. Just its first two terms give a simple approximation that works quite well: $$f(x)=\tanh\left(\frac{2}{\sqrt{\pi}}\left(x+\frac{(4-\pi)x^3}{3\pi}\right)\right)$$

You can see it in Desmos, where the maximum amplitude difference is below $0.0007$. Also note that it does not need to be defined as a piecewise function.

Is this approximation good enough for approximating probabilities?
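The comparison above can also be checked numerically. The following is a minimal sketch, assuming Python's standard `math.erf` as the reference and a uniform grid on $[0, 5]$ (both functions are odd, so the negative half adds nothing). The helper names `tanh_linear`, `tanh_cubic`, and `max_abs_error` are made up for this illustration.

```python
import math

A = 2.0 / math.sqrt(math.pi)  # leading coefficient 2/sqrt(pi)


def tanh_linear(x):
    """tanh(11x/9), the simple one-parameter fit from the question."""
    return math.tanh(11.0 * x / 9.0)


def tanh_cubic(x, c=(4.0 - math.pi) / (3.0 * math.pi)):
    """tanh((2/sqrt(pi)) * (x + c*x^3)); default c is the series coefficient."""
    return math.tanh(A * (x + c * x**3))


def max_abs_error(approx, lo=0.0, hi=5.0, n=5001):
    """Largest |erf(x) - approx(x)| over n evenly spaced points in [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return max(
        abs(math.erf(lo + i * step) - approx(lo + i * step)) for i in range(n)
    )


err_linear = max_abs_error(tanh_linear)
err_cubic = max_abs_error(tanh_cubic)
print(f"tanh(11x/9):           max |error| = {err_linear:.5f}")
print(f"two-term series form:  max |error| = {err_cubic:.5f}")
```

The grid spacing and the cutoff at $x = 5$ are arbitrary choices; a finer grid only changes the reported maximum in the later digits.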

best attempt so far

After these first two terms the Taylor expansion starts to converge more slowly. However, by sacrificing accuracy near $x=0$ (since it is symmetric), one can find approximations that reduce the maximum amplitude difference. To keep the number of terms small as well, I have chosen the following coefficient (arbitrarily, by trial and error):

$$g(x)=\tanh\left(\frac{2}{\sqrt{\pi}}\left(x+\frac{\pi}{35}x^3\right)\right)$$

which keeps the amplitude differences below $0.0005$.

I don't know how to measure whether using $g(x)$ instead of the standard Gaussian CDF would introduce too much error when computing probabilities. What do you think?

another good attempt


my last attempt

By trial and error I found that:
$$z(x)=\tanh\left(\frac{2}{\sqrt{\pi}}\left(x+\frac{11}{123}x^3\right)\right)$$

keeps the difference $|\text{erf}(x)-z(x)|<0.00036$. Maybe someone could find an optimal $\hat{a}$ that gives the best possible fit to $\text{erf}(x)$ of the form $\tanh\left(\frac{2}{\sqrt{\pi}}\left(x+\hat{a}x^3\right)\right)$.

I also used Wolfram Alpha to compare $z(x)$ against the standard Gaussian distribution when computing probabilities, and the maximum error appears to be below $0.018\%$, which is quite accurate!
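One rough way to look for such an $\hat{a}$ is a brute-force scan. The sketch below assumes "best fit" means the smallest maximum absolute error on a grid over $[0, 5]$, and it only tries candidates between $0.08$ and $0.10$ in steps of $10^{-4}$. Both the criterion and the search range are assumptions made here, not something fixed by the question.

```python
import math

A = 2.0 / math.sqrt(math.pi)

# Grid on [0, 5]; erf and the tanh form are both odd, so x >= 0 suffices.
XS = [i * 0.005 for i in range(1001)]
ERF = [math.erf(x) for x in XS]


def max_abs_error(a):
    """Largest |erf(x) - tanh(A*(x + a*x^3))| over the grid."""
    return max(abs(e - math.tanh(A * (x + a * x**3))) for x, e in zip(XS, ERF))


# Candidate coefficients 0.0800, 0.0801, ..., 0.1000 (assumed search window).
candidates = [0.08 + k * 1e-4 for k in range(201)]
best_a = min(candidates, key=max_abs_error)
best_err = max_abs_error(best_a)

print(f"best a on this grid: {best_a:.4f}, max |error| = {best_err:.6f}")
print(f"a = 11/123:          max |error| = {max_abs_error(11.0 / 123.0):.6f}")
```

A different error criterion (for example least squares, or relative error in the tails) would generally select a different coefficient.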
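The probability comparison can be reproduced with the identity $\Phi(t) = \tfrac12\left(1 + \text{erf}(t/\sqrt{2})\right)$ for the standard Gaussian CDF, so an absolute error of $\varepsilon$ in $\text{erf}$ becomes $\varepsilon/2$ in $\Phi$. Here is a minimal sketch using `math.erf` as the reference. The function names and the grid on $[-8, 8]$ are choices made for this illustration.

```python
import math

A = 2.0 / math.sqrt(math.pi)


def z(x):
    """tanh((2/sqrt(pi)) * (x + (11/123) x^3)), the last attempt above."""
    return math.tanh(A * (x + 11.0 / 123.0 * x**3))


def phi_exact(t):
    """Standard Gaussian CDF via erf."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def phi_approx(t):
    """Same formula with erf replaced by z."""
    return 0.5 * (1.0 + z(t / math.sqrt(2.0)))


ts = [-8.0 + i * 0.001 for i in range(16001)]
max_cdf_err = max(abs(phi_exact(t) - phi_approx(t)) for t in ts)
print(f"max |Phi - Phi_approx| on [-8, 8]: {max_cdf_err:.6f}")

# Absolute error hides what happens to very small tail probabilities,
# so print one upper-tail value from each version for comparison.
print(f"P(T > 4) exact : {1.0 - phi_exact(4.0):.3e}")
print(f"P(T > 4) approx: {1.0 - phi_approx(4.0):.3e}")
```

The maximum absolute difference is the quantity quoted above; the two tail probabilities are printed because a small absolute error can still be a large relative error when the probability itself is tiny.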

Best Answer

The approximation is bad because the tails are totally different.

The derivative of the error function, which is a rescaled normal density, is proportional to $e^{-x^2}$. The derivative of the hyperbolic tangent is the square of the hyperbolic secant, which is proportional to $(e^x + e^{-x})^{-2}$. For "large" $x$, this means the latter will be about $e^{-2x}$, but the former will be $e^{-x^2} \ll e^{-2x}$. As a result, the tails are much heavier for a density based on the hyperbolic tangent.

Let's denote $F(x) = \operatorname{erf}(x)$ and $G(x) = \tanh \frac{2x}{\sqrt{\pi}}$. Then $$f(x) = F'(x) = \frac{2}{\sqrt{\pi}}e^{-x^2}, \quad g(x) = G'(x) = \frac{2}{\sqrt{\pi}} \operatorname{sech}^2 \frac{2x}{\sqrt{\pi}}.$$

Here is a plot of $F/G$:

[plot of $F/G$]

The ratio lies between $1$ and $1.040703873959\ldots$. Note that $G$ always underestimates $F$; i.e., $F > G$ for all $x > 0$. You might look at this and think, "a maximum of 4% error is not that bad."
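The bounds on the ratio of the two cumulative functions can be checked with a short scan. This is a sketch under the definitions $F(x) = \operatorname{erf}(x)$ and $G(x) = \tanh(2x/\sqrt{\pi})$ given earlier, on an assumed grid over $(0, 6]$.

```python
import math


def F(x):
    return math.erf(x)


def G(x):
    return math.tanh(2.0 * x / math.sqrt(math.pi))


# Skip x = 0, where F/G is 0/0 (the limit there is 1).
xs = [i * 0.001 for i in range(1, 6001)]
ratios = [F(x) / G(x) for x in xs]

max_ratio = max(ratios)
x_at_max = xs[ratios.index(max_ratio)]
min_ratio = min(ratios)

print(f"max F/G = {max_ratio:.6f} at x = {x_at_max:.3f}")
print(f"min F/G = {min_ratio:.6f}")
```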

But here is a plot of $f/g$:

[plot of $f/g$]

And this is much, much worse. As mentioned earlier, the tails are not comparable, which is why $\lim_{x \to \infty} \frac{f}{g} = 0$. In order to be a "good" approximation, the ratio of densities should be close to $1$ across the support.

To see how much better we can do, consider the Bürmann series $$B(x) = \frac{2}{\sqrt{\pi}} \operatorname{sgn}(x) \sqrt{1-e^{-x^2}} \left(1 - \frac{1}{12}(1 - e^{-x^2}) - \frac{7}{480}(1 - e^{-x^2})^2 - \frac{5}{896} (1 - e^{-x^2})^3 - \cdots \right)$$ for which the first four terms yield the following plot of $F/B$:

[plot of $F/B$]

And you can easily tell this is far superior. The plot of the ratio of derivatives $f/b$ is

[plot of $f/b$]

It's not perfect by any means but it is clearly superior to the hyperbolic tangent. With more terms, it will improve further.
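For readers without the plots, the two comparisons can be tabulated numerically. The sketch below uses the closed form $f/g = e^{-x^2}\cosh^2(2x/\sqrt{\pi})$, which follows from the definitions of $f$ and $g$, and truncates $B(x)$ after the four terms written out in the series. The function names and sample points are choices made for this illustration.

```python
import math

C = 2.0 / math.sqrt(math.pi)


def density_ratio(x):
    """f(x)/g(x) = exp(-x^2) * cosh^2(2x/sqrt(pi))."""
    return math.exp(-x * x) * math.cosh(C * x) ** 2


def burmann(x, coeffs=(1.0 / 12.0, 7.0 / 480.0, 5.0 / 896.0)):
    """Buermann series for erf, truncated after the listed coefficients."""
    s = 1.0 - math.exp(-x * x)
    bracket = 1.0 - sum(c * s ** (k + 1) for k, c in enumerate(coeffs))
    # copysign supplies sgn(x); at x = 0 the square root is already 0.
    return C * math.copysign(math.sqrt(s), x) * bracket


print("   x        f/g          F/B")
for x in (0.5, 1.0, 2.0, 3.0, 4.0, 5.0):
    print(f"{x:4.1f}  {density_ratio(x):11.3e}  {math.erf(x) / burmann(x):11.6f}")
```

The `f/g` column shrinks rapidly as $x$ grows, while the `F/B` column stays near $1$, which is the contrast the plots illustrate.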