Optimization – How to Minimize the Maximum Absolute Difference Between Two Functions

Tags: approximation, error-function, hyperbolic-functions, optimization, real-analysis

How to minimize the maximum absolute difference between two functions? Example: $\min_a\{\|\text{erf}(x)-\tanh(\frac2{\sqrt{\pi}}(x+a x^3))\|_\infty\}$

Intro

In this other question I found by trial and error that the Error function could be approximated decently with the following Hyperbolic tangent function:
$$z(x)=\tanh\left(\frac{2}{\sqrt{\pi}}\left(x+\frac{11}{123}x^3\right)\right)$$

It keeps the absolute difference $|\text{erf}(x)-z(x)|<0.000358$. I also compared it in Wolfram Alpha, using $z(x)$ to compute probabilities of the standard Gaussian distribution, and the maximum error looks to be below $0.018\%$.
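To make that error figure reproducible, here is a minimal check in Mathematica (only a sketch; restricting to $0\le x\le 5$ and using NMaximize are my own choices, justified because the difference is even in $x$ and decays for large $x$):

z[x_] := Tanh[2 (x + 11/123 x^3)/Sqrt[Pi]]
(* maximum absolute difference on [0, 5]; should come out just below 0.000358 *)
First@NMaximize[{Abs[Erf[x] - z[x]], 0 <= x <= 5}, x]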

The formula $z(x)$ comes from the series expansion of $\tanh^{-1}(\text{erf}(x))$ shown in Wolfram Alpha; just the first two terms already give a simple approximation that works quite well: $$f(x)=\tanh\left(\frac{2}{\sqrt{\pi}}\left(x+\frac{(4-\pi)x^3}{3\pi}\right)\right)$$
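For reference, the coefficient $\frac{4-\pi}{3\pi}$ can be recovered by composing the two Maclaurin series (a quick sketch, keeping only terms up to $x^3$):
$$\text{erf}(x)=\frac{2}{\sqrt{\pi}}\left(x-\frac{x^3}{3}+\dots\right),\qquad \tanh^{-1}(y)=y+\frac{y^3}{3}+\dots$$
so that
$$\tanh^{-1}\!\left(\text{erf}(x)\right)=\frac{2}{\sqrt{\pi}}x+\left(-\frac{2}{3\sqrt{\pi}}+\frac{8}{3\pi\sqrt{\pi}}\right)x^3+O(x^5)=\frac{2}{\sqrt{\pi}}\left(x+\frac{(4-\pi)x^3}{3\pi}\right)+O(x^5).$$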

You can see them plotted in Desmos.

Desmos plot

The final formula is a trial-and-error modification, so the last fraction $11/123 \approx 0.0894308943$ surely could be improved. I have tried to refine it numerically by hand, and I think the optimal value is near $a \approx 0.089429822$ (caution here: this could be awfully wrong).

But in doing this, I figured out I have no clue how to solve it theoretically: trying some derivatives to match first-order conditions didn't work, and I don't know if there exists a method for setting up equations to solve these kinds of problems (the way the Euler-Lagrange equations are used in the calculus of variations, as an example of a method for solving a minimization problem).

Main question

So I have these two functions $f(x) = \text{erf}(x)$ and $g(x,a) = \tanh\left(\frac{2}{\sqrt{\pi}}\left(x+ax^3\right)\right)$, and I want to find some parameter "$a$" such that it minimizes the maximum absolute difference between these two functions:
$$\min_{a}\left\{ \|f(x)-g(x,a)\|_\infty\right\} \equiv \min_{a}\left\{ \left\|\text{erf}(x)-\tanh\left(\frac{2}{\sqrt{\pi}}\left(x+ax^3\right)\right)\right\|_\infty\right\}$$
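Note that both $f$ and $g(\cdot,a)$ are odd functions of $x$, so $|f(x)-g(x,a)|$ is even and the supremum only needs to be taken over $x\ge 0$ (a small observation, in case it helps):
$$\|f(x)-g(x,a)\|_\infty=\sup_{x\ge 0}\left|\text{erf}(x)-\tanh\left(\frac{2}{\sqrt{\pi}}\left(x+ax^3\right)\right)\right|$$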

  • How do I solve these kinds of problems theoretically? Is there a classic framework to use? I Googled for it but found little (maybe I used the wrong terms). Does this problem have a specific name?
  • How do I solve this specific example theoretically? Please be detailed if possible.
  • What is the value of the optimal $\hat{a}$ in this specific example? Hopefully obtained theoretically, e.g., as a closed form in terms of known constants if possible.

Added later

After the comment by @Claude, it looks like the problem is not trivial at all, and there are only numerical approximations to solve it (which is sad). So I am willing to give the answer to the best numerical approximation of the optimal $\hat{a}$.

So far, my best attempt is $\hat{a}=\frac{302\pi}{10609}$. Please take into account that I am looking for simple formulas: any formula for this optimal constant $\hat{a}$ that is more complicated to type than the example equation will be dismissed (I know beforehand this is very subjective, so I hope you get what I am aiming for).


2nd Added later

As a reference, by playing with Desmos I have some indication that the optimal value should satisfy:
$$\frac{302\pi}{10609}<\hat{a}<\frac{11}{123} $$

As an example, $a=\frac{86\pi}{3021}$ gives an error of $0.000356924$.


Best attempt so far:

$\hat{a}\approx \frac{314}{3511}$, which gives a figure of $0.000356851$.

(Caution here: Desmos shows different results from those shown in Wolfram Alpha. I am using Wolfram Alpha as the benchmark, but I don't have any good reason to prefer one over the other.)
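One way to compare these candidates on an equal footing (a sketch in Mathematica rather than Desmos or Wolfram Alpha; the interval $[0,5]$ and the 30 digits of working precision are my own choices):

(* peak absolute error on [0, 5] for a given cubic coefficient a *)
err[a_] := First@NMaximize[
    {Abs[Erf[x] - Tanh[2 (x + a x^3)/Sqrt[Pi]]], 0 <= x <= 5},
    x, WorkingPrecision -> 30];

(* the candidates mentioned above *)
err /@ {302 Pi/10609, 86 Pi/3021, 314/3511, 11/123}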


Outside the question

I also found by trial and error that:
$$\tanh\left(\frac{2}{\sqrt{\pi}}\left(\frac{841}{840}x+\frac{16}{181}x^3\right)\right)\to \text{max error}=0.000292772$$

and also that
$$\tanh\left(\frac{2}{\sqrt{\pi}}\left(x+\frac{28}{305}x^3\sqrt{1-\frac{x^2}{33}}\right)\right)\to \text{max error}=0.0000551322$$

which have figures similar to or better than many of the more complex numerical approximations shown in the Wikipedia page for the error function: Numerical approximations.
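These two figures can be double-checked with the same kind of one-liner (again only a sketch; the interval $[0,5]$ keeps the square root in the second formula real, since $x^2/33<1$ there):

(* maximum absolute error on [0, 5] of the two variants above *)
g1[x_] := Tanh[2 (841/840 x + 16/181 x^3)/Sqrt[Pi]];
g2[x_] := Tanh[2 (x + 28/305 x^3 Sqrt[1 - x^2/33])/Sqrt[Pi]];
First@NMaximize[{Abs[Erf[x] - g1[x]], 0 <= x <= 5}, x]
First@NMaximize[{Abs[Erf[x] - g2[x]], 0 <= x <= 5}, x]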

I think this is interesting since the coefficients are deviations from the traditional series expansions such as the Taylor series, and also, given the similarity between the $\text{erf}(x)$ and $\tanh(x)$ functions, I am puzzled by their absence from the Wikipedia approximations (the simple "$11/123$" version was uploaded by me a few days ago). Does anybody know why that is so? Are there any reasons to discard hyperbolic tangents for approximating the error function?


Final update

After the comment by @njuffa I have started realizing that, due to numerical accuracy issues, I don't have an objective way of comparing solutions near $11/123$, since Desmos and Wolfram Alpha give different results, as do the optimization figures used by everyone who has shared their interesting answers or attempts in the comments. Maybe that is because of how each piece of software numerically implements the error and hyperbolic tangent functions, how they numerically integrate, which numerical optimization methods they use, floating-point issues, or all of the above; I don't know for sure.

I got tired of making failed attempts at solving it in closed form, so I have chosen @JaiB's answer as correct, just because I learned there about the existence of the "Rationalize" function in Wolfram Alpha.

Since I don't have an objective way to do it, I also want to thank @ClaudeLeivovici for his detailed answer, as well as everyone else who participated in the comments (@Claude for explaining that it is a hard theoretical problem, @RiverLi for correcting my typo at the beginning, and @njuffa for also sharing an answer in the comments).

Best Answer

First we look at a plot of the deviations for values of $x$ given that $a=11/123$:

(* plot the absolute error of the tanh approximation with a = 11/123 *)
Plot[Abs[Erf[x] - Tanh[2 x (1 + (11/123) x^2)/Sqrt[π]]], {x, -3, 3}, Frame -> True,
 FrameLabel -> (Style[#, 18, Bold] &) /@ {"x", "Absolute error"},
 PlotRange -> {All, {0, 0.0004}},
 PlotLabel -> Style["a = 11/123", 24, Bold]]

Deviations for values of x given that a=11/123

So the absolute error is symmetric around 0, and there are just the two peaks, around x = 0.8 and x = 1.8, that we need to deal with. A function can be written (I've used Mathematica, but I'm sure that Maple and MATLAB work fine, too) that allows you to control the precision of the numerical values.

For a given value of $a$ near 11/123 the curves have similar shapes, and the two peaks occur around the same locations. Using a working precision of 50 (i.e., roughly 50 significant digits), one can use the following:

(* absolute difference between erf and the tanh approximation *)
f[x_, a_] := Abs[Erf[x] - Tanh[2 x (1 + a x^2)/Sqrt[π]]]

(* height of the larger of the two error peaks; the starting points 9/10 and 18/10 sit near them *)
maxDeviation[a_?NumericQ] :=
 Max[FindMaximum[f[x, a], {x, 9/10}, WorkingPrecision -> 50][[1]],
  FindMaximum[f[x, a], {x, 18/10}, WorkingPrecision -> 50][[1]]]

(* minimize the peak height over a, starting near a = 0.0894 *)
sol = FindMinimum[maxDeviation[a], {a, 894/10000},
  WorkingPrecision -> 50]
(* {0.00035784262189043944704023263051447980100361414499527, 
   {a -> 0.089429822455264989912596353788697817245378987726119}} *)

If you desire a "rational" number for $a$ to make inputting that constant easier, then the Rationalize function can do that for you. If you want the rational number to be accurate to $10^{-8}$, then the following will work:

Rationalize[a /. sol[[2]], 10^-8]
(* 676/7559 *)
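As a quick sanity check, the rationalized value can be fed back into the maxDeviation function defined above; since 676/7559 differs from the optimal $a$ by less than $10^{-8}$, its peak error should be essentially indistinguishable from the minimum found by FindMinimum:

maxDeviation[676/7559]
(* should agree with the minimal value 0.00035784262... to roughly eight decimal places *)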