[Math] What are ways to compute polynomials that converge from above and below to a continuous and bounded function on $[0,1]$

Tags: approximation-theory, polynomials, probability, simulation, uniform-convergence

Background

We're given a coin that shows heads with an unknown probability, $\lambda$. The goal is to use that coin (and possibly also a fair coin) to build a "new" coin that shows heads with a probability that depends on $\lambda$, call it $f(\lambda)$. This is the Bernoulli factory problem, and it can be solved only for certain functions $f$. (For example, by flipping the coin twice and outputting heads only if exactly one of the two flips shows heads, we can simulate the probability $2\lambda(1-\lambda)$.)
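As a concrete illustration, the $2\lambda(1-\lambda)$ example can be sketched in a few lines of Python. The coin below is a stand-in with a fixed "hidden" bias chosen only for the demonstration:

```python
import random

def flip_new_coin(coin):
    # Flip the lambda-coin twice; output heads (1) only when exactly one
    # of the two flips is heads, which happens with probability 2*lam*(1-lam).
    return 1 if coin() != coin() else 0

random.seed(1)
lam = 0.3  # hidden bias, used here only to build the demo coin
coin = lambda: 1 if random.random() < lam else 0
trials = 100_000
est = sum(flip_new_coin(coin) for _ in range(trials)) / trials
# est should be near 2 * 0.3 * 0.7 = 0.42
```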

Specifically, the only functions that can be simulated this way are continuous and polynomially bounded on their domain, and map $[0, 1]$ or a subset thereof to $[0, 1]$, as well as $f=0$ and $f=1$. These functions are called factory functions in this question. (A function $f(x)$ is polynomially bounded if both $f$ and $1-f$ are bounded below by min($x^n$, $(1-x)^n$) for some integer $n$ (Keane and O'Brien 1994). This implies that $f$ admits no roots on (0, 1) and can't take on the value 0 or 1 except possibly at 0 and/or 1.)

In this question, a polynomial $f(x)$ is written in Bernstein form of degree $n$ if it is written as— $$f(x)=\sum_{k=0}^n a_k {n \choose k} x^k (1-x)^{n-k},$$ where $a_0, …, a_n$ are the polynomial's Bernstein coefficients.
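Evaluating a polynomial given its Bernstein coefficients is straightforward; here is a direct-summation sketch (de Casteljau's algorithm would be the numerically stabler choice; the helper name is mine):

```python
from math import comb

def bernstein_eval(coeffs, x):
    # Evaluate sum_{k=0}^n a_k * C(n,k) * x^k * (1-x)^(n-k)
    # for Bernstein coefficients coeffs = [a_0, ..., a_n], degree n = len(coeffs)-1.
    n = len(coeffs) - 1
    return sum(a * comb(n, k) * x ** k * (1 - x) ** (n - k)
               for k, a in enumerate(coeffs))

# 2x(1-x) in degree-2 Bernstein form has coefficients [0, 1, 0]:
val = bernstein_eval([0, 1, 0], 0.3)  # equals 2 * 0.3 * 0.7 = 0.42
```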

Polynomials that approach a factory function

An algorithm simulates a factory function via two sequences of polynomials that converge from above and below to that function. Roughly speaking, the algorithm works as follows:

  1. Generate U, a uniform random number in $[0, 1]$.
  2. Flip the input coin (with a probability of heads of $\lambda$), then build an upper and lower bound for $f(\lambda)$, based on the outcomes of the flips so far. In this case, these bounds come from two degree-$n$ polynomials that approach $f$ as $n$ gets large, where $n$ is the number of coin flips so far in the algorithm.
  3. If U is less than or equal to the lower bound, return 1. If U is greater than the upper bound, return 0. Otherwise, go to step 2.

The result of the algorithm is 1 with probability exactly equal to $f(\lambda)$, or 0 otherwise.
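A simplified sketch of this loop, using coefficient-based bounds of the form $f(k/n)\pm\delta(n)$ checked at power-of-two degrees (in the style of Nacu and Peres). This omits the coupling details that make the method exact, so it is an illustration only; `f_at`, `delta`, and the truncation fallback are my own simplifications:

```python
import random, math

def simulate(coin, f_at, delta, max_flips=1024):
    # Sketch: after n flips with k heads, use f(k/n) - delta(n) and
    # f(k/n) + delta(n) as lower/upper bounds, checked when n is a power of 2.
    u = random.random()              # step 1: uniform U
    heads = n = 0
    while n < max_flips:
        heads += coin()              # step 2: flip the input coin
        n += 1
        if (n & (n - 1)) == 0:       # check bounds at power-of-two degrees
            low = max(f_at(heads / n) - delta(n), 0.0)
            high = min(f_at(heads / n) + delta(n), 1.0)
            if u <= low:
                return 1             # step 3
            if u > high:
                return 0
    return 1 if u <= f_at(heads / n) else 0  # truncation fallback (inexact)

random.seed(7)
lam = 0.4
coin = lambda: 1 if random.random() < lam else 0
f_at = lambda x: x / 2 + 1 / 4               # Lipschitz with constant C = 1/2
delta = lambda n: (1 + math.sqrt(2)) * 0.5 / math.sqrt(n)
est = sum(simulate(coin, f_at, delta) for _ in range(20000)) / 20000
# est should be near f(0.4) = 0.45
```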

However, the algorithm requires the polynomial sequences to meet certain requirements; among them, the sequences must be of Bernstein-form polynomials that converge from above and below to a factory function. See the formal statement, next.

Formal Statement

More formally, for the approximation schemes I am looking for, there exist two sequences of polynomials, namely—

  • $g_{n}(\lambda): =\sum_{k=0}^{n}a(n, k){n \choose k}\lambda^{k}(1-\lambda)^{n-k}$, and
  • $h_{n}(\lambda): =\sum_{k=0}^{n}b(n, k){n \choose k}\lambda^{k}(1-\lambda)^{n-k}$,

for every integer $n\ge1$, such that—

  1. $0\le a(n, k)\le b(n, k)\le1$,
  2. $\lim_{n}g_{n}(\lambda)=\lim_{n}h_{n}(\lambda)=f(\lambda)$ for every $\lambda\in[0,1]$, and
  3. For every $m<n$, both $(g_{n} - g_{m})$ and $(h_{m} - h_{n})$ have non-negative coefficients once $g_{n}$, $g_{m}$, $h_{n}$, and $h_{m}$ are rewritten as degree-$n$ polynomials in Bernstein form,

where $f(\lambda)$ is continuous on $[0, 1]$ (Nacu and Peres 2005; Holtz et al. 2011), and the goal is to find the appropriate values for $a(n, k)$ and $b(n, k)$.

It is allowed for $a(n, k)\lt0$ for a given $n$ and some $k$, in which case all $a(n, k)$ for that $n$ are taken to be 0 instead. It is allowed for $b(n, k)\gt1$ for a given $n$ and some $k$, in which case all $b(n, k)$ for that $n$ are taken to be 1 instead.

Alternatively, find a way to rewrite $f(\lambda)$ as— $$f(\lambda) = \sum_{n\ge 1} P_n(\lambda) = 1 - \sum_{n\ge 1} Q_n(\lambda),$$ where $P_n$ and $Q_n$ are polynomials of degree $n$ with non-negative Bernstein coefficients.

A Matter of Efficiency

However, ordinary Bernstein polynomials can't in general converge to a function faster than $O(1/n)$, a result known since Voronovskaya (1932) and a rate that will lead to an infinite expected number of coin flips in general. (See also the answer below.)
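The $O(1/n)$ saturation is easy to observe numerically. For $f(x)=x(1-x)$, the degree-$n$ Bernstein polynomial is exactly $(1-1/n)\,x(1-x)$, so the error at $x=1/2$ is exactly $1/(4n)$; the quick check below (the helper `bernstein_approx` is defined just for this demo) shows the error only halving as $n$ doubles:

```python
from math import comb

def bernstein_approx(f, n, x):
    # Degree-n Bernstein polynomial of f: sum_k f(k/n) C(n,k) x^k (1-x)^(n-k).
    return sum(f(k / n) * comb(n, k) * x ** k * (1 - x) ** (n - k)
               for k in range(n + 1))

f = lambda x: x * (1 - x)
errors = [abs(f(0.5) - bernstein_approx(f, n, 0.5)) for n in (8, 16, 32, 64)]
# Doubling n only halves the error (errors are 1/32, 1/64, 1/128, 1/256):
ratios = [errors[i] / errors[i + 1] for i in range(3)]
```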

But Lorentz (1966) showed that if the function is positive and $C^k$ continuous, there are polynomials with non-negative Bernstein coefficients that converge at the rate $O(1/n^{k/2})$ (and thus can enable a finite expected number of coin flips if the function is "smooth" enough).

Thus, people have developed alternatives, including iterated Bernstein polynomials, to improve the convergence rate. These include Micchelli (1973), Guan (2009), Güntürk and Li (2021), the "Lorentz operator" in Holtz et al. (2011), and Draganov (2014).

These alternative polynomials usually include results where the error bound is the desired $O(1/n^{k/2})$, but all those results (e.g., Theorem 4.4 in Micchelli; Theorem 5 in Güntürk and Li) have hidden constants with no upper bounds given (**), making them unimplementable (that is, it can't be known beforehand whether a given polynomial will come close to the target function within a user-specified error tolerance).

Questions

Thus the questions are:

  1. Are there practical formulas to compute polynomials that—

    • meet the formal statement above, and
    • meet the following error bound? $$|f(x) - P_n(f)(x)| \le \epsilon(f,n,x) = O(1/n^{k/2}),$$ for every $n\ge 1$, where $P_n(f)(x)$ is an approximating degree-$n$ polynomial that can be readily rewritten to Bernstein form with coefficients in $[0,1]$; $k$ is the number of continuous derivatives; and $\epsilon(f,n,x)$ is a fully determined formula with all constants in the formula having a known exact value or upper bound.
  2. Are there other practical formulas to approximate specific factory functions with polynomials that meet the formal statement above?

  3. Are there practical formulas to compute polynomials that meet the error bound given in question 1 and can be readily rewritten to Bernstein form with non-negative Bernstein coefficients?

One example for question 1 is the function $\min(2\lambda,1-\epsilon)$ on the domain $(0, 1/2-\epsilon)$ given in Nacu and Peres 2005.

Remarks

  • A related question seeks a practical way to apply the Holtz method.
  • A related question seeks ways to approximate concave functions.
  • This question is one of numerous open questions about the Bernoulli factory problem. Answers to them will greatly improve my pages on Bernoulli factories.

References

  • Łatuszyński, K., Kosmidis, I., Papaspiliopoulos, O., Roberts, G.O., "Simulating events of unknown probabilities via reverse time martingales", arXiv:0907.4018v2 [stat.CO], 2009/2011.
  • Keane, M. S., and O'Brien, G. L., "A Bernoulli factory", ACM Transactions on Modeling and Computer Simulation 4(2), 1994.
  • Holtz, O., Nazarov, F., Peres, Y., "New Coins from Old, Smoothly", Constructive Approximation 33 (2011).
  • Nacu, Şerban, and Yuval Peres. "Fast simulation of new coins from old", The Annals of Applied Probability 15, no. 1A (2005): 93-115.
  • Micchelli, C. (1973). The saturation class and iterates of the Bernstein polynomials. Journal of Approximation Theory, 8(1), 1-18.
  • Guan, Zhong. "Iterated Bernstein polynomial approximations." arXiv preprint arXiv:0909.0684 (2009).
  • Güntürk, C. Sinan, and Weilin Li. "Approximation with one-bit polynomials in Bernstein form" arXiv preprint arXiv:2112.09183 (2021).
  • Draganov, Borislav R. "On simultaneous approximation by iterated Boolean sums of Bernstein operators." Results in Mathematics 66, no. 1 (2014): 21-41.

(**) An exception is Chebyshev interpolants, but my implementation experience shows that Chebyshev interpolants are far from being readily convertible to Bernstein form without using transcendental functions or paying attention to the difference between first vs. second kind, Chebyshev points vs. coefficients, and the interval [-1, 1] vs. [0, 1]. By contrast, other schemes (which are of greater interest to me) involve polynomials that are already in Bernstein form or that use only rational arithmetic to transform to Bernstein form (these include so-called "iterated Bernstein" polynomials and "one-bit" polynomials).

Best Answer

While the following does not fully answer my question, I make the following notes.

Polynomial Schemes

This section relates to finding polynomials for certain functions, but except for the concave/convex trick, the schemes here all converge at a rate no faster than $O(1/n)$ in general, rather than the desired $O(1/n^{k/2})$ for $C^k$ continuous functions. Other users are encouraged to add other answers to my question.

The following inequality from Nacu and Peres 2005 is useful: $$|\mathbb{E}[f(X/n)]-f(k/(2n))| \le \kappa_f(n), \tag{1}$$ where—

  • $\kappa_f(n)$ is a function that depends only on $f$ and $n$ and makes the inequality hold true for all $f$ belonging to a given class of functions (such as Lipschitz continuous or twice-differentiable functions), and
  • $X$ is a hypergeometric($2n$, $k$, $n$) random variable.

Notably, if a function $f$ is such that (1) holds true, then in general, for all $n$ that are powers of 2— $$a(n, k) = f(k/n)-\delta(n) \text{ and } b(n,k) = f(k/n)+\delta(n),$$ where $n$ is the polynomial's degree and $\delta(n)$ is a solution to the following functional equation: $$\delta(n) = \delta(2n) + \kappa_f(n),$$ or, writing $d(m)=\delta(2^m)$, the linear recurrence $d(m) = d(m+1) + \kappa_f(2^m)$.

For example—

  1. If $f$ is Lipschitz continuous with Lipschitz constant $C$—
    • $\kappa_f(n) = C/\sqrt{2n}$, so
    • $\delta(n) = (1+\sqrt{2})C/\sqrt{n}$, and
  2. If $f$ is twice differentiable and $M$ is not less than the highest value of $|f''(x)|$ for any $x$ in $[0, 1]$—
    • $\kappa_f(n) = M/(4n)$, so
    • $\delta(n) = M/(2n)$.

(Nacu and Peres 2005, Proposition 10). As experiments show, these bounds are far from tight, but they can generally be improved only by an order of magnitude.
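As a spot check of the twice-differentiable case, take $f(x)=(x^2+x)/4$, so $|f''|=1/2=M$ and $\delta(n)=M/(2n)=1/(4n)$. The shifted Bernstein coefficients then sandwich $f$ between the lower and upper polynomials; the sketch below verifies this on a grid (helper names are mine):

```python
from math import comb

def bern(coeffs, x):
    # Evaluate a polynomial in Bernstein form from its coefficients.
    n = len(coeffs) - 1
    return sum(a * comb(n, k) * x ** k * (1 - x) ** (n - k)
               for k, a in enumerate(coeffs))

f = lambda x: (x * x + x) / 4     # f'' = 1/2, so M = 1/2
n = 16
delta = 1 / (4 * n)               # M/(2n)
a = [f(k / n) - delta for k in range(n + 1)]   # lower coefficients a(n,k)
b = [f(k / n) + delta for k in range(n + 1)]   # upper coefficients b(n,k)
grid = [i / 100 for i in range(101)]
sandwiched = all(bern(a, x) <= f(x) <= bern(b, x) for x in grid)
```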

A new result of mine exploits the results of Nacu and Peres to derive a new approximation scheme for Hölder continuous functions. For example, if $f$ is $\alpha$-Hölder continuous with Hölder constant $M$—

  • $\kappa_f(n) = M(1/(7n))^{\alpha/2}$, so
  • $\delta(n) = \frac{M(2/7)^{\alpha/2}}{(2^{\alpha/2}-1)n^{\alpha/2}}$.

(For proofs and additional notes on approximation schemes I'm looking for, see my supplemental notes.)

Moreover, due to Jensen's inequality:

  • If $f$ is concave on $[0, 1]$, then $a(n, k) = f(k/n)$.
  • If $f$ is convex on $[0, 1]$, then $b(n, k) = f(k/n)$.
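The concave case can be verified directly: for a concave $f$, Jensen's inequality gives $B_n f \le f$, so the unshifted coefficients $f(k/n)$ already yield a lower polynomial. A numerical check with a sample concave function (the helper is defined only for this demo):

```python
from math import comb

def bern_poly(f, n, x):
    # Degree-n Bernstein polynomial of f evaluated at x.
    return sum(f(k / n) * comb(n, k) * x ** k * (1 - x) ** (n - k)
               for k in range(n + 1))

f = lambda x: 2 * x - x * x      # concave on [0, 1] (f'' = -2)
# Jensen: the Bernstein polynomial lies on or below f everywhere on [0, 1].
below = all(bern_poly(f, 8, i / 100) <= f(i / 100) + 1e-12 for i in range(101))
```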

If $f$ equals 0 or 1 at the points 0 and/or 1, then $f$ can be transformed to bound it away from 0 and/or 1. For example, if $f(0)=0$, then it can be divided by a function that bounds $f$ from above. This case is too complicated to detail in this answer; see my supplemental notes for details.

Also, I have found the following result, which shows that any factory function admits an approximation scheme with polynomials (in a manner that solves the Bernoulli factory problem). This includes continuous functions that are not Hölder continuous. The method of proving the result goes back to Nacu and Peres (2005). This is one of several new results relating to this problem; see the appendix on my page on approximation schemes. However, again, this scheme doesn't have a finite expected running time in general.

Result: Let $f(\lambda)$ be a continuous function that maps $[0, 1]$ to $[0, 1]$, and let $X$ be a hypergeometric($2n$, $k$, $n$) random variable. Let $\omega(x)$ be a modulus of continuity of $f$ (a non-negative and nondecreasing function on the interval $[0, 1]$, for which $\omega(0) = 0$, and for which $|f(x)-f(y)|\le\omega(|x-y|)$ for all $x$ in $[0, 1]$ and all $y$ in $[0, 1]$). If $\omega$ is concave on $[0, 1]$, then the expression—$$|\mathbb{E}[f(X/n)]-f(k/(2n))|, $$is bounded from above by—

  • $\omega\left(\sqrt{\frac{1}{8n-4}}\right)$, for all $n\ge 1$ that are integer powers of 2,
  • $\omega\left(\sqrt{\frac{1}{7n}}\right)$, for all $n\ge 4$ that are integer powers of 2, and
  • $\omega\left(\sqrt{\frac{1}{2n}}\right)$, for all $n\ge 1$ that are integer powers of 2.

The only technical barrier, though, is to solve the functional equation—$$\delta(n) = \delta(2n) + \kappa(n), $$ where $\kappa(n)$ is one of the bounds proved above, such as $\omega\left(\sqrt{\frac{1}{8n-4}}\right)$.

In general, the solution to this equation is—$$\delta(2^m) = \sum_{k=m}^\infty \kappa(2^k), $$ where $n = 2^m$ and $m\ge0$ are integers, provided the sum converges.

In this case, the functional equation for the third bound has a closed-form solution when $\omega(x)$ is of the form $cx^\alpha$, but not in general.
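For instance, with the third bound and $\omega(x)=cx^\alpha$, the series telescopes to a closed form, which a truncated numerical sum confirms. This is a sketch; `terms=200` is an arbitrary truncation that assumes the tail is negligible:

```python
def delta_series(m, kappa, terms=200):
    # delta(2^m) = sum_{k >= m} kappa(2^k), truncated after `terms` summands.
    return sum(kappa(2 ** k) for k in range(m, m + terms))

alpha, c = 0.5, 1.0
# kappa(n) = omega(sqrt(1/(2n))) with omega(x) = c * x**alpha:
kappa = lambda n: c * (1 / (2 * n)) ** (alpha / 2)
m = 3  # degree n = 2**m = 8
# Closed form: sum_{k>=m} c * 2**(-(k+1)*alpha/2)
#            = c * 2**(-(m+1)*alpha/2) / (1 - 2**(-alpha/2))
closed = c * 2 ** (-(m + 1) * alpha / 2) / (1 - 2 ** (-alpha / 2))
approx = delta_series(m, kappa)
```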

Now, we approximate $f$ with polynomials in Bernstein form of power-of-two degrees. These are two sequences of polynomials that converge to $f$ from above and below, such that their coefficients "increase" and "decrease" just as the polynomials themselves do. In general, for all $n\ge1$ that are integer powers of 2— $$a(n, k) = f(k/n)-\delta(n) \text{ and } b(n,k) = f(k/n)+\delta(n).$$ Thus, for the polynomials of degree $n$, $\delta(n)$ is the amount by which to shift the approximating polynomials upward and downward.

Remarks

Theorem 26 of Nacu and Peres (2005) and the proof of Keane and O'Brien (1994) give general ways to simulate continuous factory functions $f(\lambda)$ on the interval $[0, 1]$. The former is limited to functions bounded away from 0 and 1; the latter is not. However, neither method provides formulas of the form $f(k/n) \pm \delta(n)$ that work for a whole class of factory functions (such as the formulas given earlier in this answer). For this and the following reasons, both methods are impractical:

  • Before a given function $f$ can be simulated, the methods require computing the necessary degree of approximation (finding $k_a$ or $s_i$ for each polynomial $a$ or $i$, respectively). This work has to be repeated for each function $f$ to be simulated.
  • Computing the degree of approximation involves, among other things, checking whether the approximating polynomial is "close enough" to $f$, which can require either symbolic maximization or a numerical optimization that calculates rigorous upper and lower bounds.
  • This computation gets more and more time-intensive with increasing degree.
  • For a given $f$, there is no guarantee that the $k_a$'s (or $s_i$'s) will follow a pattern or keep that pattern "forever" — especially since only a finite number of approximation degrees can be computed with these methods.

References

  • Nacu, Şerban, and Yuval Peres. "Fast simulation of new coins from old", The Annals of Applied Probability 15, no. 1A (2005): 93-115.
  • Flajolet, P., Pelletier, M., Soria, M., "On Buffon machines and numbers", arXiv:0906.5560 [math.PR], 2010.
  • (The "Lorentz operator".) Holtz, O., Nazarov, F., Peres, Y., "New Coins from Old, Smoothly", Constructive Approximation 33 (2011).