Given a coin that produces heads with probability $\lambda$, what is a constructive way to produce $f(\lambda)$, where $f$ is a continuous non-Hölder function?

Tags: approximation-theory, holder-spaces, polynomials, probability, simulation

Given a coin of "bias" $\lambda$ (that is, a coin that shows heads with probability $\lambda$), sample the probability $f(\lambda)$. This is the Bernoulli factory problem, and it can be solved only for certain functions $f$. (For example, by flipping the coin twice and returning heads only if the first flip shows heads and the second shows tails, we can simulate the probability $\lambda(1-\lambda)$.)
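As a minimal illustration, here is a sketch of this example in Python; the coin is modeled as a callable returning 1 (heads) with probability $\lambda$, which is an assumption made only for the demo (in a real Bernoulli factory, $\lambda$ is unknown):

```python
import random

def factory_lambda_times_one_minus_lambda(flip):
    """Return 1 with probability lambda*(1-lambda), given a coin `flip`
    that returns 1 (heads) with unknown probability lambda."""
    # "Heads then tails" occurs with probability lambda*(1-lambda).
    return 1 if flip() == 1 and flip() == 0 else 0

rng = random.Random(42)
lam = 0.3  # hidden bias, used only to build the demo coin
coin = lambda: 1 if rng.random() < lam else 0
est = sum(factory_lambda_times_one_minus_lambda(coin)
          for _ in range(200000)) / 200000
print(est)  # should be close to 0.3 * 0.7 = 0.21
```

Note that the factory never inspects $\lambda$ directly; it only calls the coin.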

Usually, the Bernoulli factory problem can be solved for $f$ if and only if—

  • $f$ is continuous in $[0, 1]$, and
  • $f$ is identically $0$, identically $1$, or polynomially bounded away from $0$ and $1$ (roughly speaking, $f$ takes on only values in $[0, 1]$, and its graph doesn't touch $0$ or $1$ except possibly at the endpoints $\lambda = 0$ and/or $\lambda = 1$)

(Keane and O'Brien 1994).

This question is a continuation of another question of mine, but this time we focus on the case when $f(\lambda)$ is a continuous non-Hölder function, which roughly means a continuous function with an exponentially steep slope (steeper than any $n$th-root).

The question just linked to seeks ways to compute polynomials that converge from above and below to $f$ in a manner that solves the Bernoulli factory problem for $f$. These polynomials form an approximation scheme for $f$. See that question for a formal statement of such approximation schemes.

There are two algorithms (one by Thomas and Blanchet 2012, another by Łatuszyński et al. 2009/2011) to simulate a function $f$ via an approximation scheme. Roughly speaking, the algorithms work as follows:

  1. Generate $U$, a uniform random number in $[0, 1]$.
  2. Flip the input coin (with "bias" $\lambda$), then build an upper and lower bound for $f(\lambda)$ based on the outcomes of the flips so far. In this case, these bounds come from two degree-$n$ polynomials that approach $f$ as $n$ gets large, where $n$ is the number of coin flips so far in the algorithm.
  3. If $U$ is less than or equal to the lower bound, return $1$. If $U$ is greater than the upper bound, return $0$. Otherwise, go to step 2.

The result of the algorithm is $1$ with probability exactly equal to $f(\lambda)$, or $0$ otherwise.

In general, the rate at which a function can be simulated by these algorithms depends on the smoothness of $f$. It roughly corresponds to the rate of convergence of an approximation scheme for $f$. I list approximation schemes for different kinds of functions $f$, which work with the two algorithms described above.

For example, roughly speaking, a continuous function $f$ can be simulated at the rate $O(n^{-\alpha})$ only if $f$ is $\alpha$-Hölder continuous (Holtz et al. 2011). This seems to imply that a function meeting no Hölder condition for any $\alpha > 0$ (a non-Hölder function) can achieve, at best, a simulation rate of $O(1)$; in other words, the approximating polynomials need not converge uniformly to such an $f$ for all $\lambda \in [0, 1]$, so the algorithms above are not guaranteed to terminate in all cases.

And this suggests to me that we need to add the following extra condition to the Bernoulli factory problem, since it appears that no approximation scheme exists for non-Hölder functions:

  • $f$ is $\alpha$-Hölder continuous in its domain for some $\alpha > 0$.

Is this true? Is there really no algorithm to sample the probability $f(\lambda)$ for a non-Hölder function $f$, given sample access to the probability $\lambda$?

This suggests that a different approach may be needed for non-Hölder functions. Thus: Is there a constructive method to approximate non-Hölder functions with polynomials for the Bernoulli factory problem (in the form of a formula to compute their coefficients), so that we can sample the probability $f(\lambda)$ for a non-Hölder function $f$, given sample access to the probability $\lambda$?

Note: An example of a non-Hölder function is the following, with a "cusp" at $0$:

  • $1/10$ if $\lambda = 0$, and
  • $-1/(2\ln(\lambda/2))+1/10$ otherwise.

Another example is the following, where the cusp is at $3/10$:

  • $1/10$ if $\lambda = 3/10$, and
  • $-1/(2\ln(|\lambda-3/10|/2))+1/10$ otherwise.

A third example is the following monotone function:

  • $3/10$ if $\lambda = 3/10$,
  • $1/(2\ln((3/10-\lambda)/2))+3/10$ if $\lambda < 3/10$, and
  • $-1/(2\ln((\lambda-3/10)/2))+3/10$ otherwise.

Roughly speaking:

  • A non-Hölder function is continuous but has an exponentially steep slope.
  • If a continuous function has no vertical slope, the function is 1-Hölder continuous.
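To see numerically why the first example above fails every Hölder condition, one can check that $(f(\lambda)-f(0))/\lambda^\alpha$ blows up as $\lambda\to 0$ for any fixed exponent $\alpha > 0$. A quick sketch (the choice $\alpha = 0.1$ and the sample points are mine):

```python
import math

alpha = 0.1  # any fixed Hölder exponent would do

def f(lam):
    # first example above: continuous, with a "cusp" at 0
    return 1/10 if lam == 0 else -1/(2*math.log(lam/2)) + 1/10

ratios = []
for lam in (1e-5, 1e-50, 1e-200):
    # If f were alpha-Hölder, this ratio would stay bounded as lam -> 0.
    ratios.append((f(lam) - f(0)) / lam**alpha)
print(ratios)  # grows without bound as lam shrinks
```

Since $f(\lambda)-f(0)$ decays only like $1/|\ln\lambda|$, it is eventually larger than any power $\lambda^\alpha$, so the ratio diverges.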

REFERENCES:

  • Thomas, A.C., Blanchet, J., "A Practical Implementation of the Bernoulli Factory", arXiv:1106.2508v3 [stat.AP], 2012.
  • Łatuszyński, K., Kosmidis, I., Papaspiliopoulos, O., Roberts, G.O., "Simulating events of unknown probabilities via reverse time martingales", arXiv:0907.4018v2 [stat.CO], 2009/2011.
  • Keane, M. S., and O'Brien, G. L., "A Bernoulli factory", ACM Transactions on Modeling and Computer Simulation 4(2), 1994.
  • Holtz, O., Nazarov, F., Peres, Y., "New Coins from Old, Smoothly", Constructive Approximation 33 (2011).

Update:

I have become aware of so-called Dini-continuous functions, which are a superset of Hölder continuous functions.

Essentially, a Dini-continuous function has a modulus of continuity no "steeper" than $\phi(\lambda)$ for some increasing non-negative function $\phi$ for which the integral $\int_0^1 \phi(\lambda)/\lambda \, d\lambda$ is finite.

However, I haven't found any paper yet on the rate at which polynomials can approximate a Dini-continuous (but not Hölder-continuous) function (let alone in a manner that solves the Bernoulli factory problem).

Moreover, the following function, adapted from this question, is apparently not even Dini continuous:

  • $1/10 + (1/(1+|\ln(\lambda)|))/2$ if $\lambda>0$, and
  • $1/10$ otherwise.

Best Answer

I have worked on this and found the following result, which shows that any continuous function meeting the two conditions in my question admits an approximation scheme with polynomials (in a manner that solves the Bernoulli factory problem). This includes continuous functions that are not Hölder continuous. The method of proving the result goes back to Nacu and Peres (2005).

Result: Let $f(\lambda)$ be a continuous function that maps $[0, 1]$ to $[0, 1]$, and let $X$ be a hypergeometric($2n$, $k$, $n$) random variable. Let $\omega(x)$ be a modulus of continuity of $f$ (a non-negative and nondecreasing function on $[0, 1]$ for which $|f(x)-f(y)|\le\omega(|x-y|)$ for all $x$ in $[0, 1]$ and all $y$ in $[0, 1]$). If $\omega$ is concave on $[0, 1]$, then the expression $$|\mathbb{E}[f(X/n)]-f(k/(2n))|$$ is bounded from above by—

  • $\omega\left(\sqrt{\frac{1}{8n-4}}\right)$, for all $n\ge1$ that are integer powers of 2,
  • $\omega\left(\sqrt{\frac{1}{7n}}\right)$, for all $n\ge4$ that are integer powers of 2, and
  • $\omega\left(\sqrt{\frac{1}{2n}}\right)$, for all $n\ge1$ that are integer powers of 2.

Proof: $\omega$ is assumed to be non-negative because absolute values are non-negative. To prove the first and second bounds: $$|\mathbb{E}[f(X/n)]-f(k/(2n))| \le \mathbb{E}[|f(X/n)-f(k/(2n))|] \le \mathbb{E}[\omega(|X/n-k/(2n)|)] \le \omega(\mathbb{E}[|X/n-k/(2n)|])$$ (by Jensen's inequality, because $\omega$ is concave), and since $\mathbb{E}[X/n]=k/(2n)$ and $\omega$ is nondecreasing, this is in turn bounded by $$\omega\left(\sqrt{\mathbb{E}[|X/n-k/(2n)|^2]}\right)=\omega\left(\sqrt{\mathrm{Var}[X/n]}\right)=\omega\left(\sqrt{\frac{k(2n-k)}{4(2n-1)n^2}}\right)\le\omega\left(\sqrt{\frac{n^2}{4(2n-1)n^2}}\right)=\omega\left(\sqrt{\frac{1}{8n-4}}\right)=\rho.$$ For all $n\ge4$ we have $8n-4\ge 7n$, so $\rho\le\omega(\sqrt{1/(7n)})$. To prove the third bound: $|\mathbb{E}[f(X/n)]-f(k/(2n))| \le \omega(\sqrt{\mathrm{Var}[X/n]}) \le \omega(\sqrt{1/(2n)})$, since $8n-4\ge 2n$ for $n\ge1$. □
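The first bound can also be checked numerically. The sketch below verifies it for $f(x)=\sqrt{x}$, which has the concave modulus $\omega(t)=\sqrt{t}$ (my choice for the demo, since $|\sqrt{x}-\sqrt{y}|\le\sqrt{|x-y|}$), using the exact hypergeometric($2n$, $k$, $n$) distribution:

```python
import math

def hypergeom_pmf(N, K, n, x):
    # P(X = x) for a hypergeometric(N, K, n) variable:
    # x marked items when drawing n items from N of which K are marked.
    if x < max(0, K + n - N) or x > min(K, n):
        return 0.0
    return math.comb(K, x) * math.comb(N - K, n - x) / math.comb(N, n)

f = math.sqrt      # example function on [0, 1]
omega = math.sqrt  # concave modulus of continuity for sqrt

worst_slack = float("inf")
for n in (1, 2, 4, 8, 16):
    bound = omega(math.sqrt(1 / (8 * n - 4)))  # first bound in the result
    for k in range(2 * n + 1):
        mean_f = sum(hypergeom_pmf(2 * n, k, n, x) * f(x / n)
                     for x in range(n + 1))
        diff = abs(mean_f - f(k / (2 * n)))
        worst_slack = min(worst_slack, bound - diff)
print(worst_slack)  # stays non-negative, so the bound holds in these cases
```

This is only a spot check for small powers of two, not a substitute for the proof.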

The only technical barrier, though, is to solve the functional equation—$$\delta(n) = \delta(2n) + \kappa(n), $$ where $\kappa(n)$ is one of the bounds proved above, such as $\omega\left(\sqrt{\frac{1}{8n-4}}\right)$.

In general, the solution to this equation is—$$\delta(2^m) = \sum_{k=m}^\infty \kappa(2^k), $$ where $n = 2^m$ and $m\ge0$ are integers, provided the sum converges.

In this case, with the third bound as $\kappa(n)$, the sum has a closed form (a geometric series) when $\omega(x)$ is of the form $cx^\alpha$, but not in general.
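For instance, taking the third bound with the Hölder modulus $\omega(x)=x^{1/2}$ (my example), $\kappa(2^k)=\left(2^{k+1}\right)^{-1/4}$ is geometric, so $\delta(2^m)=2^{-(m+1)/4}/(1-2^{-1/4})$. A sketch comparing this closed form with a truncated numerical sum:

```python
def kappa(n):
    # third bound with omega(x) = x**0.5, i.e. omega(sqrt(1/(2n)))
    return (1 / (2 * n)) ** 0.25

def delta_closed(m):
    # geometric-series closed form of sum_{k >= m} kappa(2**k)
    return 2 ** (-(m + 1) / 4) / (1 - 2 ** (-1 / 4))

def delta_truncated(m, terms=400):
    # direct truncation of the infinite sum; the tail is negligible
    return sum(kappa(2 ** k) for k in range(m, m + terms))

for m in range(5):
    print(m, delta_closed(m), delta_truncated(m))
```

For moduli without such a closed form, a truncated sum like `delta_truncated` is one way to compute $\delta$ numerically, provided the sum converges.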

Now, we approximate $f$ with polynomials in Bernstein form of power-of-two degrees. These are two sequences of polynomials that converge to $f$ from above and below, such that their coefficients "increase" and "decrease" just as the polynomials themselves do. In general, for all $n\ge1$ that are integer powers of 2:

  • The $k$th Bernstein coefficient for the upper polynomial of $n$th degree is $f(k/n) + \delta(n)$.
  • The $k$th Bernstein coefficient for the lower polynomial of $n$th degree is $f(k/n) - \delta(n)$.

Thus, for the polynomials of degree $n$, $\delta(n)$ is the amount by which to shift the approximating polynomials upward and downward.
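Putting the pieces together, here is a sketch of the whole method for the Lipschitz example $f(\lambda)=(1+\lambda)/4$ (my choice, with $\omega(t)=t/4$ and the third bound, so $\delta(n)$ has a geometric closed form). The loop is the two-sided algorithm from the question, run at power-of-two degrees; a large degree cutoff is added for practicality, which introduces a bias of at most $\delta$ at the cutoff:

```python
import math, random

def f(lam):
    # example factory function; Lipschitz with omega(t) = t/4
    return (1 + lam) / 4

M = 0.25  # Lipschitz constant of f

def delta(n):
    # closed form of sum_{k >= m} omega(sqrt(1/(2*2^k))) for omega(t) = M*t,
    # with n = 2^m; it solves delta(n) = delta(2n) + omega(sqrt(1/(2n)))
    return (M / math.sqrt(2 * n)) / (1 - 1 / math.sqrt(2))

def sample_f(flip, rng, max_degree=2 ** 16):
    u = rng.random()                          # step 1: uniform U
    flips = []
    n = 1
    while True:
        while len(flips) < n:                 # step 2: flip up to degree n
            flips.append(flip())
        h = sum(flips)
        low = max(0.0, f(h / n) - delta(n))   # lower Bernstein coefficient
        high = min(1.0, f(h / n) + delta(n))  # upper Bernstein coefficient
        if u <= low:                          # step 3
            return 1
        if u > high:
            return 0
        if n >= max_degree:                   # practical cutoff (small bias)
            return 1 if u <= f(h / n) else 0
        n *= 2                                # double the degree

rng = random.Random(12345)
lam = 0.5  # hidden bias, used only to build the demo coin
coin = lambda: 1 if rng.random() < lam else 0
runs = 1000
est = sum(sample_f(coin, rng) for _ in range(runs)) / runs
print(est)  # should be near f(0.5) = 0.375
```

This is a sketch under the stated assumptions, not a tuned implementation; for a Lipschitz modulus the expected number of flips of this scheme is large, which is why the cutoff is included.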


Remark:

Theorem 26 of Nacu and Peres (2005) and the proof of Keane and O'Brien (1994) give general ways to simulate continuous factory functions $f(\lambda)$ on the interval $[0, 1]$. The former is limited to functions that are bounded away from $0$ and $1$, while the latter is not. However, neither method provides formulas of the form $f(k/n) \pm \delta(n)$ that work for a whole class of factory functions (such as the formulas given earlier in this answer). For this and other reasons, given below, both methods are impractical:

  • Before a given function $f$ can be simulated, the methods require computing the necessary degree of approximation (finding $k_a$ or $s_i$ for each polynomial $a$ or $i$, respectively). This work has to be repeated for each function $f$ to be simulated.
  • Computing the degree of approximation involves, among other things, checking whether the approximating polynomial is "close enough" to $f$, which can require a symbolic maximization.
  • This computation gets more and more time-intensive with increasing degree.
  • For a given $f$, there is no guarantee that the $k_a$'s (or $s_i$'s) will show a pattern or keep that pattern indefinitely, especially since only a finite number of approximation degrees can be computed with these methods.

REFERENCES:

  • Nacu, Ş., Peres, Y., "Fast simulation of new coins from old", Annals of Applied Probability 15(1A), 2005.
