UMVUE – Finding the UMVUE for a Function of a Bernoulli Parameter

bernoulli-distribution, factorisation-theorem, rao-blackwell, umvue, unbiased-estimator

Given $m$ i.i.d. Bernoulli($\theta$) r.v.s $X_{1}, X_{2}, \ldots, X_{m}$, I'm interested in finding the UMVUE of $(1-\theta)^{1/k}$, where $k$ is a positive integer.

I know $\sum_i X_{i}$ is a sufficient statistic by the Factorization Theorem, but I'm having trouble proceeding from there. If I can find an unbiased estimator that is a function of this sufficient statistic, the problem is solved by the Lehmann–Scheffé theorem (since $\sum_i X_i$ is complete as well as sufficient).

Best Answer

Except when $k=1$, given a finite sequence of i.i.d. Bernoulli $\mathcal B(\theta)$ random variables $X_1,X_2,\ldots,X_m$, there exists no unbiased estimator of $(1-\theta)^{1/k}$ when $k$ is a positive integer.

The reason for this impossibility is that only polynomials in $\theta$ of degree at most $m$ can be unbiasedly estimated. Indeed, since $Y_m=m\bar{X}_m$ is a sufficient statistic, we can assume without loss of generality that an unbiased estimator is a function $\delta(Y_m)$ of $Y_m\sim\mathrm{Bin}(m,\theta)$, with expectation $$\mathbb E_\theta[\delta(Y_m)]=\sum_{i=0}^m \delta(i) {m \choose i} \theta^i(1-\theta)^{m-i},$$ which is therefore a polynomial in $\theta$ of degree at most $m$ — and $(1-\theta)^{1/k}$ is not a polynomial when $k\ge 2$.
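The degree bound above can be checked symbolically; here is a small sketch (mine, not from the answer) expanding the expectation of an arbitrary estimator $\delta(Y_m)$ for $m=4$ and confirming it is a polynomial in $\theta$ of degree at most $m$:

```python
# Symbolic check: E_theta[delta(Y_m)] is a polynomial in theta of degree <= m,
# whatever the values delta(0), ..., delta(m) are.
import sympy as sp

m = 4
theta = sp.symbols("theta")
delta = sp.symbols("d0:%d" % (m + 1))  # arbitrary estimator values delta(0..m)

# E[delta(Y_m)] = sum_i delta(i) C(m,i) theta^i (1-theta)^(m-i)
expectation = sp.expand(
    sum(delta[i] * sp.binomial(m, i) * theta**i * (1 - theta)**(m - i)
        for i in range(m + 1))
)
print(sp.degree(expectation, theta))  # degree in theta is at most m
```

Since $(1-\theta)^{1/k}$, $k\ge 2$, is not a polynomial, no choice of $\delta$ can make the expectation match it for all $\theta$.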

See Halmos (1946) for a general theory of unbiased estimation that points out the rarity of unbiasedly estimable functions.

However, changing perspective, there does exist an unbiased estimator of $\theta^a$, $a\in(0,1)$, when considering instead an infinite sequence of i.i.d. Bernoulli $\mathcal B(\theta)$ random variables $X_1,X_2,\ldots$ (Note that this covers the question's target, since $(1-\theta)^{1/k}$ is $\theta'^{1/k}$ for the flipped coins $1-X_i$ with $\theta'=1-\theta$.) This is a consequence of the notion of a Bernoulli factory.

Given a known function $f:S\to (0,1)$, we consider the problem of using independent tosses of a coin with probability of heads $\theta$ (where $\theta\in S$ is unknown) to simulate a coin with probability of heads $f(\theta)$. (Nacu & Peres, 2005)

Mendo (2018) and Thomas and Blanchet show, with constructive arguments, that there exists a Bernoulli factory solution for $\theta^a$, $a\in (0,1)$. Mendo uses the power series decomposition of $f(\theta)$, $$f(\theta)=1-\sum_{k=1}^\infty c_k(1-\theta)^k,\qquad c_k\ge 0,\quad\sum_{k=1}^\infty c_k=1,$$ to construct the sequence $$d_k=\dfrac{c_k}{1-\sum_{\kappa=1}^{k-1}c_\kappa}$$ and the associated algorithm:

  1. Set $i=1$.
  2. Take one Bernoulli $\mathcal B(\theta)$ input $X_i$.
  3. Produce $U_i$ Uniform on $(0,1)$. Let $V_i = 1$ if $U_i < d_i$ and $V_i = 0$ otherwise.
  4. If $V_i$ or $X_i$ equals 1, output $Y = X_i$ and stop. Otherwise increase $i$ by 1 and return to step 2.
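As a quick check of the steps above (a verification of mine, not in the original): the algorithm stops at step $i$ with output $Y=1$ with probability $$\theta\,(1-\theta)^{i-1}\prod_{j=1}^{i-1}(1-d_j)=\theta\,(1-\theta)^{i-1}\Big(1-\sum_{\kappa=1}^{i-1}c_\kappa\Big),$$ since the product of the $(1-d_j)$'s telescopes. Summing over $i$ and writing $x=1-\theta$, $$\mathbb P(Y=1)=(1-x)\sum_{i=1}^\infty x^{i-1}\Big(1-\sum_{\kappa=1}^{i-1}c_\kappa\Big)=1-\sum_{k=1}^\infty c_k x^k=f(\theta),$$ so the output is indeed a $\mathcal B(f(\theta))$ coin.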

For instance, when $f(\theta) =\sqrt\theta$ the coefficients $c_k$ are $$c_k=\frac{1}{2^{2k-1}k}{2k-2 \choose k-1}.$$ Here is R code illustrating the validity of the method:

    p = .3  # the coin probability theta
    # series coefficients c_k and d_k, truncated at 1e5 terms, on the log scale
    ck = exp(lchoose(n = 2 * (k <- 1:1e5) - 2, k = k - 1) - log(k) - (2 * k - 1) * log(2))
    dk = ck / (1 - c(0, cumsum(ck[-1e5])))
    # one call of the factory, returning a Bernoulli B(sqrt(p)) draw
    be <- function(p){
      i = 1
      while ((xi <- runif(1) > p) & (runif(1) > dk[i])) i = i + 1
      1 - xi}
    out = rep(0, 1e5)
    for (t in 1:1e5) out[t] = be(p)

and the empirical verification that the simulated outcomes are indeed Bernoulli $\mathcal B(\sqrt{\theta})$:

[Figure: empirical frequencies of the simulated outcomes against the Bernoulli $\mathcal B(\sqrt{\theta})$ probabilities]
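The same verification can be run as a short Python port of the algorithm (my sketch, not the answer's code; the truncation of the series at $10^5$ terms mirrors the R code above):

```python
# Python sketch of Mendo's Bernoulli factory for f(theta) = sqrt(theta),
# truncating the coefficient series at K terms.
import math
import random

K = 10**5
# c_k = C(2k-2, k-1) / (k * 2^(2k-1)), computed on the log scale for stability
c = [math.exp(math.lgamma(2*k - 1) - 2*math.lgamma(k) - math.log(k)
              - (2*k - 1)*math.log(2)) for k in range(1, K + 1)]
# d_k = c_k / (1 - sum_{kappa < k} c_kappa)
d, tail = [], 1.0
for ck in c:
    d.append(ck / tail)
    tail -= ck

def factory(theta, rng):
    """One draw of Y ~ Bernoulli(sqrt(theta))."""
    i = 0
    while True:
        x = rng.random() < theta   # X_i ~ B(theta)
        v = rng.random() < d[i]    # V_i ~ B(d_i)
        if v or x:
            return int(x)
        i += 1

rng = random.Random(42)
theta, n = 0.3, 10**5
mean = sum(factory(theta, rng) for _ in range(n)) / n
print(mean, math.sqrt(theta))  # empirical mean close to sqrt(0.3) ~ 0.5477
```

With $10^5$ draws the empirical frequency of 1's agrees with $\sqrt{\theta}$ to within Monte Carlo error.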

As an aside, estimating $\theta^{1/k}$ or $(1-\theta)^{1/k}$ has a practical appeal in Dorfman's group blood testing (or pooling), where blood samples from $k$ individuals are mixed together to speed up confirmation that all of them are free of a disease.
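To illustrate the connection with hypothetical numbers (my sketch, not from the answer): if each individual is disease-free with probability $1-p$, a pool of $k$ samples tests negative with probability $\theta=(1-p)^k$, so recovering $p$ from pooled tests involves $\theta^{1/k}$ — and the natural plug-in estimator $1-\hat\theta^{1/k}$ is exactly the kind of non-polynomial transform that cannot be unbiasedly estimated from finitely many pools:

```python
# Dorfman pooling sketch with made-up numbers: estimate the individual
# prevalence p from pooled test results via the (biased) plug-in estimator.
import random

random.seed(0)
p, k, n_pools = 0.05, 10, 2000
theta = (1 - p) ** k            # probability a pool of k tests negative

# simulate n_pools pooled tests (True = pool negative)
neg = sum(random.random() < theta for _ in range(n_pools))
thetahat = neg / n_pools
p_plugin = 1 - thetahat ** (1 / k)   # plug-in estimate of p, biased
print(p_plugin)  # close to p = 0.05, but E[p_plugin] != p
```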