Solved – Unbiased Estimator for the CDF of a Normal Distribution

mathematical-statistics, normal-distribution, self-study, umvue, unbiased-estimator

Problem Statement

Let $X_1, X_2, …, X_n$ be i.i.d. random variables from a normal distribution with mean $\mu$ and variance $1$. Find an unbiased estimator for $\tau(\mu):=P(X_1>0)$.

Attempt at a Solution

We can rewrite $\tau(\mu)$ in terms of the CDF $\Phi$ of a standard normal by noting that
$$
\begin{aligned}
P(X_1>0)&=P\left(\frac{X_1-\mu}{\sigma}>-\frac{\mu}{\sigma}\right)\\
&=1-\Phi(-\mu/\sigma) \\
&=\Phi(\mu/\sigma) \\
&=\Phi(\mu)
\end{aligned}
$$
since $\sigma = 1$. We therefore want a statistic $S\big(\vec X\big)$ so that $E[S] = \Phi(\mu)$. Because $\overline X$ is the UMVUE for $\mu$, I am inclined to search for a statistic that is a function of $\overline X$; i.e., an $S$ such that
$$
\int_{-\infty}^\infty S(t)\sqrt{\frac{n}{2\pi}}\,e^{-n(t-\mu)^2/2}\,dt = \int_{-\infty}^\mu\frac{1}{\sqrt{2\pi}}e^{-t^2/2}\,dt
$$
where the LHS is $E\big[S\big(\overline X\big)\big]$ written in terms of the pdf of $\overline X$, which is normal with mean $\mu$ and variance $\sigma^2/n=1/n$, and the RHS is $\Phi(\mu)$, the standard normal pdf integrated up to $\mu$. From here, I have considered:

  • A change of variables to get the limits of integration to agree.
  • Choosing $S=\Phi\big(\overline X \big)$, although there isn't any reason to expect this to be unbiased for $\Phi(\mu)$ just because $\overline X$ is unbiased for $\mu$ (a quick simulation, shown after this list, confirms it is biased).
  • Expressing the integral on the LHS in different forms; for example, in terms of the joint pdf of the $X_i$.
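
As a quick check on the second bullet, here is a minimal Monte Carlo sketch in Python, with arbitrarily chosen illustrative values of $\mu$ and $n$ (nothing here comes from the original problem), suggesting that the plug-in estimator $\Phi\big(\overline X\big)$ is indeed biased for $\Phi(\mu)$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu, n, reps = 0.5, 5, 200_000  # arbitrary illustrative choices

# Draw `reps` samples of size n from N(mu, 1) and apply Phi to each sample mean.
xbar = rng.normal(mu, 1.0, size=(reps, n)).mean(axis=1)
plug_in = norm.cdf(xbar)

print("target Phi(mu):     ", norm.cdf(mu))    # ~0.691
print("mean of Phi(X-bar): ", plug_in.mean())  # ~0.676, systematically low
```

(In fact, the standard identity $E\big[\Phi\big(\overline X\big)\big]=\Phi\big(\mu/\sqrt{1+1/n}\big)$ shows the plug-in estimator is biased toward $1/2$ for every $\mu \neq 0$.)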

The ad hoc nature of my approach here is representative of my attempts to find unbiased estimators in general. I hold out hope for a more systematic method than blindly manipulating half-baked guesses informed by complete and sufficient statistics for exponential-family distributions.

Best Answer

As a comment suggested, an unbiased estimator is (one minus) the empirical distribution function evaluated at $0$:

$$\hat P(X_1 > 0) = 1-\hat F_X(0) = 1-\frac 1n \sum_{i=1}^n I\{X_i \leq 0\}$$

where $I\{\cdot\}$ is the indicator function, because

$$E[\hat P(X_1 > 0)]=1- E[\hat F_X(0)] = 1-\frac 1n \sum_{i=1}^n E[I\{X_i \leq 0\}] $$

$$= 1-\frac 1n \cdot n\, P(X_1\leq 0) = 1 - P(X_1\leq 0) = P(X_1 > 0),$$

since each $E[I\{X_i \leq 0\}] = P(X_1 \leq 0)$ by the identical distribution of the $X_i$.

Because we are estimating a probability at a known threshold (in our case $0$), it doesn't matter whether we know the parameter of the distribution ($\mu$) or not.
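
The unbiasedness argument above is also easy to check numerically. Here is a minimal simulation sketch in Python (again with arbitrarily chosen $\mu$ and $n$, not taken from the problem) for the empirical-CDF estimator:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu, n, reps = 0.5, 5, 200_000  # arbitrary illustrative choices

# For each replication the estimator is 1 - (1/n) * #{X_i <= 0},
# i.e. the fraction of the sample that is strictly positive.
x = rng.normal(mu, 1.0, size=(reps, n))
est = 1.0 - (x <= 0).mean(axis=1)

print("target P(X_1 > 0) = Phi(mu):", norm.cdf(mu))
print("mean of the estimator:      ", est.mean())  # agrees up to Monte Carlo error
```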
