Mean – Why Isn’t This Estimator Unbiased?

Tags: estimators, expected-value, mean, unbiased-estimator

Suppose we have an IID sample $X_1, X_2, \cdots, X_n$ with each $X_i$ distributed as $\mathcal{N}(\mu, \sigma^2)$. Now suppose we construct a (rather peculiar) estimator for the mean $\mu$: we keep only the sample values that are greater than a pre-decided cutoff, say $1$, and take the sample average of those values:

$$\hat{\mu}=\frac{1}{n_1}\sum\limits_{X_i > 1} X_i$$

Here $n_1$ is the number of values that are greater than $1$. Now I was expecting this estimator to be highly biased. However, we have:

$$\mathbb{E}(\hat{\mu}) = \frac{1}{n_1}\sum\limits_{X_i>1}\mathbb{E}(X_i)=\frac{1}{n_1}n_1\mu=\mu$$

This would simply mean that the estimator is unbiased! But obviously, if I generate many numbers and keep only those greater than $1$, I will never get an estimator value less than $1$, so the estimator has to be biased. What am I missing?
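The claimed bias is easy to check by simulation. Here is a minimal Monte Carlo sketch; the parameter choices ($\mu = 0$, $\sigma = 1$, $n = 50$) and the number of replications are arbitrary illustrative assumptions, not values from the question:

```python
import random
import statistics

random.seed(0)
mu, sigma, n, reps = 0.0, 1.0, 50, 20_000

estimates = []
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    kept = [x for x in xs if x > 1]          # keep only values above the cutoff
    # Convention: take the estimator to be 0 when no value exceeds 1.
    estimates.append(statistics.mean(kept) if kept else 0.0)

print(statistics.mean(estimates))  # well above mu = 0, confirming the bias
```

With these parameters, the average estimate comes out around $1.5$, far from the true $\mu = 0$.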

Best Answer

Given that $n_1$ is a random variable (as pointed out already in the comments), the expected value can be computed as $E(\hat\mu)=E_{n_1}[E_{\hat \mu}(\hat\mu|n_1)]$. For the inner expectation, note that one can't just write $E_{\hat \mu}(\hat\mu|n_1)=\frac{1}{n_1}\sum_{X_i>1}E(X_i),$ because the expected value cannot depend on specific values of certain $X_i$, as would be required for the sum. So $$E_{\hat \mu}(\hat\mu|n_1)=\frac{1}{n_1}E\left[\sum_{X_i>1} X_i\,\middle|\,n_1\right].$$

For given $n_1$, we can write, with appropriate renumbering of indices, $\sum_{X_i>1} X_i=\sum_{j=1}^{n_1} X_j^*$, where the $X_j^*$ are random variables distributed according to a normal distribution truncated between $a=1$ and $b=\infty$. Let $E_{\mu,\sigma^2,a,b}X$ denote the expectation of such a truncated normal. For $a=1$, $b=\infty$, $$E_{\mu,\sigma^2,1,\infty}X=\mu+\frac{\varphi\left(\frac{1-\mu}{\sigma}\right)}{1-\Phi\left(\frac{1-\mu}{\sigma}\right)}\sigma=t>\mu,$$ see https://en.wikipedia.org/wiki/Truncated_normal_distribution.

Conditioning on $n_1$, we have $$E_{\hat \mu}(\hat\mu|n_1)=\frac{1}{n_1}\sum_{j=1}^{n_1} E(X_j^*)=\frac{1}{n_1}n_1 E_{\mu,\sigma^2,1,\infty}(X)=t>\mu.$$ This does not depend on $n_1$ (unless $n_1=0$, in which case the sum is empty and $E_{\hat \mu}(\hat\mu|n_1=0)=0$), so ultimately $$E(\hat \mu)=P\{n_1>0\}\,t.$$

This is $>\mu$ (bias!) if $\mu\le 0$, and also whenever $P\{n_1=0\}$ is small enough that $P\{n_1>0\}t>\mu$, which should hold unless $n$ is very small (potentially resulting in a large $P\{n_1=0\}$, the value of which is given in Xi'an's solution).

PS: I corrected this seeing Xi'an's solution, who got a thing right that I had forgotten about. That solution is perfectly right as far as I can see, however my different way of getting there may also help.

PPS: I take $\hat \mu=0$ in case $n_1=0$, which isn't entirely clear in the question.
