Solved – Why does unbiasedness not imply consistency?

bias, consistency, estimation, unbiased-estimator

I'm reading Deep Learning by Ian Goodfellow et al. It introduces the bias of an estimator as
$$\mathrm{Bias}(\hat\theta)=E(\hat\theta)-\theta,$$
where $\hat\theta$ and $\theta$ are the estimated parameter and the underlying true parameter, respectively.

Consistency, on the other hand, is defined by
$$\operatorname{plim}_{m\to\infty}\hat\theta_m=\theta,$$
meaning that for any $\epsilon > 0$, $P(|\hat\theta_m-\theta|>\epsilon)\to 0$ as $m\to\infty$.

Then it says that consistency implies asymptotic unbiasedness, but not vice versa:

Consistency ensures that the bias induced by the estimator diminishes as the number of data examples grows. However, the reverse is not true—asymptotic unbiasedness does not imply consistency. For example, consider estimating the mean parameter $\mu$ of a normal distribution $\mathcal N(x; \mu, \sigma^2)$, with a dataset consisting of $m$ samples: $\{x^{(1)}, \dots, x^{(m)}\}$. We could use the first sample $x^{(1)}$ of the dataset as an unbiased estimator: $\hat\theta = x^{(1)}$. In that case, $E(\hat\theta_m) = \theta$, so the estimator is unbiased no matter how many data points are seen. This, of course, implies that the estimate is asymptotically unbiased. However, this is not a consistent estimator, as it is not the case that $\hat\theta_m \to \theta$ as $m \to \infty$.

I'm not sure whether I've understood the above paragraph and the concepts of unbiasedness and consistency correctly; I hope someone can help me check it. Thanks in advance.

As far as I understand, consistency implies both unbiasedness and low variance, and therefore unbiasedness alone is not sufficient to imply consistency.
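
To check my intuition numerically, here is a quick simulation sketch (using NumPy; the values of $\mu$, $\sigma$, and the sample sizes are arbitrary illustrative choices) comparing the first-sample estimator $\hat\theta = x^{(1)}$ with the sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, reps = 2.0, 3.0, 10_000

for m in (10, 100, 1000):
    samples = rng.normal(mu, sigma, size=(reps, m))
    first_obs = samples[:, 0]           # estimator that keeps only x^(1)
    sample_mean = samples.mean(axis=1)  # the usual sample mean

    # Both estimators average out to roughly mu over the replications (unbiased),
    # but only the sample mean's spread around mu shrinks as m grows.
    print(f"m={m:5d}  first-obs: mean={first_obs.mean():.3f}, sd={first_obs.std():.3f}  "
          f"sample-mean: mean={sample_mean.mean():.3f}, sd={sample_mean.std():.3f}")
```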

Best Answer

In that paragraph the authors are giving an extreme example to show how being unbiased doesn't mean that a random variable is converging on anything.

The authors are taking a random sample $X_1,\dots, X_n \sim \mathcal N(\mu,\sigma^2)$ and want to estimate $\mu$. Noting that $E(X_1) = \mu$, we could produce an unbiased estimator of $\mu$ by just ignoring all of our data except the first point $X_1$. But that's clearly a terrible idea, so unbiasedness alone is not a good criterion for evaluating an estimator. Somehow, as we get more data, we want our estimator to vary less and less from $\mu$, and that's exactly what consistency says: for any distance $\varepsilon$, the probability that $\hat \theta_n$ is more than $\varepsilon$ away from $\theta$ goes to $0$ as $n \to \infty$. And this can happen even if $\hat \theta_n$ is biased for every finite $n$. An example of this is the variance estimator $\hat \sigma^2_n = \frac 1n \sum_{i=1}^n(y_i - \bar y_n)^2$ in a normal sample: its expectation is $\frac{n-1}{n}\sigma^2$, so it is biased for every finite $n$, but it is consistent.
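
Here is a minimal simulation sketch of that biased-but-consistent example (NumPy; $\sigma^2$ and the sample sizes are arbitrary choices for illustration). The $1/n$ variance estimator has expectation $\frac{n-1}{n}\sigma^2$, so it is biased for every finite $n$, yet it concentrates around $\sigma^2$ as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma2, reps = 0.0, 4.0, 20_000

for n in (5, 50, 500):
    y = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
    # variance estimator with 1/n divisor (the biased one)
    var_hat = ((y - y.mean(axis=1, keepdims=True)) ** 2).mean(axis=1)
    print(f"n={n:4d}  E[var_hat] ≈ {var_hat.mean():.3f} "
          f"(true sigma^2 = {sigma2}, expected bias factor {(n - 1) / n:.3f})  "
          f"sd = {var_hat.std():.3f}")
```

The average of `var_hat` stays below $\sigma^2$ by the factor $\frac{n-1}{n}$, but both the bias and the spread shrink as $n$ grows, which is exactly the consistency part.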

Intuitively, a statistic is unbiased if it exactly equals the target quantity when averaged over all possible samples. But we know that the average of a bunch of things doesn't have to be anywhere near the things being averaged; this is just a fancier version of how the average of $0$ and $1$ is $1/2$, although neither $0$ nor $1$ are particularly close to $1/2$ (depending on how you measure "close").

Here's another example (although this is almost just the same example in disguise). Let $X_1 \sim \text{Bern}(\theta)$ and let $X_2 = X_3 = \dots = X_1$. Our estimator of $\theta$ will be $\hat \theta(X) = \bar X_n$. Note that $E \bar X_n = \theta$, so we do indeed have an unbiased estimator. But $\bar X_n = X_1 \in \{0,1\}$, so this estimator definitely isn't converging on anything close to $\theta \in (0,1)$, and for every $n$ we actually still have $\bar X_n \sim \text{Bern}(\theta)$.
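
A small sketch of this perfectly-correlated Bernoulli example (NumPy; $\theta = 0.3$ is an arbitrary illustrative value). Since every observation is a copy of $X_1$, the "sample mean" is exactly $X_1$ for every $n$: unbiased for $\theta$, but it never concentrates around $\theta$ no matter how large $n$ gets:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, reps = 0.3, 10_000

x1 = rng.binomial(1, theta, size=reps)  # one draw of X_1 per replication
for n in (10, 1000, 100_000):
    # X_2 = X_3 = ... = X_n = X_1, so the sample mean is exactly X_1 for every n.
    xbar_n = x1
    print(f"n={n:6d}  mean over replications ≈ {xbar_n.mean():.3f} (theta = {theta}), "
          f"but each individual xbar_n is still 0 or 1, never near {theta}")
```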
