Say an estimator converges with probability one and, at the same time, its variance goes to zero in the limit. How is it different from an estimator that converges with probability one but whose variance does not go to zero? Does the former achieve sure convergence? I am wondering what difference it makes.
Solved – Almost sure convergence and limiting variance goes to zero
convergence, variance
Related Solutions
Unfortunately, the quoted statement in the question is a muddled version of the one I originally intended. Thank you for catching this. I've also updated the statement in the original question. A counterexample to the muddled version is given at the end of this answer.
Here is the intended statement:
Lemma: Let $X_1,X_2,\ldots$ be a sequence of zero-mean normal random variables defined on the same space with variances $\sigma_n^2$. Then, $X_n \to X_\infty$ in probability if and only if $X_n \xrightarrow{\,L_2\,} X_\infty$, in which case $X_\infty \sim \mathcal N(0,\sigma^2)$ where $\sigma^2 = \lim_{n\to\infty} \sigma_n^2$.
Remark: The main points here are that (a) we can "upgrade" convergence in probability to $L_2$ convergence in the case of sequences of normals, (b) we are guaranteed that the distribution of the limit is normal (which is not otherwise obvious) and (c) we get both of the above without specifying anything about the joint distributions of the elements in the sequence.
Proof (sketch): One direction is easy: convergence in $L_2$ always implies convergence in probability. For the other direction: if $X_n \to X_\infty$ in probability, then $$\varphi_n(t) = \mathbb E e^{it X_n} \to \mathbb E e^{it X_{\infty}}$$ by dominated convergence. But $\varphi_n(t) = e^{-t^2 \sigma_n^2 / 2}$, and $e^{-t^2 \sigma_n^2 / 2}$ converges for each $t$ as $n \to \infty$ if and only if $\sigma_n^2 \to \sigma^2$ for some finite $\sigma^2$ (the limit must be finite because the limiting function is a characteristic function, hence continuous at zero). This implies $\sup_n \sigma_n^2 < \infty$, and since the fourth moments $3\sigma_n^4$ are therefore uniformly bounded, the collection $\{X_n^2\}$ is uniformly integrable. Thus, $X_n \xrightarrow{\,L_2\,} X_\infty$. This further shows that the limit $X_\infty$ must be normal, since $\mathbb E e^{it X_\infty} = e^{-t^2 \sigma^2 / 2}$, which is the characteristic function of a normally distributed random variable, namely $\mathcal N(0,\sigma^2)$.
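For concreteness, here is a minimal numerical sketch of the lemma's conclusion. The joint construction $X_n = X_\infty + W/n$ (with $W$ independent of $X_\infty$, so $X_n \sim \mathcal N(0, 1 + 1/n^2)$) is purely illustrative and not part of the lemma; the sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative joint construction (an assumption for this sketch, not part of
# the lemma): X_n = X_inf + W/n with W independent of X_inf,
# so X_n ~ N(0, 1 + 1/n^2) and X_n -> X_inf almost surely.
x_inf = rng.normal(size=200_000)
w = rng.normal(size=200_000)

for n in [1, 10, 100, 1000]:
    x_n = x_inf + w / n
    p_far = np.mean(np.abs(x_n - x_inf) > 0.1)   # P(|X_n - X_inf| > 0.1)
    l2 = np.mean((x_n - x_inf) ** 2)             # E[(X_n - X_inf)^2] = 1/n^2
    print(n, round(x_n.var(), 3), round(p_far, 3), l2)
```

The empirical variance tends to $1 = \operatorname{Var}(X_\infty)$, and both the exceedance probability and the $L_2$ distance go to $0$, exactly as the lemma predicts.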
Notes
The convergence of the sequence $\sigma_n^2$ and the fact that the limit $X_\infty$ is normally distributed are part of the conclusion. By the exact same argument, we can also replace $L_2$ convergence with the more general $L_p$ convergence (for any finite $p \geq 1$): the variance determines the distribution in this case and all moments are finite, so $\{|X_n|^p\}$ is also uniformly integrable.
From this it is clear that we have the following weaker result on convergence in distribution, which is well-known and given as an exercise in some probability textbooks.
Lemma: Let $X_1,X_2,\ldots$ be a sequence of normal random variables with $X_n \sim \mathcal N(\mu_n, \sigma_n^2)$. Then, $X_n \to X_\infty$ in distribution if and only if $\mu_n \to \mu$ and $\sigma_n^2 \to \sigma^2$ for some finite $\mu$ and $\sigma^2$, in which case $X_\infty \sim \mathcal N(\mu,\sigma^2)$.
A nice application of the second lemma is to consider the marginal distribution of the Riemann integral of Brownian motion, $$ I_t = \int_0^t B_s \, \mathrm{d} s \> . $$ By considering the Riemann sums and using the second lemma, we see that $I_t \sim \mathcal N(0, t^3/3)$.
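As a quick sanity check on the $t^3/3$ variance, here is a minimal Monte Carlo sketch (not part of the argument): simulate Brownian paths on a grid and compare the empirical variance of the Riemann sums with $t^3/3$. The grid size, number of paths, and seed below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
t, n_steps, n_paths = 1.0, 1_000, 10_000
dt = t / n_steps

# Brownian paths on a grid: cumulative sums of independent N(0, dt) increments.
dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B = np.cumsum(dB, axis=1)

# Riemann-sum approximation of I_t = \int_0^t B_s ds for each path.
I_t = B.sum(axis=1) * dt

print(I_t.mean(), I_t.var())   # mean ~ 0, variance ~ t^3 / 3 ~ 0.333
print(t**3 / 3)
```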
A counterexample to the quoted statement in the question can be found by considering $X_\infty \sim \mathcal N(0,1)$ and $X_n = (-1)^n X_\infty$. Here, $\sigma_n^2 = 1$ for all $n$ so $\sigma_n^2 \to 1$, but $X_n$ does not converge to $X_\infty$ in probability or $L_2$.
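A quick numerical illustration of this counterexample (a hedged sketch; the sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
x_inf = rng.normal(size=100_000)      # X_inf ~ N(0, 1)

for n in [10, 11, 1000, 1001]:
    x_n = (-1) ** n * x_inf           # X_n = (-1)^n X_inf, also N(0, 1)
    print(n, round(x_n.var(), 3), round(np.mean((x_n - x_inf) ** 2), 3))
```

Every $X_n$ has variance $\approx 1$ (the same $\mathcal N(0,1)$ law), but $\mathbb E[(X_n - X_\infty)^2]$ alternates between $0$ and $\approx 4$, so there is no convergence in probability or in $L_2$.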
I won't give a very satisfactory answer to your question, because it seems to me a little too open-ended, but let me try to shed some light on why this question is a hard one.
I think you are struggling with the fact that the conventional topologies we use on probability distributions and random variables are bad. I've written a longer piece about this on my blog, but let me try to summarize: you can converge in the weak (and even the total-variation) sense while violating common-sense expectations about what convergence means.
For example, you can converge in the weak topology towards a constant while keeping variance equal to 1 (which is exactly what your $Z_n$ sequence is doing). The sequence behaves like a monstrous random variable that equals 0 most of the time but, vanishingly rarely, takes enormous values, and yet the weak topology declares its limit to be just the constant 0.
I personally take this to mean that the weak topology (and the total-variation topology too) is a poor notion of convergence that should be discarded. Most of the convergences we actually use are stronger than that. However, I don't really know what we should use instead of the weak topology, so I'll leave it at that.
If you really want to find an essential difference between $\hat \theta= \bar X+Z_n$ and $\tilde \theta=\bar X$, here is my take: both estimators are essentially equivalent under the 0-1 loss (when the size of your mistake doesn't matter). However, $\tilde \theta$ is much better if the size of your mistakes matters, because $\hat \theta$ sometimes fails catastrophically, as the sketch below illustrates.
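To make the contrast concrete, here is a minimal sketch under an assumed form of $Z_n$ (I take $Z_n = n$ with probability $1/n^2$ and $0$ otherwise, which has variance $\approx 1$ yet converges weakly to $0$; the actual $Z_n$ in your question may differ, and the constants below are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 0.0, 100, 1_000_000

# Hypothetical Z_n (the question's Z_n may differ): Z_n = n with probability
# 1/n^2, else 0, so Var(Z_n) ~ 1 while Z_n -> 0 in probability.
x_bar = rng.normal(theta, 1.0 / np.sqrt(n), size=reps)   # sample mean of n obs
z_n = n * (rng.random(reps) < 1.0 / n**2)

theta_hat = x_bar + z_n        # \hat{theta} = X_bar + Z_n
theta_tilde = x_bar            # \tilde{theta} = X_bar

eps = 0.5
print(np.mean(np.abs(theta_hat - theta) > eps),     # ~1e-4 (0-1-type loss)
      np.mean(np.abs(theta_tilde - theta) > eps))   # ~0
print(np.mean((theta_hat - theta) ** 2),            # ~1 (rare catastrophic misses)
      np.mean((theta_tilde - theta) ** 2))          # ~1/n = 0.01
```

Both estimators almost never miss by more than $\varepsilon$, but the mean squared error of $\hat\theta$ stays near $1$ because of the rare catastrophic misses, while that of $\tilde\theta$ is about $1/n$.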
Best Answer
Convergence almost surely and convergence in distribution are not the same thing.
Take a simple example where you have i.i.d. mean-zero, unit-variance random variables $Y_i$, $i = 1, 2, \ldots$, for which the law of the iterated logarithm holds. Then
$$ \limsup_{n \to \infty} \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^n Y_i}{\sqrt{\log\log n}} = \sqrt{2} \quad \text{a.s.} $$
In particular, $\frac{1}{\sqrt{n}}\sum_{i=1}^n Y_i$ diverges almost surely, yet it converges in distribution (to $\mathcal N(0,1)$, by the central limit theorem). A sequence of random variables can thus diverge along almost every fixed sample path while still converging in distribution.
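A small simulation sketch of this (using normal $Y_i$, so the distributional statement is exact at every $n$; the path length and seed are arbitrary): along a single sample path the normalized sums keep oscillating, with an envelope of order $\sqrt{2 \log\log n}$, while at any fixed $n$ their distribution is simply $\mathcal N(0,1)$.

```python
import numpy as np

rng = np.random.default_rng(4)

# One long sample path: T_n = (1/sqrt(n)) * sum_{i<=n} Y_i with Y_i ~ N(0, 1),
# so at every fixed n, T_n is exactly N(0, 1) (convergence in distribution is trivial).
N = 1_000_000
y = rng.normal(size=N)
t_n = np.cumsum(y) / np.sqrt(np.arange(1, N + 1))

# Pathwise behaviour: T_n keeps oscillating instead of settling down; the law of
# the iterated logarithm says its envelope grows like sqrt(2 log log n).
for k in [10_000, 100_000, 1_000_000]:
    block = t_n[k // 10: k]
    print(k, round(block.min(), 2), round(block.max(), 2))
```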