[Math] Convergence in probability vs almost sure convergence, pt II

convergence-divergence, probability-theory

This question has already been asked a few times, and, for instance, the accepted answer here gives an excellent intuitive explanation of the difference between the two. Now I'm trying to establish an intuitive link between such explanations and the formal definitions of the respective convergences.

So, written formally, $X_n \rightarrow_{a.s.} X$ is equivalent to
$$
\forall \xi > 0 : \forall \epsilon > 0 \ \exists n_\epsilon : \forall n > n_\epsilon : P \{ \exists m \geq n : | X_m - X | > \xi \} < \epsilon.
$$
On the other hand, $X_n \rightarrow_{p} X$ is equivalent to
$$
\forall \xi > 0 : \forall \epsilon > 0 \ \exists n_\epsilon : \forall n > n_\epsilon : P \{ | X_n - X | > \xi \} < \epsilon.
$$
Factoring out the common predicates, it boils down to the difference between $P \{ \exists m \geq n : | X_m - X | > \xi \} < \epsilon$ and $P \{| X_n - X | > \xi \} < \epsilon$.
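Spelled out, the easy direction is just monotonicity of $P$ over an inclusion of events:
$$
\{ | X_n - X | > \xi \} \subseteq \{ \exists m \geq n : | X_m - X | > \xi \}
\quad\Longrightarrow\quad
P \{ | X_n - X | > \xi \} \leq P \{ \exists m \geq n : | X_m - X | > \xi \},
$$
so whenever the right-hand probability is below $\epsilon$, the left-hand one is too.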

In other words, if I understand correctly, it boils down to whether, starting from a sufficiently large $n$, it is unlikely that *any* member of the sequence deviates too much (for almost sure convergence), versus it being unlikely that *each individual* member does (for convergence in probability).

It's immediately obvious why almost sure convergence implies convergence in probability. What is not obvious to me is why the converse fails: why doesn't each individual member of the sequence being unlikely to stray far from the limiting random variable imply that no member (starting, perhaps, from a much later point) will stray far either?

I know some counterexamples of random variables converging in probability but not almost surely, but they don't really help me grok what exactly breaks when one tries to make the implication above. So perhaps this is more of a real-analysis or logic question.

Best Answer

Okay, so let $\Omega=[0,1]$ and let our probability measure be the Lebesgue measure on this interval. Consider $X_k=\mathbb{1}_{[0,\frac{1}{k}]}$. These converge to $0$ both almost surely and in probability, since the sequence converges pointwise everywhere except at $0$, and $P(\{0\})=0$. So let's make them a bit more clever.

Now, writing down a formula for the following is a bit of a hassle, but what happens if you slide those intervals across $[0,1]$ before making them smaller? You could define this properly by recursion, but it is probably easier to see from the example:

$$1_{[0,1]},\ 1_{[0,\frac12]},\ 1_{[\frac12,1]},\ 1_{[0,\frac13]},\ 1_{[\frac13,\frac23]},\ 1_{[\frac23,1]},\ 1_{[0,\frac14]},\ \ldots$$

Now the interval gets smaller (or stays the same size) with every step, which means that $P(X_k>\varepsilon)$ converges to zero for every $\varepsilon>0$; this is convergence in probability. But you cannot find a null set outside of which the sequence converges pointwise: for every index the interval still has positive length and keeps being shifted across every part of the $[0,1]$ interval. This means that actually every point of $[0,1]$ satisfies $\exists m>k:X_m(x)>\varepsilon$, and thus $P(\exists m>k:X_m>\varepsilon)=P([0,1])=1$ for every $k$ and every $\varepsilon\in(0,1)$.
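This construction can be checked numerically. Here is a minimal Python sketch; the indexing helper `typewriter_interval` and its closed-form block formula are my own bookkeeping for the sequence above, not part of the answer:

```python
import math

def typewriter_interval(n):
    """Interval of the n-th (1-indexed) indicator in the sequence
    1_[0,1], 1_[0,1/2], 1_[1/2,1], 1_[0,1/3], ...: block k consists
    of the k intervals [j/k, (j+1)/k] for j = 0, ..., k-1."""
    # Find the block k with k(k-1)/2 < n <= k(k+1)/2.
    k = math.ceil((math.sqrt(8 * n + 1) - 1) / 2)
    j = n - k * (k - 1) // 2 - 1  # 0-indexed position within block k
    return (j / k, (j + 1) / k)

def X(n, x):
    """Value of the n-th indicator at the point x."""
    a, b = typewriter_interval(n)
    return 1.0 if a <= x <= b else 0.0

# Convergence in probability: P(X_n > eps) is the interval length 1/k -> 0.
a, b = typewriter_interval(100)   # n = 100 lies in block k = 14
print(b - a)                      # 1/14

# But no almost sure convergence: for any fixed x, X_n(x) = 1 at least
# once in every block, hence infinitely often.
x = 0.3
hits = [n for n in range(1, 56) if X(n, x) == 1.0]
print(hits)
```

For any fixed $x$, the list of hits never dries up: each block of $k$ intervals covers $x$ at least once, which is exactly why $\{\exists m>k: X_m(x)>\varepsilon\}$ is all of $[0,1]$.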

Note the difference between these two sets: $$\{x\in[0,1]:X_k(x)>\varepsilon\},$$ whose measure gets smaller and smaller as $k$ increases, and $$\{x\in[0,1]:\exists m>k:X_m(x)>\varepsilon\} =[0,1].$$

Now there is actually a theorem stating that every sequence of random variables converging in probability has a subsequence converging almost surely. You can easily find such a subsequence in our example: it is the sequence $1_{[0,\frac1k]}$ we started with.
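A quick Python sketch of that subsequence; the index formula $n_k = k(k-1)/2 + 1$ (first interval of each block) is my own bookkeeping, and the indexing helper is restated so the snippet is self-contained:

```python
import math

def typewriter_interval(n):
    """Interval of the n-th (1-indexed) typewriter indicator:
    block k has the k intervals [j/k, (j+1)/k], j = 0, ..., k-1."""
    k = math.ceil((math.sqrt(8 * n + 1) - 1) / 2)
    j = n - k * (k - 1) // 2 - 1
    return (j / k, (j + 1) / k)

# The first index of each block, n_k = k(k-1)/2 + 1, recovers 1_[0, 1/k]:
subseq = [typewriter_interval(k * (k - 1) // 2 + 1) for k in range(1, 6)]
print(subseq)  # each entry equals (0, 1/k)

# For any fixed x > 0, the subsequence indicators are eventually 0,
# so the subsequence converges pointwise off the null set {0}.
x = 0.05
tail = [1.0 if a <= x <= b else 0.0
        for a, b in (typewriter_interval(k * (k - 1) // 2 + 1)
                     for k in range(1, 101))]
print(tail[:3], tail[-1])
```

Along the subsequence the bad set shrinks down to $\{0\}$ instead of sweeping across the whole interval, which is exactly what restores almost sure convergence.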

So the difference is that in one case you merely say that the probability of a large deviation goes to zero. In the other case you can actually exhibit a set of probability zero which is the only part on which the sequence fails to converge pointwise. This is a lot stronger, as the bad stuff is actually contained, in a sense.
