The complement of $\limsup \{X_n > \epsilon\}$ is $\liminf \{X_n \leqslant \epsilon\}$, which contains $\{\limsup X_n\lt\epsilon\}$ and is included in $\{\limsup X_n\leqslant\epsilon\}$.
Equivalently,
$$
\{\limsup X_n \gt \epsilon\}\subseteq\limsup \{X_n > \epsilon\}\subseteq\{\limsup X_n \geqslant \epsilon\}.
$$
An example to show that the first inclusion may be strict is $\{\forall n,X_n=\epsilon+1/n\}$. An example to show that the second inclusion may be strict is $\{\forall n,X_n=\epsilon-1/n\}$.
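To see why the first inclusion is strict in that example, recall $\limsup_n A_n=\bigcap_{k}\bigcup_{n\geqslant k}A_n$; on the event $\{\forall n,X_n=\epsilon+1/n\}$ one has
$$
X_n=\epsilon+\tfrac1n>\epsilon\ \text{for every }n\ \Longrightarrow\ \omega\in\limsup_n\,\{X_n>\epsilon\},\qquad\text{while}\qquad\limsup_n X_n(\omega)=\epsilon\not>\epsilon,
$$
so $\omega\notin\{\limsup X_n>\epsilon\}$. The second example is symmetric: $X_n=\epsilon-1/n$ gives $\limsup_n X_n=\epsilon$, so $\omega\in\{\limsup X_n\geqslant\epsilon\}$, yet $X_n>\epsilon$ holds for no $n$, so $\omega\notin\limsup\{X_n>\epsilon\}$.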
Yes, these are rather subtle interplays between the limsup of sets and the limsup of random variables, and between strict and non-strict inequalities.
Okay, so let $\Omega=[0,1]$ and let our probability measure be the Lebesgue measure on the interval. Consider $X_k=\mathbb{1}_{[0,\frac{1}{k}]}$; these converge almost surely and in probability, since the sequence converges pointwise except at $0$, and $P(\{0\})=0$. So let's make these a bit more clever.
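A quick numerical check of this first example (a small sketch of my own; the function name `X` and the ranges are just for illustration):

```python
def X(k, x):
    # X_k = indicator of [0, 1/k] on Omega = [0, 1]
    return 1 if 0 <= x <= 1 / k else 0

# Pointwise: X_k(x) -> 0 for every x > 0, but X_k(0) = 1 for all k,
# and P({0}) = 0, so the convergence is almost sure.
assert all(X(k, 0.1) == 0 for k in range(11, 200))
assert all(X(k, 0.0) == 1 for k in range(1, 200))

# In probability: for 0 < eps < 1, P(X_k > eps) = length of [0, 1/k] = 1/k,
# which tends to zero.
probs = [1 / k for k in range(1, 201)]
assert probs[-1] == 1 / 200
```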
Now, what happens if you move those intervals across the $[0,1]$ interval before making them smaller? Writing down a formula for this is a bit of an issue; you could properly define it recursively, but it is probably easier to see from an example:
$$1_{[0,1]},\ 1_{[0,\frac12]},\ 1_{[\frac12,1]},\ 1_{[0,\frac13]},\ 1_{[\frac13,\frac23]},\ 1_{[\frac23,1]},\dots$$
Now the interval gets smaller or stays the same with every step, which means that $P(X_k>\varepsilon)$ converges to zero for every $\varepsilon>0$; this is convergence in probability. But you cannot find a single null set outside of which the sequence converges pointwise, because for every $k$ the intervals still have positive length and get shifted over every part of the $[0,1]$ interval. This means that actually every $x$ in the entire $[0,1]$ interval satisfies $\exists m>k:X_m(x)>\varepsilon$, thus $$P(\exists m>k:X_m>\varepsilon)=P([0,1])=1\quad\forall\varepsilon\in(0,1).$$
Note the difference between these two sets:
$$\{x\in[0,1]:X_k(x)>\varepsilon\}$$
which is a smaller and smaller set as $k$ increases, and:
$$\{x\in[0,1]:\exists m>k:X_m(x)>\varepsilon\} =[0,1]$$
Now there is actually a theorem which states that every sequence of random variables that converges in probability has a subsequence which converges almost surely. And you can easily construct such a subsequence in our example: it is the sequence we started with.
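For reference, the usual proof of that theorem is a Borel–Cantelli argument (a standard sketch, not part of the answer above): by convergence in probability one can pick indices $n_1<n_2<\dots$ with
$$
P(\vert X_{n_k}-X\vert>2^{-k})\leqslant 2^{-k}.
$$
Since $\sum_k 2^{-k}<\infty$, the Borel–Cantelli lemma gives that almost every $\omega$ satisfies $\vert X_{n_k}(\omega)-X(\omega)\vert\leqslant 2^{-k}$ for all but finitely many $k$, hence $X_{n_k}\to X$ almost surely.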
So the difference is that in one case you just say that the probability that it hasn't converged yet goes to zero, while in the other case you can actually find a set of probability zero which is the only part on which it doesn't converge pointwise. This is a lot stronger, as the bad behaviour is actually contained, in a sense.
Best Answer
Equivalence of certain conditions means that they imply each other. The direction "almost sure convergence $\Rightarrow$ convergence in probability" is always true, so only the other direction has to be proven.
In a discrete probability space, almost sure convergence is the same as pointwise convergence at every point of non-zero measure, because every subset of a discrete probability space is a countable union of singleton sets. So let $\omega$ be an element of the probability space with $P(\{\omega\})=p>0$. If $X_n(\omega)$ did not converge to $X(\omega)$, there would be an $\varepsilon>0$ for which $\vert X(\omega)-X_n(\omega)\vert>\varepsilon$ for infinitely many $n\in\mathbb N$. What does this mean for the probability $P(\vert X-X_n\vert>\varepsilon)$, and thus for convergence in probability?
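For completeness, the hinted step can be spelled out as follows: since $\{\omega\}\subseteq\{\vert X-X_n\vert>\varepsilon\}$ for infinitely many $n$,
$$
P(\vert X-X_n\vert>\varepsilon)\geqslant P(\{\omega\})=p>0\quad\text{for infinitely many }n,
$$
so $P(\vert X-X_n\vert>\varepsilon)\not\to0$, contradicting convergence in probability.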