A theorem by Markov states that if a sequence of random variables $X_1, X_2, \ldots$ with finite variances satisfies one of the following conditions:
- $\lim_{n \to \infty} \frac{1}{n^2} \mathrm{Var} \sum_{i = 1}^n X_i = 0$;
- $X_1, X_2, \ldots$ are independent and $\lim_{n \to \infty}\frac{1}{n^2}\sum_{i = 1}^n \mathrm{Var} X_i = 0$;
then the sequence $Y_n = \frac{1}{n}\sum_{i=1}^n (X_i - \mathsf{E} X_i)$ converges to $0$ in probability as $n \to \infty$.
In addition, if random variables $X_1, X_2, \ldots$ are identically distributed, have finite variance and are uncorrelated (instead of independent), then the proof of the weak law of large numbers using Chebyshev's inequality still holds.
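For completeness, the Chebyshev step behind both versions is one line (a standard computation in the notation above; note that for uncorrelated variables $\mathrm{Var} \sum_{i=1}^n X_i = \sum_{i=1}^n \mathrm{Var}\, X_i$, so the second condition reduces to the first). Since $\mathsf{E} Y_n = 0$ and $\mathrm{Var}\, Y_n = \frac{1}{n^2}\mathrm{Var}\sum_{i=1}^n X_i$, Chebyshev's inequality gives, for every $\epsilon > 0$,
$$\mathsf{P}(|Y_n| > \epsilon) \le \frac{\mathrm{Var}\, Y_n}{\epsilon^2} = \frac{1}{\epsilon^2 n^2} \, \mathrm{Var} \sum_{i=1}^n X_i \;\longrightarrow\; 0 \quad \text{as } n \to \infty.$$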
I think a lot of the confusion in settings such as this can occur because the notation often used in statistics does not remind the reader that $S_n$ is not a number. It's a function. We have a probability space $\Omega$, and each $S_n$ is a function $S_n:\Omega\to\mathbb{R}$. For any fixed $\omega\in \Omega$, the values $S_n(\omega)$ form a sequence of numbers. For different $\omega$, we may get different sequences, which may have different behaviors.
Convergence in probability does not say anything about any particular $\omega$. It says something about $\Omega$ overall: roughly, that $\mathbb{P}(\{\omega\in\Omega:|S_n(\omega)-\mu|>\epsilon\})$ is small for large $n$ (and, more precisely, that for any $\epsilon>0$ this sequence of probabilities approaches zero as $n\to\infty$). This is what is meant by saying that $S_n$ is likely to be near $\mu$ for large $n$.

But I think that the statement about $n^*$ is (without mentioning it) transitioning to talking about what's happening at a specific $\omega$. We can say that $S_{n^*}$ is likely to be near $\mu$, but if $S_{n^*}(\omega)$ is near $\mu$ for some particular $\omega$, it is not necessarily the case that $S_n(\omega)$ is near $\mu$ for this same $\omega$ and later $n > n^*$. The strong law precludes this: if $\omega\in \Omega$ is such that $\lim_n S_n(\omega)=\mu$, then for any $\epsilon>0$, the set $\{n:|S_n(\omega)-\mu|>\epsilon\}$ is finite. Note that both of these last two conditions are about a single $\omega$ and concern a sequence of numbers, $S_n(\omega)$, not random variables (functions) $S_n$. Conversely, if $\omega$ is such that $\{n:|S_n(\omega)-\mu|>\epsilon\}$ is infinite, then $S_n(\omega)$ does not converge to $\mu$, and the strong law says that the set of such $\omega$ has probability zero.
Let's imagine what could be happening to get convergence in probability but not convergence a.s. Let $\Omega=[0,1)$ endowed with the usual Lebesgue measure $\mathbb{P}$. Let $F^1_1:\Omega\to \{0,1\}$ be $1$ on the whole interval $[0,1)$. Let $F^2_1,F^2_2:\Omega\to\{0,1\}$ be such that $F^2_1$ is $1$ on the first half of $[0,1)$ and $0$ on the second half, and $F^2_2$ is $1$ on the second half of $[0,1)$ and $0$ on the first. More generally, let $F^n_1,\ldots,F^n_n:[0,1)\to\{0,1\}$ be such that $F^n_k$ is $1$ on $[(k-1)/n,k/n)$ and zero on the rest of $[0,1)$.
Note that for $0<\epsilon<1$, $$\mathbb{P}(|F^n_k-0|>\epsilon)=1/n,$$ because the set where $|F^n_k-0|>\epsilon$ is $[(k-1)/n,k/n)$. So if $X_1,X_2,\ldots$ is an enumeration of $F^1_1,F^2_1,F^2_2,F^3_1,F^3_2,F^3_3,\ldots$, then $X_m$ converges to $0$ in probability. But for any particular point $\omega\in[0,1)$, $X_m(\omega)$ does not converge to $0$. This is because $X_m(\omega)$ will be $1$ for one of the functions $F^1_1$, one of the functions $F^2_1,F^2_2$, one of the functions $F^3_1,F^3_2,F^3_3$, etc. So these "bad sets" $A_m=\{\omega\in\Omega:|X_m-0|>\epsilon\}$ have probabilities going to zero, but they bounce around enough so that each point occurs in infinitely many of the $A_m$ (though it will take longer and longer between $m$ in which a particular $\omega$ appears). More specifically, the $A_m$ sequence is $A_1=[0,1)$, $A_2=[0,1/2)$, $A_3=[1/2,1)$, $A_4=[0,1/3)$, $A_5=[1/3,2/3)$, $A_6=[2/3,1)$, etc. So $\mathbb{P}(\{\omega\in\Omega:\lim_n X_n(\omega)=0\})=0$, not $1$.
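To make the bookkeeping concrete, here is a small Python sketch of this "typewriter" construction (the names `F` and `enumerate_indices` and the sample point $\omega = 0.3$ are my own choices, purely for illustration):

```python
# "Typewriter" indicators F^n_k on Omega = [0, 1):
# F^n_k is the indicator of the interval [(k-1)/n, k/n).

def F(n, k, omega):
    """Value of F^n_k at the point omega."""
    return 1 if (k - 1) / n <= omega < k / n else 0

def enumerate_indices(m_max):
    """First m_max index pairs (n, k), ordered F^1_1, F^2_1, F^2_2, F^3_1, ..."""
    pairs = []
    n = 1
    while len(pairs) < m_max:
        for k in range(1, n + 1):
            pairs.append((n, k))
            if len(pairs) == m_max:
                break
        n += 1
    return pairs

pairs = enumerate_indices(100)          # indices of X_1, ..., X_100

# P(|X_m| > eps) = 1/n when X_m = F^n_k, so these probabilities tend to 0:
probs = [1 / n for (n, k) in pairs]

# ...yet at any fixed omega, X_m(omega) equals 1 once in every block of size n,
# so the numerical sequence X_m(omega) hits 1 infinitely often:
omega = 0.3
ones = sum(F(n, k, omega) for (n, k) in pairs)   # one hit per block
```

Here `ones` comes out to $14$ among the first $100$ terms: one hit for each of the complete blocks $n = 1, \ldots, 13$, plus one in the partial block $n = 14$, matching the claim that every $\omega$ lands in infinitely many of the $A_m$.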
If there were a sequence of iid random variables $Y_n$ such that $X_n=(Y_1+\ldots+Y_n)/n$, then this sequence would satisfy the weak law of large numbers but not the strong law, because $X_n$ converges to zero in probability but not a.s. However, there is no such sequence of $Y_i$ (indeed, one way to prove that there can be no such $Y_i$ is exactly because if it did exist, then the sample means $X_n$ would converge a.s. by SLLN).
Best Answer
If $S_n/n \to c\: $ a.s. where $c$ is finite, then also $S_{n+1}/n \to c \:$ a.s., so $X_{n+1}/n = S_{n+1}/n - S_n/n \to 0$ a.s.
This implies that $|X_{n+1}|/n \ge 1$ only finitely often a.s. Since the events $\{|X_{n+1}| \ge n\}$ are independent, the second Borel–Cantelli lemma gives $$\infty> \sum_{n \ge 0} P(|X_{n+1}| \ge n)=\sum_{n \ge 0} P(|X_{1}|\ge n) =E\Bigl(\sum_{n \ge 0} {\bf 1}_{|X_{1}|\ge n}\Bigr) \ge E|X_1|\,.$$
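The tail-sum bound at the end can be sanity-checked numerically. The sketch below is my own illustration (not part of the answer); it uses an $\mathrm{Exp}(\lambda)$ variable, for which both $E|X_1|$ and $\sum_{n \ge 0} P(|X_1| \ge n)$ are explicit:

```python
import math

# Layer-cake comparison: E|X| <= sum_{n>=0} P(|X| >= n) <= E|X| + 1.
# For X ~ Exp(lam): P(X >= n) = exp(-lam * n) and E X = 1 / lam.
lam = 0.7
mean = 1 / lam
# Truncated geometric series; the tail beyond n = 10_000 is negligible.
tail_sum = sum(math.exp(-lam * n) for n in range(10_000))
```

With $\lambda = 0.7$ this gives `tail_sum` $\approx 1.99$ versus `mean` $\approx 1.43$, consistent with $E|X_1| \le \sum_{n\ge0} P(|X_1|\ge n) \le E|X_1| + 1$.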
On the other hand, if the almost sure limit $c=\lim S_n/n$ exists, but is infinite, then it is possible that $E(X_1)$ is undefined, i.e., we can have $E(|X_1 \wedge 0|)=E(X_1 \vee 0)= \infty$.
Example: Suppose that for all $x \ge 0$, we have $$P(X_1 \ge x)=\frac1{2\ln(e+x)}$$ and $$P(X_1 \le -x)=\frac1{2(1+x)} \,.$$ Then $E(|X_1 \wedge 0|)=E(X_1 \vee 0)= \infty$.
Now suppose $X_k$ are i.i.d. with this law. Then $P(X_k \le -k^2) \le 1/k^2$ is summable, so by Borel–Cantelli, a.s. all but finitely many of the $X_k$ satisfy $X_k >-k^2$. On the other hand, $$P\bigl(\, \forall 1 \le k \le n, \quad X_k \le e^{\sqrt{n}}-e \bigr)=\Bigl(1-\frac1{2\sqrt{n}}\Bigr)^n \le e^{-\sqrt{n}/2} \,, $$ which is summable. Thus by Borel–Cantelli, almost surely, for all $n$ large enough we have $$S_n=X_1+\ldots +X_n \ge e^{\sqrt n}-e-n^3-C \,,$$ where $C$ is some random constant coming from the finitely many $X_k$ that satisfy $X_k \le -k^2$.
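A quick numerical check of the two estimates used above (my own sketch; `tail_right` encodes the assumed right tail $P(X_1 \ge x)=\frac1{2\ln(e+x)}$):

```python
import math

def tail_right(x):
    """P(X_1 >= x) = 1 / (2 ln(e + x)) for x >= 0."""
    return 1 / (2 * math.log(math.e + x))

# With x_n = e^{sqrt n} - e we get P(X_1 >= x_n) = 1/(2 sqrt n), so
# P(forall k <= n: X_k <= x_n) = (1 - 1/(2 sqrt n))^n <= e^{-sqrt n / 2},
# by the elementary bound 1 - t <= e^{-t}.
checks = []
for n in range(1, 2000):
    x_n = math.exp(math.sqrt(n)) - math.e
    p_no_big = (1 - tail_right(x_n)) ** n
    checks.append(p_no_big <= math.exp(-math.sqrt(n) / 2) + 1e-15)

# The bound e^{-sqrt n / 2} is summable, as Borel-Cantelli requires:
partial_sum = sum(math.exp(-math.sqrt(n) / 2) for n in range(1, 2000))
```

The partial sums of $e^{-\sqrt n/2}$ stay bounded (the full series is below $\int_0^\infty e^{-\sqrt t/2}\,dt = 8$), confirming the summability that Borel–Cantelli needs.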