A theorem by Markov states that if a sequence of random variables $X_1, X_2, \ldots$ with finite variances satisfies one of the following conditions:
- $\lim_{n \to \infty} \frac{1}{n^2} \mathrm{Var}\left(\sum_{i = 1}^n X_i\right) = 0$;
- $X_1, X_2, \ldots$ are independent and $\lim_{n \to \infty}\frac{1}{n^2}\sum_{i = 1}^n \mathrm{Var} X_i = 0$;
then the sequence $Y_n = \frac{1}{n}\sum_{i=1}^n (X_i - \mathsf{E} X_i)$ converges for $n \to \infty$ to $0$ in probability.
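For reference, both conditions feed into the same Chebyshev bound. What follows is a sketch of the standard argument, not a quotation of Markov's original proof. Since $\mathsf{E} Y_n = 0$, for every fixed $\varepsilon > 0$:
$$ P(|Y_n| > \varepsilon) \leq \frac{\mathrm{Var}\, Y_n}{\varepsilon^2} = \frac{1}{n^2 \varepsilon^2} \mathrm{Var}\left(\sum_{i=1}^n X_i\right) $$
Under the first condition the right-hand side tends to $0$. Under the second, independence gives $\mathrm{Var}\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n \mathrm{Var} X_i$, so the second condition is a special case of the first.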
In addition, if random variables $X_1, X_2, \ldots$ are identically distributed, have finite variance and are uncorrelated (instead of independent), then the proof of the weak law of large numbers using Chebyshev's inequality still holds.
EDIT: Corrected the first condition, thanks to @Michael.
There is likely a proof somewhere on this site but I could not find it. Here I give a quick proof of my comment (since I originally mis-stated the result by forgetting the "lower bounded" restriction):
Let $\{X_i\}_{i=1}^{\infty}$ be a sequence of random variables, not necessarily identically distributed and not necessarily independent, that satisfy:
i) $E[X_i]=m_i$, where $m_i \in \mathbb{R}$ for all $i\in\{1, 2, 3, ...\}$.
ii) There is a constant $\sigma^2_{bound}$ such that $Var(X_i) \leq \sigma^2_{bound}$ for all $i \in \{1, 2, 3, ...\}$.
iii) The variables are pairwise uncorrelated, so $E[(X_i-m_i)(X_j-m_j)]=0$ for all $i \neq j$.
iv) There is a value $b \in \mathbb{R}$ such that, with prob 1, $X_i-m_i\geq b$ for all $i \in \{1, 2, 3, ...\}$.
Define $L_n = \frac{1}{n}\sum_{i=1}^n (X_i-m_i)$. Then $L_n\rightarrow 0$ with prob 1.
Proof: Write $\sigma_i^2 = Var(X_i)$. Since the variables are pairwise uncorrelated with bounded variance, we easily find for all $n$:
$$ E[L_n^2] = \frac{1}{n^2}\sum_{i=1}^n \sigma_i^2 \leq \frac{\sigma_{bound}^2}{n} $$
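In case the first equality is not obvious, it comes from expanding the square and dropping the cross terms, which vanish by condition (iii):
$$ E[L_n^2] = \frac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n E[(X_i-m_i)(X_j-m_j)] = \frac{1}{n^2}\sum_{i=1}^n E[(X_i-m_i)^2] = \frac{1}{n^2}\sum_{i=1}^n Var(X_i) $$
The final bound then uses condition (ii) on each of the $n$ remaining terms.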
Fix $\epsilon>0$. It follows that:
$$ P[|L_n|>\epsilon] = P[L_n^2 > \epsilon^2] \leq \frac{E[L_n^2]}{\epsilon^2} \leq \frac{\sigma_{bound}^2}{n\epsilon^2} $$
Hence:
$$ \sum_{n=1}^{\infty} P[|L_{n^2}|>\epsilon] \leq \sum_{n=1}^{\infty}\frac{\sigma_{bound}^2}{n^2\epsilon^2} < \infty $$
and so $L_{n^2}\rightarrow 0$ with probability 1 by the Borel-Cantelli Lemma. That is, the $L_n$ values converge over the sparse subsequence $n\in\{1, 4, 9, 16, ...\}$.
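To spell out the Borel-Cantelli step (the event name $A_n^{\epsilon}$ is my own notation): let $A_n^{\epsilon} = \{|L_{n^2}| > \epsilon\}$. The sum above is finite, so the first Borel-Cantelli lemma gives
$$ P[A_n^{\epsilon} \text{ occurs for infinitely many } n] = 0 \quad \Longrightarrow \quad P\left[\limsup_{n\rightarrow\infty} |L_{n^2}| \leq \epsilon\right] = 1 $$
Applying this with $\epsilon = 1/k$ for each $k \in \{1, 2, 3, ...\}$ and taking the countable union of the exceptional null sets gives $\limsup_{n\rightarrow\infty} |L_{n^2}| = 0$ with probability 1.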
Since $L_n \geq b$ for all $n$ and $L_{n^2}\rightarrow 0$ with probability 1, it can be shown that $L_n\rightarrow 0$ with probability 1. $\Box$
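One way to fill in the "it can be shown" step is a sandwich argument between consecutive squares; this is a sketch, and the index $k$ is my own notation. First note that $b \leq 0$, since $E[X_i - m_i] = 0$ while $X_i - m_i \geq b$ with probability 1. For each $n$ pick $k$ with $k^2 \leq n < (k+1)^2$. Every summand is at least $b$, and at most $2k+1$ summands separate $n$ from either neighbouring square, so
$$ k^2 L_{k^2} + 2kb \leq nL_n \leq (k+1)^2 L_{(k+1)^2} - (2k+1)b $$
Dividing by $n$:
$$ \frac{k^2}{n} L_{k^2} + \frac{2kb}{n} \leq L_n \leq \frac{(k+1)^2}{n} L_{(k+1)^2} - \frac{(2k+1)b}{n} $$
As $n\rightarrow\infty$ we have $k^2/n \rightarrow 1$, $(k+1)^2/n \rightarrow 1$ and $k/n \leq 1/k \rightarrow 0$, so both outer expressions tend to $0$ with probability 1 and hence so does $L_n$.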
The lower-bounded condition is typically treated by writing $X_n = X_n^+ - X_n^-$, where $X_n^+$ and $X_n^-$ are nonnegative and defined by $X_n^+=\max[X_n,0]$, $X_n^-=-\min[X_n,0]$. If $X_n$ and $X_i$ are independent, then $X_n^+$ and $X_i^+$ are also independent, so the lower-bounded condition can be removed when the variables are independent. However, if $X_n$ and $X_i$ are merely uncorrelated, that does not mean $X_n^+$ and $X_i^+$ are uncorrelated. So it is not clear to me whether the lower-bounded condition can be removed when "independence" is replaced by the weaker condition "pairwise uncorrelated."
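A small example (my own construction, chosen only for illustration) shows that positive parts of uncorrelated variables can indeed be correlated. Let $X$ be uniform on $\{-1, 0, 1\}$ and let $Y = X^2 - \frac{2}{3}$, so that $E[X]=E[Y]=0$. Then
$$ E[XY] = E[X^3] - \tfrac{2}{3}E[X] = 0 $$
so $X$ and $Y$ are uncorrelated. But $X^+ = 1$ only when $X = 1$, and $Y^+ = \frac{1}{3}$ exactly when $X = \pm 1$, which gives
$$ E[X^+Y^+] = \tfrac{1}{3}\cdot\tfrac{1}{3} = \tfrac{1}{9}, \qquad E[X^+]E[Y^+] = \tfrac{1}{3}\cdot\tfrac{2}{9} = \tfrac{2}{27} $$
Hence $Cov(X^+, Y^+) = \frac{1}{27} \neq 0$. This only shows that the decomposition argument breaks down; it does not by itself settle whether the lower-bounded condition is necessary.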
Best Answer
First observe that in this setting, the sequence $\left(S_n\right)_{n\geqslant 1}$ is independent. For an independent sequence, in view of the Borel-Cantelli lemma, almost sure convergence and complete convergence are equivalent. Hence $S_n\to \mu$ almost surely if and only if for all positive $\varepsilon$,
$$\tag{*} \sum_{n\geqslant 1}\mathbb P\left(\lvert S_n-\mu\rvert >\varepsilon\right)<+\infty. $$
Let $(Y_i)_{i\geqslant 1}$ be an i.i.d. sequence such that $Y_1$ has the same law as $X_{1,1}$. Then $(*)$ is equivalent to
$$ \forall \varepsilon>0, \quad \sum_{n\geqslant 1}\mathbb P\left(\left\lvert \sum_{j=1}^n(Y_j-\mu)\right\rvert > n\varepsilon\right)<+\infty. $$
By Theorem 3 in this paper by Baum and Katz, this is equivalent to $\mathbb E[Y_1^2]<\infty$, hence we do need extra conditions.