I think a lot of the confusion in settings such as this arises because the notation often used in statistics does not remind the reader that $S_n$ is not a number. It's a function. We have a probability space $\Omega$, and each $S_n$ is a function $S_n:\Omega\to\mathbb{R}$. For any fixed $\omega\in \Omega$, $(S_n(\omega))_{n\ge1}$ is a sequence of numbers. For different $\omega$, we may get different sequences, which may have different behaviors.
Convergence in probability does not say anything about any particular $\omega$. It says something about $\Omega$ overall: roughly, that $\mathbb{P}(\{\omega\in\Omega:|S_n(\omega)-\mu|>\epsilon\})$ is small for large $n$ (more precisely, that for any $\epsilon>0$, this sequence of probabilities approaches zero as $n\to\infty$). This is what is meant by saying that $S_n$ is likely to be near $\mu$ for large $n$. But I think the statement about $n^*$ is (without mentioning it) transitioning to talking about what happens at a specific $\omega$. We can say that for $n^*$, $S_{n^*}$ is likely to be near $\mu$, but if $S_{n^*}(\omega)$ is near $\mu$ for some particular $\omega$, it is not necessarily the case that $S_n(\omega)$ is near $\mu$ for this same $\omega$ and later $n>n^*$. The strong law precludes this: if $\omega\in \Omega$ is such that $\lim_n S_n(\omega)=\mu$, then for any $\epsilon>0$, the set $\{n:|S_n(\omega)-\mu|>\epsilon\}$ is finite. Note that both of these last two conditions are about a single $\omega$ and concern a sequence of numbers, $S_n(\omega)$, not random variables (functions) $S_n$. Conversely, if $\omega$ is such that $\{n:|S_n(\omega)-\mu|>\epsilon\}$ is infinite, then $S_n(\omega)$ does not converge to $\mu$, and the strong law says the set of such $\omega$ has probability zero.
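To see the "$\mathbb{P}(\{\omega:|S_n(\omega)-\mu|>\epsilon\})$ is small for large $n$" statement numerically, here is a small Monte Carlo sketch (the helper name `bad_fraction` and all parameter choices are mine, purely for illustration): it estimates that probability for sample means of fair coin flips and shows it shrinking as $n$ grows.

```python
import random

# Monte Carlo sketch: estimate P(|S_n - mu| > eps) for sample means
# of fair coin flips. Each simulated path plays the role of one omega.
random.seed(0)
mu, eps = 0.5, 0.05
n_paths = 2000

def bad_fraction(n):
    """Fraction of simulated paths whose sample mean of n flips is > eps from mu."""
    bad = 0
    for _ in range(n_paths):
        s = sum(random.random() < 0.5 for _ in range(n))
        if abs(s / n - mu) > eps:
            bad += 1
    return bad / n_paths

for n in (10, 100, 1000):
    print(n, bad_fraction(n))
```

The estimated probabilities drop toward zero as $n$ grows, which is exactly the weak-law statement; it says nothing about any single path staying near $\mu$ forever.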
Let's imagine what could be happening to get convergence in probability but not convergence a.s. Let $\Omega=[0,1)$ endowed with the usual Lebesgue measure $\mathbb{P}$. Let $F^1_1:\Omega\to \{0,1\}$ be $1$ on the whole interval $[0,1)$. Let $F^2_1,F^2_2:\Omega\to\{0,1\}$ be such that $F^2_1$ is $1$ on the first half of $[0,1)$ and $0$ on the second half, and $F^2_2$ is $1$ on the second half of $[0,1)$ and $0$ on the first. More generally, let $F^n_1,\ldots,F^n_n:[0,1)\to\{0,1\}$ be such that $F^n_k$ is $1$ on $[(k-1)/n,k/n)$ and zero on the rest of $[0,1)$.
Note that for $0<\epsilon<1$, $$\mathbb{P}(|F^n_k-0|>\epsilon)=1/n,$$ because the set where $|F^n_k-0|>\epsilon$ is $[(k-1)/n,k/n)$. So if $X_1,X_2,\ldots$ is an enumeration of $F^1_1,F^2_1,F^2_2,F^3_1,F^3_2,F^3_3,\ldots$, then $X_m$ converges to $0$ in probability. But for any particular point $\omega\in[0,1)$, $X_m(\omega)$ does not converge to $0$. This is because $X_m(\omega)$ will be $1$ for one of the functions $F^1_1$, one of the functions $F^2_1,F^2_2$, one of the functions $F^3_1,F^3_2,F^3_3$, etc. So these "bad sets" $A_m=\{\omega\in\Omega:|X_m(\omega)-0|>\epsilon\}$ have probabilities going to zero, but they bounce around enough that each point occurs in infinitely many of the $A_m$ (though the gaps between the $m$ at which a particular $\omega$ appears grow longer and longer). More specifically, the $A_m$ sequence is $A_1=[0,1)$, $A_2=[0,1/2)$, $A_3=[1/2,1)$, $A_4=[0,1/3)$, $A_5=[1/3,2/3)$, $A_6=[2/3,1)$, etc. So $\mathbb{P}(\{\omega\in\Omega:\lim_m X_m(\omega)=0\})=0$, not $1$.
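The construction above is completely explicit, so we can check it in a short script (a sketch; the helper names `blocks` and `X` are mine): the probabilities $\mathbb{P}(A_m)=1/n$ tend to zero, yet a fixed $\omega$ lands in exactly one bad set per block of size $n$, hence in infinitely many $A_m$.

```python
def blocks(n_max):
    """Yield (n, k) pairs enumerating F^1_1, F^2_1, F^2_2, F^3_1, ..."""
    for n in range(1, n_max + 1):
        for k in range(1, n + 1):
            yield n, k

def X(n, k, omega):
    """F^n_k(omega): 1 on [(k-1)/n, k/n), else 0."""
    return 1 if (k - 1) / n <= omega < k / n else 0

omega = 0.3                                # any fixed point of [0, 1)
probs = [1 / n for n, _ in blocks(50)]     # P(A_m) = 1/n -> 0
hits = [m for m, (n, k) in enumerate(blocks(50), 1) if X(n, k, omega) == 1]

print(probs[-1])   # P(A_m) for the last m shown: 1/50
print(len(hits))   # omega is "bad" once per block n = 1, ..., 50
```

Since the intervals $[(k-1)/n,k/n)$ partition $[0,1)$ for each $n$, the script finds exactly one hit per block: $X_m(\omega)=1$ infinitely often, even though $\mathbb{P}(A_m)\to 0$.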
If there were a sequence of i.i.d. random variables $Y_n$ such that $X_n=(Y_1+\ldots+Y_n)/n$, then this sequence would satisfy the weak law of large numbers but not the strong law, because $X_n$ converges to zero in probability but not a.s. However, there is no such sequence of $Y_i$ (indeed, one way to prove that no such $Y_i$ can exist is exactly this: if they did, the sample means $X_n$ would converge a.s. by the SLLN).
Best Answer
The weak law of large numbers essentially only states that the random variable $S_n/n$, describing the average number of heads obtained, tends to its expected value $E\left[\frac{S_n}n\right]=\mu=0.5$ in probability. Graphically, this means that as $n$ grows, the PMF of $S_n/n$ becomes increasingly concentrated around $0.5$.
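This concentration can be computed exactly, since $S_n$ is binomial. A small sketch (the function name `mass_near_half` is mine) tallies the probability mass of $S_n/n$ within $\epsilon=0.05$ of $0.5$ for growing $n$:

```python
from math import comb

def mass_near_half(n, eps=0.05):
    """P(|S_n/n - 0.5| <= eps) for S_n ~ Binomial(n, 1/2), computed exactly."""
    return sum(comb(n, s) for s in range(n + 1)
               if abs(s - n / 2) <= eps * n) / 2 ** n

for n in (10, 100, 1000):
    print(n, mass_near_half(n))
```

The mass near $0.5$ rises toward $1$ as $n$ grows, which is the weak law in this binomial case.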
For the asymptotic behaviour of $S_n$, you have to refer to the Central Limit Theorem, which states that the sum of $n$ independent, identically distributed random variables $X_i$, $i=1,2,3,\ldots$, is asymptotically normal under certain conditions. Also note that in your example, $S_n$ actually has a binomial distribution. The special case of the CLT that deals with binomial distributions is the De Moivre-Laplace theorem.
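To see the De Moivre-Laplace approximation at work, here is a quick comparison (a sketch using only the standard library; the helper names are mine) of the exact $\mathrm{Binomial}(n,1/2)$ CDF against the normal CDF with matching mean $n/2$ and variance $n/4$, using a continuity correction:

```python
from math import comb, erf, sqrt

def binom_cdf(n, x):
    """P(S_n <= x) for S_n ~ Binomial(n, 1/2), computed exactly."""
    return sum(comb(n, s) for s in range(x + 1)) / 2 ** n

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n = 100  # mean n/2 = 50, standard deviation sqrt(n)/2 = 5
for x in (40, 50, 60):
    exact = binom_cdf(n, x)
    approx = normal_cdf((x + 0.5 - n / 2) / (sqrt(n) / 2))  # continuity correction
    print(x, round(exact, 4), round(approx, 4))
```

Already at $n=100$ the normal approximation agrees with the exact binomial CDF to a few decimal places.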
For large but finite $n$, you may approximate $S_n$ by a normal distribution, in accordance with the CLT. For lower bounds on $P(|S_n-E[S_n]|<k\sqrt{V[S_n]})$ for all positive $k$, Chebyshev's inequality comes to our aid. Since $E[S_n]=n/2$ and $V[S_n]=n/4$, it gives $$P\left(|S_n-n/2|<\frac{k\sqrt n}2\right)\ge1-\frac1{k^2},$$ or, substituting $m=k\sqrt n/2$, $$P(|S_n-n/2|<m)\ge1-\frac n{4m^2}.$$
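As a numerical sanity check (a sketch; `exact_prob` is my own helper), we can compare the exact probability $P(|S_n-n/2|<m)$ with the Chebyshev lower bound $1-n/(4m^2)$:

```python
from math import comb

def exact_prob(n, m):
    """P(|S_n - n/2| < m) for S_n ~ Binomial(n, 1/2), computed exactly."""
    return sum(comb(n, s) for s in range(n + 1)
               if abs(s - n / 2) < m) / 2 ** n

n = 100
for m in (10, 15, 20):
    bound = 1 - n / (4 * m * m)
    print(m, round(exact_prob(n, m), 4), round(bound, 4))
```

As expected, the exact probability always sits above the Chebyshev bound; the bound is crude but holds for every distribution with the same mean and variance.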