You are probably stuck because the random variable $N_n$ may assume far too many values for Kolmogorov's inequality to provide an effective upper bound. This suggests dealing separately with the case when $N_n$ is close to $a_n$ (where Kolmogorov's inequality should show that $S_{N_n}-S_{a_n}$ is small) and with the case when $N_n$ is far from $a_n$ (which, by the hypothesis that $N_n/a_n\to1$ in probability, should have small probability).
Hence, let us introduce, for a given positive $\varepsilon$, the event $$A_n=[(1-\varepsilon) a_n\leqslant N_n\leqslant (1+\varepsilon) a_n].$$
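On the one hand, $N_n/a_n\to1$ in probability, hence $A_n$ is typical in the sense that $\mathrm P(\Omega\setminus A_n)\to0$.
On the other hand, $|S_{N_n}-S_{a_n}|\leqslant |S_{N_n}-S_{(1-\varepsilon) a_n}|+|S_{a_n}-S_{(1-\varepsilon) a_n}|$ and, on the event $A_n$, both indices $N_n$ and $a_n$ lie between $(1-\varepsilon)a_n$ and $(1+\varepsilon)a_n$ (we omit integer parts throughout). Hence, on the event $A_n$,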
$$
|S_{N_n}-S_{a_n}|\leqslant 2M_n,\qquad M_n=\sup\limits_{1\leqslant k\leqslant 2\varepsilon a_n}|T_k|,\qquad T_k=S_{(1-\varepsilon) a_n+k}-S_{(1-\varepsilon) a_n}.
$$
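Now we are back in the realm where Kolmogorov's inequality applies: since $T_k$ is a sum of $k$ independent centered increments with variance $\sigma^2$, it yields, for every positive $x$,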
$$
\mathrm P(M_n\geqslant x\sqrt{a_n})\leqslant (a_nx^2)^{-1}\mathrm{Var}(T_{2\varepsilon a_n})=(a_nx^2)^{-1}(2\varepsilon a_n)\sigma^2=2\varepsilon x^{-2}\sigma^2.
$$
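As a sanity check on this maximal inequality, here is a minimal simulation sketch. It assumes steps uniform on $\{-1,+1\}$ (so $\sigma^2=1$) and, for simplicity, uses a block of length `m` with threshold $x\sqrt{m}$ rather than the $2\varepsilon a_n$ and $x\sqrt{a_n}$ of the text; the function name and parameters are illustrative only.

```python
import numpy as np

def kolmogorov_check(m=200, x=2.0, trials=5000, seed=0):
    """Compare P(max_{k<=m} |T_k| >= x*sqrt(m)) with the bound Var(T_m) / (x^2 * m)."""
    rng = np.random.default_rng(seed)
    # Assumed step law: uniform on {-1, +1}, i.e. centered with variance 1.
    steps = rng.choice([-1.0, 1.0], size=(trials, m))
    partial_sums = np.cumsum(steps, axis=1)         # row i holds T_1, ..., T_m
    running_max = np.abs(partial_sums).max(axis=1)  # max_k |T_k| for each trial
    empirical = float(np.mean(running_max >= x * np.sqrt(m)))
    bound = 1.0 / x**2                              # Var(T_m) / (x^2 * m) with sigma^2 = 1
    return empirical, bound

if __name__ == "__main__":
    empirical, bound = kolmogorov_check()
    print(f"empirical probability: {empirical:.4f}, Kolmogorov bound: {bound:.4f}")
```

The empirical frequency should sit below the bound, which is typically far from sharp.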
Putting our two estimates together yields
$$
\mathrm P(|S_{N_n}-S_{a_n}|\geqslant 2x\sqrt{a_n})\leqslant\mathrm P(\Omega\setminus A_n)+\mathrm P(M_n\geqslant x\sqrt{a_n})\leqslant\mathrm P(\Omega\setminus A_n)+2\varepsilon x^{-2}\sigma^2.
$$
This proves that, for every positive $\varepsilon$,
$$
\limsup\limits_{n\to\infty}\ \mathrm P(|S_{N_n}-S_{a_n}|\geqslant 2x\sqrt{a_n})\leqslant2\varepsilon x^{-2}\sigma^2,
$$
hence, letting $\varepsilon\to0$, $\mathrm P(|S_{N_n}-S_{a_n}|\geqslant2x\sqrt{a_n})\to0$ for every positive $x$, that is, $S_{N_n}/\sqrt{a_n}-S_{a_n}/\sqrt{a_n}\to0$ in probability.
By the usual central limit theorem, since $a_n\to+\infty$, $S_{a_n}/\sqrt{a_n}$ converges in distribution to a centered Gaussian distribution with variance $\sigma^2$. Hence, by Slutsky's theorem, $S_{N_n}/\sqrt{a_n}$ converges in distribution to the same centered Gaussian distribution with variance $\sigma^2$.
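The conclusion can be illustrated numerically. The sketch below is only an illustration under assumptions not in the argument above: steps uniform on $\{-1,+1\}$ (so $\sigma^2=1$) and $N_n$ Poisson with mean $a_n$, drawn independently of the steps, which gives $N_n/a_n\to1$ in probability. The proof itself does not need this independence.

```python
import numpy as np

def simulate_random_index(a_n=1000, trials=2000, seed=0):
    """Return samples of S_{N_n}/sqrt(a_n) and of S_{a_n}/sqrt(a_n)."""
    rng = np.random.default_rng(seed)
    # Hypothetical random index: N_n ~ Poisson(a_n), so N_n / a_n is close to 1.
    N = rng.poisson(a_n, size=trials)
    horizon = max(int(N.max()), a_n)
    # Assumed step law: uniform on {-1, +1}, i.e. centered with variance 1.
    steps = rng.choice([-1.0, 1.0], size=(trials, horizon))
    # Column k holds S_k, with S_0 = 0 stored in column 0.
    S = np.concatenate([np.zeros((trials, 1)), np.cumsum(steps, axis=1)], axis=1)
    rows = np.arange(trials)
    at_random_index = S[rows, N] / np.sqrt(a_n)
    at_fixed_index = S[:, a_n] / np.sqrt(a_n)
    return at_random_index, at_fixed_index

if __name__ == "__main__":
    z_random, z_fixed = simulate_random_index()
    print("sample variance of S_N / sqrt(a_n):", z_random.var())
    print("fraction with |difference| >= 0.5:",
          np.mean(np.abs(z_random - z_fixed) >= 0.5))
```

The sample variance should be close to $\sigma^2=1$, and the fraction of trials where the two normalized sums differ noticeably should be small and shrink as `a_n` grows.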
Define $S_n:=\sum_{j=1}^nX_j$ and let $\varphi_n$ be the characteristic function of $S_n$. Since the $X_j$ are independent and $X_j$ is normal with mean $\mu_j$ and variance $\sigma_j^2$, we have
$$\varphi_n(t)=\exp\left(it\sum_{j=1}^n\mu_j\right)\cdot\exp\left(-\frac{t^2}2\sum_{j=1}^n\sigma_j^2\right).$$
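Since $S_n$ converges almost surely, hence in distribution, the sequence $(\varphi_n(t))_{n\geqslant 1}$ converges for each $t$ to $\varphi(t)$, where $\varphi$ is a continuous function (the characteristic function of the limiting distribution).
If $\sum_{j\geqslant 1}\sigma_j^2$ were divergent, then we would have $\varphi(t)=0$ if $t\neq 0$ and $\varphi(0)=1$, which contradicts the continuity of $\varphi$ at $0$. Hence $\sum_{j\geqslant 1}\sigma_j^2$ converges. Define $s_n:=\sum_{j=1}^n\mu_j$. Dividing $\varphi_n(t)$ by its modulus shows that, for each $t$, the sequence $(e^{its_n})_{n\geqslant 1}$ converges to a limit $g(t)$, where $g$ is continuous and $g(0)=1$. Choosing $\delta>0$ small enough that $\int_0^\delta g(t)\,\mathrm dt\neq0$ and integrating $e^{its_n}$ over $[0,\delta]$, dominated convergence shows that $(s_n)_{n\geqslant 1}$ is convergent.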
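To see the dichotomy on the modulus concretely, here is a small numerical sketch. The two variance sequences, $\sigma_j^2=1/j^2$ (summable) and $\sigma_j^2=1/j$ (not summable), are hypothetical examples chosen for illustration, as is the helper name `cf_modulus`.

```python
import numpy as np

def cf_modulus(t, variances):
    """Return |phi_n(t)| = exp(-t^2/2 * sum_{j<=n} sigma_j^2) for n = 1, ..., len(variances)."""
    return np.exp(-0.5 * t**2 * np.cumsum(variances))

if __name__ == "__main__":
    j = np.arange(1, 10001)
    summable = cf_modulus(1.0, 1.0 / j**2)   # sum of variances converges
    divergent = cf_modulus(1.0, 1.0 / j)     # sum of variances diverges
    for n in (10, 100, 1000, 10000):
        print(n, summable[n - 1], divergent[n - 1])
```

In the summable case the modulus at $t=1$ stabilizes at a positive value, while in the divergent case it decays to $0$ for every $t\neq0$, which is exactly the discontinuous limit ruled out above.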
Best Answer
Weak convergence is equivalent to pointwise convergence of the characteristic functions. Take the absolute values of the characteristic functions to see that the variances converge. Then it becomes clear that the means also converge.