On your effort
Your efforts are in the right direction, but the mistake is the step claiming that $\langle f,e_n \rangle = 0$ for large enough $n$. We don't actually know this, and $\lim_{n \to \infty} \langle f,e_n\rangle = 0$ certainly doesn't imply it.
That's the only mistake; otherwise I'm very happy. You saved me the trouble of proving that the inclusion is well defined, and continuity of the inclusion is fairly clear from the inequality used to establish that, because it shows the operator is bounded, hence continuous.
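To see concretely why the limit being zero is not enough, here is a small example (the function $f$ below is my own choice for illustration, not something taken from your attempt):

$$f = \sum_{n \in \mathbb Z} 2^{-|n|} e_n, \qquad \langle f, e_n\rangle = 2^{-|n|} \neq 0 \ \text{ for every } n, \qquad \lim_{|n| \to \infty} \langle f,e_n\rangle = 0.$$

Since $\sum_{n} (1+n^2)^{s}\, 4^{-|n|} < \infty$ for every real $s$, this $f$ lies in every $W^{s,2}$, yet none of its coefficients vanish.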
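For the record, the inequality in question should look roughly like this (a sketch, assuming $s_1 \le s_2$ and the Fourier-side norm used later in this answer):

$$\|f\|_{W^{s_1,2}}^2 = \sum_{n \in \mathbb Z} (1+n^2)^{s_1} |\hat f(n)|^2 \;\le\; \sum_{n \in \mathbb Z} (1+n^2)^{s_2} |\hat f(n)|^2 = \|f\|_{W^{s_2,2}}^2,$$

which holds term by term because $1+n^2 \ge 1$. In particular the inclusion has operator norm at most $1$.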
On the theorem to be used
Let's see: the theorem gives necessary and sufficient conditions for a diagonal operator to be compact. However, the problem is quite simple: $T \in B(H)$ does not hold here, because we have two different Hilbert spaces $W^{s_1,2}$ and $W^{s_2,2}$ with their own norms. While one could possibly generalize the given theorem to this scenario, it clearly can't be used as is, because our operator does not map a space back to itself.
Attempting something with the direct sum, in order to bring the spaces under the same umbrella, does not seem to be useful either, particularly because doing this makes the operator lose diagonalizability.
So, if you want to use the theorem, you'll have to do what Alex suggested: interpret $W^{s_1,2}, W^{s_2,2}$ as subspaces of $L^2(\mathbb T)$ using appropriate maps, so that the inclusion can be built from a self-map of $L^2(\mathbb T)$. Making sure the auxiliary maps are bounded is sufficient, since the composition of a compact map with any number of bounded maps on either side remains compact.
This is a very important point: to apply a theorem known for operators on a single space to an operator between two different spaces, we try to find a space in which both are embedded, and use the theorem for an appropriate operator on that space. A technique to keep in mind.
The details
$W^{s_2,2}$ is the set of all functions $f \in L^2(\mathbb T)$ for which $\|f\|^2_{W^{s_2,2}} = \sum_{n \in \mathbb Z} (1+n^2)^{s_2} |\hat{f}(n)|^2$ is finite, endowed with that norm. Similarly for $W^{s_1,2}$.
So if $e_n = \frac{e^{inx}}{\sqrt{2\pi}}$, then $(e_n)_{n \in \mathbb Z}$ is an orthonormal basis of $L^2(\mathbb T)$. Under the $W^{s_2,2}$ norm and the inner product resulting from it, the $e_n$ continue to belong to, and be orthogonal in, $W^{s_2,2}$ (recall that $\hat{f}(n) = \langle f,e_n\rangle$, which makes this fact obvious). To make the $e_n$ orthonormal, observe that $\langle e_n,e_n\rangle_{W^{s_2,2}} = (1+n^2)^{s_2}$. Therefore, $d_n = \frac{e_n}{(1+n^2)^{\frac {s_2} 2}}$ is an orthonormal basis of $W^{s_2,2}$.
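In case the orthogonality is not obvious, here is the computation spelled out. I am assuming the inner product on $W^{s_2,2}$ is the one obtained from the norm by polarization, and I write $\delta_{nm}$ for the Kronecker delta:

$$\langle f,g\rangle_{W^{s_2,2}} = \sum_{k \in \mathbb Z} (1+k^2)^{s_2}\, \hat f(k)\, \overline{\hat g(k)}, \qquad \hat{e_n}(k) = \langle e_n, e_k\rangle = \delta_{nk},$$

$$\langle e_n,e_m\rangle_{W^{s_2,2}} = \sum_{k \in \mathbb Z} (1+k^2)^{s_2}\, \delta_{nk}\,\delta_{mk} = (1+n^2)^{s_2}\,\delta_{nm}.$$

So distinct exponentials stay orthogonal, and dividing by the square root of $(1+n^2)^{s_2}$ normalizes them.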
This fact is useful, because now we define the following map $T$ from $W^{s_2,2}$ to $L^2(\mathbb T)$: it is given by $T(e^{inx}) = (1+n^2)^{\frac{s_2}{2}}e^{inx}$, extended by linearity and continuity to the rest of the space, so that the orthonormal basis of $W^{s_2,2}$ found above is sent to the orthonormal basis $(e_n)$ of $L^2(\mathbb T)$. It's fairly clear that this is an isometric embedding.
Similarly, define $T': L^2(\mathbb T) \to W^{s_1,2}$ by $e^{inx} \mapsto (1+n^2)^{-\frac {s_1}2}e^{inx}$. Note that this is well-defined and is an isometry as well.
Now, consider the map $S : L^2(\mathbb T) \to L^2(\mathbb T)$ given by $S(e^{inx}) = (1+n^2)^{\frac{s_1-s_2}{2}} e^{inx}$. This is a diagonal (continuous) map, and is compact because of the theorem!
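To spell out why the theorem applies: I am assuming $s_1 < s_2$, and that the theorem you have is the usual criterion that a diagonal operator is compact if and only if its diagonal entries tend to zero. Writing $\lambda_n$ (my notation) for the diagonal entries of $S$ with respect to the basis $(e_n)$,

$$\lambda_n = (1+n^2)^{\frac{s_1-s_2}{2}}, \qquad 0 < \lambda_n \le 1, \qquad \lambda_n \to 0 \ \text{ as } |n| \to \infty,$$

because the exponent $\frac{s_1-s_2}{2}$ is negative. If $s_1 = s_2$ the entries are all equal to $1$ and this argument gives nothing, which is as it should be.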
You just need to verify that $T'ST$ is the inclusion map from $W^{s_2,2}$ to $W^{s_1,2}$ and you are done, because $S$ is compact and the other maps are bounded.
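The verification is a one-line computation on each exponential, using only the three definitions above:

$$\begin{aligned}
T(e^{inx}) &= (1+n^2)^{\frac{s_2}{2}}\, e^{inx},\\
S\,T(e^{inx}) &= (1+n^2)^{\frac{s_2}{2}}\,(1+n^2)^{\frac{s_1-s_2}{2}}\, e^{inx} = (1+n^2)^{\frac{s_1}{2}}\, e^{inx},\\
T'S\,T(e^{inx}) &= (1+n^2)^{\frac{s_1}{2}}\,(1+n^2)^{-\frac{s_1}{2}}\, e^{inx} = e^{inx}.
\end{aligned}$$

So $T'ST$ agrees with the inclusion on trigonometric polynomials. Both maps are bounded and trigonometric polynomials are dense in $W^{s_2,2}$, so they agree everywhere.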
The second part
For the second part, you seem to only be demanding inclusion, rather than compact inclusion.
For this, the point is quite simple: if $f$ is (classically/weakly) differentiable, then the Fourier coefficients of $f'$ have a nice relation with those of $f$. Indeed, roughly speaking, an integration by parts can be used to prove that $\widehat{f'}(n) = in \hat{f}(n)$. An inductive argument then proves that $W^{s,2}(\mathbb T) \subset C^n(\mathbb T)$ for $s > n + \frac 12$, since going down the derivative chain leaves us to prove that $W^{s,2}(\mathbb T) \subset C^0(\mathbb T)$ for $s>\frac 12$, which follows from writing such a function as a uniform limit of trigonometric polynomials and using the Weierstrass M-test. (I leave you to fill in the details; they are fairly standard and can be found in e.g. Rudin's book, or in Stein-Shakarchi.)
That $C^n(\mathbb T) \subset W^{n,2}(\mathbb T)$ follows because a continuous function attains its supremum and infimum on a compact set: each derivative of order at most $n$ is bounded, hence square-integrable on $\mathbb T$.
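Here is a sketch of the estimate that feeds the M-test in the base case, assuming $s > \frac 12$ and the normalization $|e_n(x)| = \frac{1}{\sqrt{2\pi}}$. By Cauchy-Schwarz,

$$\sum_{n \in \mathbb Z} |\hat f(n)| = \sum_{n \in \mathbb Z} (1+n^2)^{-\frac s2}\,(1+n^2)^{\frac s2}\,|\hat f(n)| \;\le\; \left(\sum_{n \in \mathbb Z} (1+n^2)^{-s}\right)^{1/2} \|f\|_{W^{s,2}},$$

and the first factor is finite exactly when $2s > 1$. Hence $\sum_n \hat f(n) e_n$ is dominated termwise by a convergent numerical series, so it converges uniformly to a continuous function, which must agree with $f$ almost everywhere.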
The generalized Rellich-Kondrachov theorem
The generalized Rellich-Kondrachov theorem (cited as Theorem 6.2, Chapter 6, Adams, Sobolev Spaces) generalizes the compact embedding of Sobolev spaces. While it comes with many, many conditions, the ones that matter to us are parts I and II: if the domain $\Omega \subset \mathbb R^n$ satisfies the cone condition, then the map $W^{j+m,p}(\Omega) \to W^{j,q}(\Omega)$ is compact if $mp > n$ and $1 \leq q \leq \infty$, and is also compact if $mp \leq n$ and $q < \frac{np}{n-mp}$ (where, if $n=mp$, the right-hand side is read as infinite).
The cone condition is a technical condition that is quite easy to verify for the torus, so skipping that we have $j=s_1,m = s_2-s_1,p=q=2$ and $n=1$. If $s_2-s_1 >\frac 12$ we are done by the first condition, and if $s_2-s_1 \leq \frac 12$ then we are done by (doing some algebra and verifying) the second condition.
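The algebra for the second condition is short. This is a sketch assuming $s_2 > s_1$, so that $m = s_2 - s_1 \in (0, \tfrac12]$, with $n = 1$ and $p = q = 2$:

$$m < \tfrac12: \qquad \frac{np}{n-mp} = \frac{2}{1-2m} > 2 = q, \quad \text{since } 0 < 1-2m < 1,$$

$$m = \tfrac12: \qquad mp = 1 = n, \quad \text{so the right-hand side is infinite and } q = 2 \text{ is allowed}.$$

Either way $q = 2$ sits strictly below the critical exponent, so the distinction between a strict and a non-strict bound on $q$ plays no role here.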
Of course, we barely used the full strength of the RK theorem, but this still shows its power in the face of general embedding theorems.
Best Answer
One approach is to note that $C$ is essentially the embedding of $X^*$ into the Cameron-Martin space $H \subset X$. Now the result follows from the fact that $H$ is compactly embedded into $X$. This can be found as:
Lemma I.4.5 of Kuo, Hui-Hsiung, Gaussian Measures in Banach Spaces, Lecture Notes in Mathematics 463, Springer-Verlag, Berlin-Heidelberg-New York (1975). ZBL0306.28010.
Corollary 3.2.4 of Bogachev, Vladimir I., Gaussian Measures, Mathematical Surveys and Monographs 62, American Mathematical Society, Providence, RI (1998). ZBL0913.60035.
Another approach, similar to what I did in Proposition 4.16 of these lecture notes, is to first prove that $\int_X \|x\|^2\,d\mu(x) < \infty$, e.g. as a corollary of the much stronger Skorokhod or Fernique theorems. Now suppose we have a sequence $f_n$ in the unit ball of $X^*$. By Alaoglu, we can pass to a subsequence converging weak-* to some $f$. Now since the $f_n$ have norm at most one, then as functions on $X$, they are dominated by the square-integrable function $\|\cdot\|$. Hence by dominated convergence, they converge to $f$ strongly in $L^2(\mu)$. Now Cauchy-Schwarz gives, for any $g \in X^*$ with $\|g\| \le 1$, $$|q(f_n - f, g)| \le \|f_n - f\|_{L^2} \|g\|_{L^2} \le \|f_n - f\|_{L^2} \left(\int \|x\|^2\mu(dx)\right)^{1/2}$$ where the right side approaches 0 uniformly in $g$, so we have $q(f_n - f, \cdot) \to 0$ in $X^{**}$.
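For completeness, here are the two estimates behind the dominated convergence step and the bound on $\|g\|_{L^2}$, written out as a sketch. The only extra fact used is that the weak-* limit $f$ again has $\|f\|_{X^*} \le 1$:

$$|f_n(x) - f(x)|^2 \le \left(\|f_n\|_{X^*} + \|f\|_{X^*}\right)^2 \|x\|^2 \le 4\|x\|^2, \qquad f_n(x) \to f(x) \ \text{ for every } x \in X,$$

$$\|g\|_{L^2}^2 = \int_X |g(x)|^2\,d\mu(x) \le \|g\|_{X^*}^2 \int_X \|x\|^2\,d\mu(x) \le \int_X \|x\|^2\,d\mu(x).$$

The first line gives an integrable dominating function together with pointwise convergence, and the second is what makes the final bound uniform over $g$ in the unit ball.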