Figured I'd turn the comments into an answer:
Let $B$ be a Bernoulli RV with parameter $a\in[0,1]$, so $B=1$ with probability $a$ and $B=0$ with probability $1-a$, and let $X_i$ be an RV with PDF $f_i(x)$ for $i=1,2$. Assume $B$ is independent of $X_1$ and $X_2$.
Then we claim the RV defined by
$$Z=BX_1 + (1-B)X_2$$
has the mixture density
$$f(x)=af_1(x)+(1-a)f_2(x).$$
That $Z$ has the appropriate expected value is easily verified, and the variance can be computed and checked via the law of total variance; comment if you would like further details or clarifications, or if you spot any mistakes.
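For completeness, here is the verification sketched (just conditioning on $B$; write $\mu_i$ and $\sigma_i^2$ for the mean and variance of $X_i$, assumed finite):
$$P(Z \le x) = a\,P(X_1 \le x) + (1-a)\,P(X_2 \le x),$$
so differentiating in $x$ gives the mixture density $f(x)=af_1(x)+(1-a)f_2(x)$. Likewise,
$$\mathbb E[Z] = a\mu_1 + (1-a)\mu_2, \qquad \operatorname{Var}(Z) = a\sigma_1^2 + (1-a)\sigma_2^2 + a(1-a)(\mu_1 - \mu_2)^2,$$
the latter being exactly what the law of total variance produces when conditioning on $B$.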
This is true assuming you are free to choose $\alpha, \beta$ however you wish. Convergence in distribution of a sequence of real-valued random variables means their cdfs $F_n$ satisfy $\lim_{n\rightarrow\infty} F_n(x) = F(x)$ for each point $x \in \mathbb{R}$ at which $F$ is continuous. We can show that, for any $\varepsilon > 0$, there are $A$ and $B$ such that for all $\alpha > A$, $\beta < B$,
$$\sup_{x \in \mathbb{R}} |F_{\alpha,\beta}(x) - F(x)| < \varepsilon.$$
This is enough to extract a sequence $\alpha_n, \beta_n$.
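To be explicit about the extraction: for each $n$, apply the claim with $\varepsilon = 1/n$ to obtain thresholds $A_n$ and $B_n$, and take, say,
$$\alpha_n = \max(A_n, n) + 1, \qquad \beta_n \le \min(B_n, 1/n),$$
shrinking $\beta_n$ a little if necessary so that $2\alpha_n/\beta_n$ is an integer. Then $\sup_x |F_{\alpha_n,\beta_n}(x) - F(x)| < 1/n \to 0$, which is (more than) the pointwise convergence of cdfs required above.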
This turned into quite a lengthy post, so let me say up front that the idea is simple: you approximate the density with piecewise constant functions, and all that matters is that the areas under the curves converge uniformly.
Let then $\varepsilon > 0$ be given, and let $\Phi$ denote the cdf of a standard Gaussian. There is $A > 0$ large enough that $\Phi(-A) < \varepsilon/4$, which by symmetry also implies $\Phi(A) > 1-\varepsilon/4$. Fix some $\alpha > A$. We have just cut off the tails.
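For concreteness: with $\varepsilon = 0.01$ we need $\Phi(-A) < 0.0025$, and $A = 3$ already does the job since $\Phi(-3) \approx 0.00135$.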
Given $x_i = -\alpha + i\beta$ with $n = 2\alpha/\beta \in \mathbb{Z}$, there are $n$ intervals $I_i = [x_i,x_{i+1})$, $i = 0,\dots,n-1$, that cover $[-\alpha, \alpha)$. Setting $p_i = \Phi(x_{i+1}) - \Phi(x_i)$, the total probability mass allocated is $1 - 2\Phi(-\alpha)$; the remaining mass can be assigned anywhere outside of $[-\alpha,\alpha)$, say to some point $x > \alpha$. I'll ignore any technicalities with the right end-point (it has probability 0).
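As a concrete instance of the grid: with $\alpha = 3$ and $\beta = 1/2$ there are $n = 12$ intervals $I_0 = [-3, -2.5), \dots, I_{11} = [2.5, 3)$ with weights $p_i = \Phi(x_{i+1}) - \Phi(x_i)$, and the leftover $2\Phi(-3) \approx 0.0027$ of mass is parked somewhere above $3$.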
Define a "locator" map $\ell : [-\alpha, \alpha) \rightarrow \{0, ..., n-1\}$ which associates to any $x$ the unique index $i$ of the left end-point in the interval $I_i$ (so in particular $\ell(x_i) = i)$. Remembering that the density of the $i^{th}$ uniform random variable is $(1/\beta)1_{I_i}$, the cdf $F_{\alpha, \beta}$ satisfies
$$F_{\alpha, \beta}(x) = p_{\ell(x)}\frac{x - x_{\ell(x)}}{\beta} + F_{\alpha,\beta}(x_{\ell(x)}),$$
and note that the approximate cdf agrees with $\Phi$ at the discretization points $x_i$ up to a shift by $\Phi(-\alpha)$:
$$F_{\alpha,\beta}(x_i) = \sum_{i'=0, \dots, i-1} p_{i'} = \sum_{i' = 0,\dots,i-1} (\Phi(x_{i'+1}) - \Phi(x_{i'})) = \Phi(x_{i}) - \Phi(-\alpha).$$
Thus, for any $x \in [-\alpha, \alpha)$,
\begin{align*}
F_{\alpha,\beta}(x) - \Phi(x) &= p_{\ell(x)}(x - x_{\ell(x)})/\beta + F_{\alpha,\beta}(x_{\ell(x)}) - \Phi(x) \\
&= p_{\ell(x)}(x - x_{\ell(x)})/\beta + \Phi(x_{\ell(x)}) - \Phi(-\alpha) - [\Phi(x) - \Phi(x_{\ell(x)}) + \Phi(x_{\ell(x)})]\\
&= [p_{\ell(x)}(x - x_{\ell(x)})/\beta - (\Phi(x) - \Phi(x_{\ell(x)}))] - \Phi(-\alpha).\tag{1}
\end{align*}
The bracketed term in the last equality above is
$$(\Phi(x_{\ell(x)+1}) - \Phi(x_{\ell(x)}))(x - x_{\ell(x)})/\beta - (\Phi(x) - \Phi(x_{\ell(x)})),$$
which, if you squint, is the fundamental theorem of calculus (writing $a = x_{\ell(x)}$ and $b = x_{\ell(x)+1} = a + \beta$):
$$\Phi'(a)(x-a) \approx \frac{\Phi(b) - \Phi(a)}{\beta}(x - a) \approx (\Phi(x) - \Phi(a)).$$
I leave it to the reader to justify, using compactness of $[-\alpha,\alpha]$ and differentiability of $\Phi$, that one can find $B > 0$ such that any $\beta < B$ makes the bracketed term smaller than $\varepsilon/2$, uniformly in $x$.
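One way to fill that in (a sketch via the mean value theorem, with $a = x_{\ell(x)}$ as above): for $x \in (a, a+\beta)$,
$$\frac{\Phi(a+\beta) - \Phi(a)}{\beta}(x - a) - \big(\Phi(x) - \Phi(a)\big) = \big(\Phi'(\xi_1) - \Phi'(\xi_2)\big)(x - a)$$
for some $\xi_1 \in (a, a+\beta)$ and $\xi_2 \in (a, x)$. Both $\xi_1, \xi_2$ lie in an interval of length $\beta$ inside $[-\alpha, \alpha]$, on which $\Phi'$ is uniformly continuous, so the absolute value is at most $\omega_{\Phi'}(\beta)\,\beta$, with $\omega_{\Phi'}$ the modulus of continuity of $\Phi'$ on $[-\alpha,\alpha]$; this tends to $0$ as $\beta \to 0$, uniformly in $x$.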
Going back to $(1)$, for $\alpha > A$, $\beta < B$, and $x \in [-\alpha, \alpha)$ we get
$$|F_{\alpha,\beta}(x) - \Phi(x)| < \varepsilon/2 + \varepsilon/4.$$
For the remaining $x$ we've misplaced at most $2\Phi(-\alpha) < \varepsilon/2$ of the mass: if $x \le -\alpha$ then $F_{\alpha,\beta}(x) = 0$ and $\Phi(x) \le \Phi(-\alpha) < \varepsilon/4$, while if $x \ge \alpha$ both $F_{\alpha,\beta}(x)$ and $\Phi(x)$ lie in $(1 - \varepsilon/2, 1]$. Thus,
$$\sup_{x \in \mathbb{R}} |F_{\alpha,\beta}(x) - \Phi(x)| < \varepsilon,$$
which establishes the desired convergence.
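If you'd like to see the bound numerically, here's a quick sanity check (a sketch assuming NumPy/SciPy are available; `mixture_cdf` is just a helper name I made up for this post, and I park the leftover tail mass at $\alpha + 1$, one admissible choice):

```python
import numpy as np
from scipy.stats import norm

def mixture_cdf(x, alpha, beta):
    """cdf of the uniform-mixture approximation: uniforms on the grid cells
    [x_i, x_{i+1}) with weights p_i = Phi(x_{i+1}) - Phi(x_i), and the
    leftover 2*Phi(-alpha) of mass placed at the point alpha + 1."""
    n = int(round(2 * alpha / beta))              # number of grid cells
    edges = np.linspace(-alpha, alpha, n + 1)     # x_0, ..., x_n
    p = np.diff(norm.cdf(edges))                  # cell weights p_i
    cdf = np.zeros_like(x, dtype=float)
    for left, right, w in zip(edges[:-1], edges[1:], p):
        cdf += w * np.clip((x - left) / (right - left), 0.0, 1.0)
    cdf += 2 * norm.cdf(-alpha) * (x >= alpha + 1)  # leftover tail mass
    return cdf

xs = np.linspace(-6, 6, 4001)
for alpha, beta in [(2.0, 0.5), (3.0, 0.1), (4.0, 0.01)]:
    err = np.max(np.abs(mixture_cdf(xs, alpha, beta) - norm.cdf(xs)))
    print(f"alpha={alpha}, beta={beta}: sup |F - Phi| ~ {err:.5f}")
```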
Best Answer
I think that you are confusing two separate things. When we say that $X$ is a mixture of $X_1$ and $X_2$, with respective probability density functions $f_1$ and $f_2$, we mean that there exist constants $\lambda_1,\lambda_2\ge0$ such that $\lambda_1 + \lambda_2 = 1$ and $$f(x) = \lambda_1f_1(x) + \lambda_2f_2(x), $$ where $f$ is the pdf of $X$.
In your question, you have $\lambda_1 := a$ and $\lambda_2 := 1-a$; thus the pdf of $X$ is $f = af_1 + (1 - a)f_2$. To compute $\mathbb E[X^2]$, you can then proceed as usual with the law of the unconscious statistician: $$\mathbb E[X^2] = \int x^2f(x)\, dx = \int x^2(af_1(x)+(1-a)f_2(x))\,dx $$
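For instance, carrying out that integral (assuming the second moments exist):
$$\mathbb E[X^2] = a\int x^2 f_1(x)\,dx + (1-a)\int x^2 f_2(x)\,dx = a\,\mathbb E[X_1^2] + (1-a)\,\mathbb E[X_2^2],$$
and then $\operatorname{Var}(X) = a\,\mathbb E[X_1^2] + (1-a)\,\mathbb E[X_2^2] - \big(a\,\mathbb E[X_1] + (1-a)\,\mathbb E[X_2]\big)^2$.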
If you considered the random variable $\tilde X := aX_1 + (1-a)X_2$ instead, the pdf of $\tilde X$ would not, in general, be $af_1 + (1-a)f_2$. Consider for instance $X_1$ and $X_2$ to be independent standard normal r.v.s, and $a = 1/2$:
You have $f = af_1 + (1-a)f_2 = f_1 = f_2$, so the mixture has the same distribution as $X_1$ and $X_2$, but $\tilde X = aX_1 + (1-a)X_2 \sim \mathcal N(0,1/2)$ is not a standard normal r.v.
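A quick simulation makes the distinction tangible (a sketch assuming NumPy; the variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
a = 0.5

x1 = rng.standard_normal(n)   # X_1 ~ N(0, 1)
x2 = rng.standard_normal(n)   # X_2 ~ N(0, 1), independent of X_1

# Mixture: pick X_1 with probability a, otherwise X_2
b = rng.random(n) < a
z = np.where(b, x1, x2)

# Convex combination of the random variables themselves
x_tilde = a * x1 + (1 - a) * x2

print("mixture variance     ~", z.var())        # close to 1
print("combination variance ~", x_tilde.var())  # close to 1/2
```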