We will need some facts from the convolution theory. Let function $\omega\in C_c^\infty(\mathbb{R}^n)$ such that $\omega\geq 0$ and $\int_{\mathbb{R}^n}\omega(x)\, dx=1$. Denote $\omega_\varepsilon(x):=\varepsilon^{-n}\omega(x/\varepsilon)$.
If $f\in L^1_{\operatorname{loc}}(\mathbb{R}^n)$ then $f*\omega\in C^\infty(\mathbb{R}^n)$ and $D^\alpha (f*\omega)=f*(D^\alpha \omega)$ for all $\alpha\in \mathbb{N}^n_0$.
If $f\in C_c^\infty (\mathbb{R}^n)$ then $\sup\limits_{\mathbb{R}^n}|D^\alpha f-D^\alpha f*\omega_\varepsilon|\to 0$ at $\varepsilon \to 0$ for all $\alpha$ and there are $\varepsilon_0>0$ and compact set $K$ such shat $\operatorname{supp}f*\omega_\varepsilon,\operatorname{supp}f\subset K$, $\varepsilon\leq\varepsilon_0$ (i.e. $f*\omega_\varepsilon \to f$ in $C_c^\infty(\mathbb{R}^n)$).
$\int_{\mathbb{R}^n}|D^\alpha\omega_\varepsilon(x)|\, dx\leq c_\alpha\varepsilon^{-|\alpha|}$.
Fix $f\in C_c^\infty(\mathbb{R}^n)$. We will show that there is
$$\{h^\ell\}\subset X:=\operatorname{span}\big(g\, |\, g(x)=g_1(x_1)...g_n(x_n), g_1,...,g_n\in C_c^\infty (\mathbb{R})\big)$$
such that $\sup\limits_{\mathbb{R}^n}|D^\alpha f-D^\alpha h^\ell|\to 0$ for all $\alpha$ and $\operatorname{supp}h^\ell,\operatorname{supp}f\subset K$ for all $\ell$ and for some compact set $K$.
Consider binary cubes
$$Q^k_{i_1...i_n}=I^k_{i_1}\times...\times I^k_{i_n},\ k\in\mathbb{N}, i_j\in\mathbb{Z}$$
where $I^k_i=[i/2^k;(i+1)/2^k)$. Let
$$f^k(x)=\sum\limits_{i_1...i_n}f\Big(\frac{i_1}{2^k},...,\frac{i_n}{2^k}\Big)\chi_{Q^k_{i_1...i_n}}(x)$$
where the number of syllables is finite since $\operatorname{supp}f$ is compact. It is easy to show that
$$\delta_k:=\sup\limits_{\mathbb{R}^n}|f^k-f|\to 0.$$
Consider $\omega^1\in C_c^\infty(\mathbb{R})$ such that $\omega^1\geq 0$ and $\int_{\mathbb{R}}\omega^1(x)\, dx=1$. Then nonnegative function $\omega^n\in C_c^\infty(\mathbb{R}^n)$ defined by
$$\omega^n(x)=\omega^1(x_1)...\omega^1(x_n)$$
belongs to $C_c^\infty (\mathbb{R}^n)$. Also $\int_{\mathbb{R}^n}\omega^n(x)\, dx=1$ and $\omega^n_\varepsilon(x)=\omega^1_\varepsilon(x_1)...\omega^1_\varepsilon(x_n)$.
I. Using the diagonal Cantor process, we find the sequence $\{\varepsilon(\ell)\}$ such that $\varepsilon(\ell)\to 0$ and $\delta^\alpha_\ell:=\sup\limits_{\mathbb{R}^n}|D^\alpha f-D^\alpha f*\omega^n_{\varepsilon(\ell)}|\to 0$ for all $\alpha$.
II. $f^k*\omega^n_\varepsilon\in X$. Indeed
$$f^k*\omega^n_\varepsilon(x)=\int\limits_{\mathbb{R}^n}\sum\limits_{i_1...i_n}f\Big(\frac{i_1}{2^k},...,\frac{i_n}{2^k}\Big)\chi_{Q^k_{i_1...i_n}}(y)\omega^n_\varepsilon(x-y)\, dy$$
$$=\sum\limits_{i_1...i_n}f\Big(\frac{i_1}{2^k},...,\frac{i_n}{2^k}\Big)\int\limits_{\mathbb{R}}...\int\limits_{\mathbb{R}}\chi_{I^k_{i_1}}(y_1)...\chi_{I^k_{i_n}}(y_n)\omega^1_\varepsilon(x_1-y_1)...\omega^1_\varepsilon(x_n-y_n)\, dy_1...dy_n$$
$$=\sum\limits_{i_1...i_n}f\Big(\frac{i_1}{2^k},...,\frac{i_n}{2^k}\Big)\prod\limits_{j=1}^n\int\limits_{\mathbb{R}}\chi_{I^k_{i_j}}(y)\omega^1_\varepsilon(x_j-y)\, dy=\sum\limits_{i_1...i_n}f\Big(\frac{i_1}{2^k},...,\frac{i_n}{2^k}\Big)\prod\limits_{j=1}^n\chi_{I^k_{i_j}}*\omega^1_\varepsilon(x_j)$$
where $\chi_{I^k_{i_j}}*\omega^1_\varepsilon\in C_c^\infty(\mathbb{R})$.
III. We have
$$|D^\alpha f^k*\omega^n_\varepsilon(x)-D^\alpha f*\omega^n_\varepsilon(x)|\leq \int\limits_{\mathbb{R}^n}|f^k(y)-f(y)||D^\alpha\omega^n_\varepsilon(x-y)|\, dy\leq \delta_kc_\alpha\varepsilon^{-|\alpha|}.$$
IV. Using the diagonal Cantor process, we find the sequence $\{k(\ell)\}$ such that $\delta_{k(\ell)}\varepsilon(\ell)^{-|\alpha|}\to 0$ for all $\alpha$.
V. Finally let $h^\ell=f^{k(\ell)}*\omega^n_{\varepsilon(\ell)}$. We have
$$|D^\alpha f^{k(\ell)}*\omega^n_{\varepsilon(\ell)}(x)-D^\alpha f(x)|\leq |D^\alpha f^{k(\ell)}*\omega^n_{\varepsilon(\ell)}(x)-D^\alpha f*\omega^n_{\varepsilon(\ell)}(x)|+|D^\alpha f*\omega^n_{\varepsilon(\ell)}(x)-D^\alpha f(x)|$$
$$\leq \delta_{k(\ell)}c_\alpha\varepsilon(\ell)^{-|\alpha|}+\delta^\alpha_\ell.$$
I.e. $\sup\limits_{\mathbb{R}^n}|D^\alpha h^\ell-D^\alpha f|\leq \delta_{k(\ell)}c_\alpha\varepsilon(\ell)^{-|\alpha|}+\delta^\alpha_\ell$ where $\delta_{k(\ell)}c_\alpha\varepsilon(\ell)^{-|\alpha|}+\delta^\alpha_\ell\to 0$ for all $\alpha$.
The validity of condition
$$\operatorname{supp}h^\ell,\operatorname{supp}f\subset K$$
is easy to check.
Therefore $h^\ell\to f$ in $\mathcal{S}(\mathbb{R}^n)$ (this follows from the properties of the sequence $\{h^\ell\}$, and condition $\operatorname{supp}h^\ell\subset K$ plays an important role).
Best Answer
Start with $f\in \mathcal{S}(\mathbb{R}^n)$. For every $L>0$ let $\chi_1\in C_c^\infty(\mathbb{R})$ be the standard cut-off function such that $\varphi(t) =0$ for $\vert t \vert \geq 1$, $\varphi(0)=1$ and $\varphi^{(k)}(1)=0$ for all $k\geq 1$. Then we define for every $L\geq 0$ the cut-off function $$ \varphi_L: \mathbb{R} \rightarrow \mathbb{R}, \varphi_L(t) = \begin{cases} 0,& \vert t \vert\geq L+1, \\ 1,& \vert t \vert \leq L,\\ \varphi(t-\mathrm{sgn(t)L}),& L<\vert t \vert<L+1. \end{cases} $$ One readily checks that $\varphi_L\in C_c^\infty(\mathbb{R})$. We now define the cut-off function $$ \chi_L:\mathbb{R}^n \rightarrow \mathbb{R}, \chi_L(x) = \prod_{j=1}^n \varphi_L(x_j). $$ It is easy to check that for every $\alpha \in \mathbb{N}^n$ and all $L\geq 1$ we have $$ \sup_{x\in \mathbb{R}^n} \vert \partial^\alpha \chi_L(x) \vert = \sup_{x\in \mathbb{R}^n} \vert\partial^\alpha \chi_1(x) \vert < \infty. $$
The problem boils down to find a positive $g\in \mathcal{S}(\mathbb{R}^n)$ such that $\sqrt{g}\in \mathcal{S}(\mathbb{R}^n)$. Once we have found such a function, then we can define
$$ f_\varepsilon(x):=\sqrt{\chi_{1/\varepsilon}(x)f(x)+ \varepsilon g(x)}. $$
Define the semi-norms $$ \Vert f \Vert_{\alpha, \beta} := \sup_{x \in \mathbb{R}^n} \vert x^\beta \partial^\alpha f(x) \vert. $$
Since \begin{align*} \sup_{x\in \mathbb{R}^n} \vert \partial^\alpha \chi_{1/\epsilon}(x) \vert <\infty \end{align*} as noted above, we have \begin{align*} \lVert f_\epsilon^2 \rVert_{\alpha. \beta} <\infty \end{align*} for all $\alpha, \beta$, which implies that $f_\epsilon^2 \in \mathcal{S}(\mathbb{R}^n)$.
Now, consider another Schwartz function \begin{align*} F_\epsilon:=(1-\chi_{1/\epsilon}) f \end{align*} Since $F_\epsilon$ is supported on $[\frac{1}{\epsilon}, \infty)^n \sqcup (-\infty, -\frac{1}{\epsilon}]^n$, the estimate \begin{align*} \frac{1}{\epsilon^n} \lvert x^{\beta}\partial^\alpha F_\epsilon(x) \rvert \leq \Bigl \lvert \Bigl(\prod_{i=1}^n x_i \Bigr) x^{\beta}\partial^\alpha F_\epsilon(x) \Bigr \rvert=\lvert x^{\beta+(1,\cdots,1)}\partial^\alpha F_\epsilon(x) \rvert \end{align*} holds for $x \in \mathbb{R}^n$ so that \begin{align*} \frac{1}{\epsilon^n} \Vert F_\epsilon \Vert_{\alpha, \beta} \leq \Vert F_\epsilon \Vert_{\alpha, \beta+(1, \dots, 1)} \end{align*}
We also observed above that the supremum of $\partial^\alpha \chi_L$ does not depend on $L$. Hence $\Vert F_\epsilon \Vert_{\alpha, \beta+(1, \dots, 1)}$ does not depend on $\epsilon$, implying \begin{align*} \Vert f_\varepsilon^2 - f \Vert_{\alpha, \beta} \leq \Vert F_\epsilon \Vert_{\alpha, \beta} + \varepsilon \Vert g \Vert_{\alpha, \beta} \leq \epsilon^n \Vert F_\epsilon \Vert_{\alpha, \beta+(1, \dots, 1)} + \varepsilon \Vert g \Vert_{\alpha, \beta} \to 0^+ \text{ as } \epsilon \to 0^+ \end{align*} for all $\alpha,\beta$. That is, $f_\varepsilon^2 \to f$ in the Fréchet topology of $\mathcal{S}(\mathbb{R}^n)$.
Lastly, we need to check if $f_\epsilon \in \mathcal{S}(\mathbb{R}^n)$. However, because $g$ is assumed to be positive, it is clear that $f_\epsilon$ is smooth on whole $\mathbb{R}^n$. Moreover, $f_\varepsilon (x)=\sqrt{\varepsilon g(x)}$ for $\vert x \vert > 1/\varepsilon$ so that \begin{equation} \lvert x^\beta \partial^\alpha f_\epsilon(x) \rvert \leq \sup_{ \lvert x \rvert \leq \frac{1}{\epsilon} } \lvert x^\beta \partial^\alpha f_\epsilon(x) \rvert + \sqrt{\epsilon}\sup_{ \lvert x \rvert > \frac{1}{\epsilon} } \lvert x^\beta \partial^\alpha \sqrt{g}(x) \rvert \end{equation} for all $x \in \mathbb{R}^n$. Therefore, \begin{equation} \lVert f_\epsilon \rVert_{\alpha, \beta} < \infty \end{equation} for all $\alpha, \beta$, implying $f_\epsilon \in \mathcal{S}(\mathbb{R}^n)$.
Note that $g(x)=e^{-x^2}$ does the job as $\sqrt{g(x)}= e^{-x^2/2}$ is again Schwartz.