It wouldn't change much, since
$$(1-x^4)^n \geq 1-nx^4$$
for the same reason (i.e., the derivative of $f(x)=(1-x^4)^n-1+nx^4$ is $-4nx^3(1-x^4)^{n-1}+4nx^3 = 4nx^3\big(1-(1-x^4)^{n-1}\big)$, which is positive on $(0,1)$), and then you can integrate up to $1/\sqrt[4]{n}$ and get a similar estimate for $c_n$. After that, the only other places where $(1-x^2)^n$ appears are in justifying the uniform convergence of $Q_n$ outside any small interval around $0$, which holds by essentially the same argument, and in the last inequality, where it makes no difference.
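As a quick numerical sanity check (my own sketch, not from the answer): assuming $c_n$ is normalized so that $c_n\int_{-1}^1(1-x^4)^n\,dx=1$, in analogy with Rudin's kernel, the pointwise inequality above gives $\int_{-1}^{1}(1-x^4)^n\,dx \geq \int_{|x|\le n^{-1/4}}(1-nx^4)\,dx = \tfrac{8}{5}n^{-1/4}$, hence $c_n \leq \tfrac{5}{8}n^{1/4}$. Both estimates can be verified numerically:

```python
import numpy as np

def trapezoid(y, x):
    # simple trapezoid rule (avoids version-dependent numpy helpers)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

# Check (1 - x^4)^n >= 1 - n x^4 on (0, 1), and the resulting lower
# bound on the normalizing integral:
#   int_{-1}^{1} (1 - x^4)^n dx >= (8/5) n^{-1/4},
# which gives c_n <= (5/8) n^{1/4}.
x = np.linspace(0, 1, 10001)
for n in [1, 5, 20, 100]:
    assert np.all((1 - x**4)**n >= 1 - n * x**4 - 1e-12)
    t = np.linspace(-1, 1, 200001)
    val = trapezoid((1 - t**4)**n, t)
    assert val >= (8/5) * n**(-0.25) - 1e-6
print("kernel estimates verified")
```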
However, there is more to it than "it wouldn't change anything". The idea behind Rudin's proof is that the convolution of two functions should inherit the best properties of each. A good example is convolution with a smooth, compactly supported function, which is a standard way to prove, for instance, that the smooth functions are uniformly dense in $C^0(I)$. So he aims for a convolution with a polynomial.
The point is: if $f$ is a function (continuous, $L^1_{loc}$, or whatever, depending on context), then after convolving with a function $\phi$ normalized so that $\int \phi =1$, whose support is small and concentrated near $0$, we expect the convolution $\widetilde{f}$ to be close to $f$. (Think of a discrete average where $\phi$ is a "weight": the convolution at a point is the weighted average of the function, with the weight centered at that point. If we put most of the weight at the point itself and little on nearby values, the average shouldn't shake things that much.) This is easy to arrange for $C^{\infty}$ functions, since we have bump functions. But here we want to prove density of polynomials, which are not so malleable. Indeed, we can't have a polynomial with compact support that integrates to $1$ (we can't have non-trivial polynomials with compact support at all!).
So we try to emulate compact support by taking a polynomial that is highly concentrated near the origin. The polynomial $p(x)=1-x^2$ is perhaps the simplest example of something concentrated near the origin. An important observation is that since the function is defined on $[0,1]$, what matters for our averaging polynomial is its behaviour on $[-1,1]$, so the fact that $1-x^2$ blows up outside $[-1,1]$ is irrelevant. We then raise it to the power $n$ because, on $(-1,1)$, this concentrates things even more near the origin (you can plot a few powers to see this). The estimates Rudin makes guarantee that everything goes through; they represent the technical price of not having a support that is compact (and as small as desired).
The polynomial $1-x^4$ also has the property that raising it to the power $n$ concentrates mass near the origin. The only difference is that it is less concentrated than $1-x^2$, so the error the convolution makes will likely be bigger than if you used $1-x^2$.
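To see the concentration claim concretely, one can compare numerically how much of each normalized kernel's mass lies outside a small interval around $0$. This is a minimal sketch; the cutoff $\delta = 0.25$ and the values of $n$ are arbitrary choices of mine:

```python
import numpy as np

# Fraction of the normalized kernel (1 - x^power)^n that sits at |x| >= delta.
# Both kernels concentrate near 0 as n grows; the x^4 kernel is the less
# concentrated of the two.
def tail_mass(power, n, delta=0.25, m=400001):
    x = np.linspace(-1, 1, m)
    q = (1 - x**power)**n
    total = np.sum((q[1:] + q[:-1]) * np.diff(x)) / 2   # trapezoid rule
    qo = np.where(np.abs(x) >= delta, q, 0.0)
    tail = np.sum((qo[1:] + qo[:-1]) * np.diff(x)) / 2
    return tail / total

for power in (2, 4):
    assert tail_mass(power, 200) < tail_mass(power, 50)  # concentrates as n grows
for n in (50, 200):
    assert tail_mass(4, n) > tail_mass(2, n)             # x^4 kernel less concentrated
print("tail masses at n=200:", tail_mass(2, 200), tail_mass(4, 200))
```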
Best Answer
For simplicity, let us consider $\mathbb{R}^2$ and $f:\mathbb{R}^2\rightarrow \mathbb{R}^2$. Then we see that $\phi:\mathbb{R}^2\rightarrow \mathbb{R}^2$ given by \begin{align} \phi(x_1, x_2) =&\ \begin{pmatrix} \phi_1(x_1, x_2)\\ \phi_2(x_1, x_2) \end{pmatrix}\\ =&\ \begin{pmatrix} x_1\\ x_2 \end{pmatrix} + \begin{pmatrix} f_{1, x_1} (a_1, a_2) & f_{1, x_2}(a_1, a_2)\\ f_{2, x_1} (a_1, a_2) & f_{2, x_2}(a_1, a_2) \end{pmatrix}^{-1} \left( \begin{pmatrix} y_1\\ y_2 \end{pmatrix} - \begin{pmatrix} f_1(x_1, x_2)\\ f_2(x_1, x_2) \end{pmatrix} \right)\\ =&\ \begin{pmatrix} x_1\\ x_2 \end{pmatrix}- \frac{1}{\det Df(a)}\begin{pmatrix} f_{2, x_2} (a_1, a_2)f_1(x_1, x_2) -f_{1, x_2}(a_1, a_2)f_2(x_1, x_2)\\ -f_{2, x_1} (a_1, a_2)f_1(x_1, x_2)+ f_{1, x_1}(a_1, a_2)f_2(x_1, x_2) \end{pmatrix} +\text{ const vector} \end{align}
Then we see that \begin{align} \nabla\phi(x_1, x_2) =&\ \begin{pmatrix} \phi_{1, x_1} & \phi_{1, x_2}\\ \phi_{2, x_1} & \phi_{2, x_2} \end{pmatrix}\\ =&\ \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix} -\frac{1}{\det Df(a)} \begin{pmatrix} f_{2, x_2} (a_1, a_2)f_{1, x_1}(x_1, x_2) -f_{1, x_2}(a_1, a_2)f_{2, x_1}(x_1, x_2) & f_{2, x_2} (a_1, a_2)f_{1, x_2}(x_1, x_2) -f_{1, x_2}(a_1, a_2)f_{2, x_2}(x_1, x_2) \\ -f_{2, x_1} (a_1, a_2)f_{1, x_1}(x_1, x_2)+ f_{1, x_1}(a_1, a_2)f_{2, x_1}(x_1, x_2) & -f_{2, x_1} (a_1, a_2)f_{1, x_2}(x_1, x_2)+ f_{1, x_1}(a_1, a_2)f_{2, x_2}(x_1, x_2) \end{pmatrix}\\ =&\ \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix} -\frac{1}{\det Df(a)}\begin{pmatrix} f_{2, x_2} (a_1, a_2)& -f_{1, x_2}(a_1, a_2)\\ -f_{2, x_1} (a_1, a_2)& f_{1, x_1}(a_1, a_2) \end{pmatrix} \begin{pmatrix} f_{1, x_1} (x_1, x_2)& f_{1, x_2}(x_1, x_2)\\ f_{2, x_1} (x_1, x_2)& f_{2, x_2}(x_1, x_2) \end{pmatrix}\\ =&\ I-Df(a_1, a_2)^{-1} Df(x_1, x_2). \end{align}
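As a sanity check, the identity $D\phi(x) = I - Df(a)^{-1}Df(x)$ can be verified numerically with a finite-difference Jacobian. The map $f$, the point $a$, and the target $y$ below are illustrative choices of mine, not from the answer:

```python
import numpy as np

# A sample map f: R^2 -> R^2 with invertible Df(a), chosen for illustration.
def f(x):
    return np.array([x[0] + x[1]**2, x[0]*x[1] + x[1]])

def Df(x):
    return np.array([[1.0, 2*x[1]], [x[1], x[0] + 1.0]])

a = np.array([1.0, 0.5])
y = np.array([0.3, -0.2])
A_inv = np.linalg.inv(Df(a))

def phi(x):
    return x + A_inv @ (y - f(x))

# Central-difference Jacobian of phi at a test point
x0 = np.array([0.8, 0.6])
h = 1e-6
J = np.zeros((2, 2))
for j in range(2):
    e = np.zeros(2); e[j] = h
    J[:, j] = (phi(x0 + e) - phi(x0 - e)) / (2*h)

expected = np.eye(2) - A_inv @ Df(x0)
assert np.allclose(J, expected, atol=1e-5)
# In particular D_phi(a) = 0, which is why phi contracts near a.
assert np.allclose(np.eye(2) - A_inv @ Df(a), 0)
print("Jacobian identity verified")
```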
Higher Dimensions:
However, when $n$ is large, expanding everything out and then taking partial derivatives as above becomes messy. Hence we compute the derivative in a more elegant manner.
In general, since $A^{-1}$ is a constant linear map, the chain rule gives \begin{align} D_x \phi(x) =&\ D_x x+ D_x [A^{-1}(y-f(x))]\\ =&\ I+ A^{-1}\circ D_x[y-f(x)]\\ =&\ I+ A^{-1}\circ(-Df(x)) = I-Df(a)^{-1} Df(x). \end{align}
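Since $D\phi(a) = I - Df(a)^{-1}Df(a) = 0$, the map $\phi$ is a contraction near $a$, and iterating it solves $f(x) = y$ for $y$ close to $f(a)$. A minimal sketch of that iteration, reusing an illustrative map $f$ and point $a$ of my own choosing:

```python
import numpy as np

# Fixed-point iteration x <- phi(x) = x + Df(a)^{-1}(y - f(x)),
# which converges to a solution of f(x) = y when y is near f(a).
def f(x):
    return np.array([x[0] + x[1]**2, x[0]*x[1] + x[1]])

def Df(x):
    return np.array([[1.0, 2*x[1]], [x[1], x[0] + 1.0]])

a = np.array([1.0, 0.5])
A_inv = np.linalg.inv(Df(a))
y = f(a) + np.array([0.05, -0.03])   # a target value close to f(a)

x = a.copy()
for _ in range(50):
    x = x + A_inv @ (y - f(x))

assert np.allclose(f(x), y, atol=1e-10)
print("converged to x =", x)
```

Note that this is the simplified Newton iteration: the Jacobian is frozen at $a$ rather than re-evaluated at each step, which is exactly what makes the contraction-mapping argument work.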