I think you are on the wrong track with the substitution. Do try to prove:
if $A$ is a dense subset of $C([0,1])$ (wrt uniform convergence, of course) and, for $f\in C([0,1]),$ $\int_0^1 f(x) g(x) dx = 0 \quad\forall g\in A$ then $ f=0$.
Weierstrass/Stone-Weierstrass give you dense subsets of $C([0,1])$. The Weierstrass part should be more or less obvious with the statement I asked you to prove. For the Stone-Weierstrass part try to find an algebra satisfying the hypothesis of the theorem by looking at the functions which you are given, $e^{nx}$.
It wouldn't change much, since
$$(1-x^4)^n \geq 1-nx^4$$
by the same reason (ie, the derivative of $f(x)=(1-x^4)^n-1+nx^4$ is $-4nx^3(1-x^4)^{n-1}+4nx^3$, which is positive on $(0,1)$) and then you can integrate up to $1/\sqrt[4]{n}$ and get a similar analysis for $c_n$. Then, the only times that we see $(1-x^2)^n$ appearing again is in justifying the uniform convergence of $Q_n$ outside any small interval around $0$, which will hold by essentially the same argument, and in the last inequality, which will not make a difference.
However, there is more to it than "it wouldn't change anything". The idea of the proof by Rudin is that the convolution of two functions should preserve the best properties out of the two functions. One good example of this is the convolution with a smooth compactly supported function, which is a good way to prove uniform density of the smooth functions on $C^0(I)$, for example. So he strikes for a convolution with a polynomial.
The point is: if $f$ is a function (continuous, $L^1_{loc}$ or whatever depending on context), we expect that after making a convolution with a function $\phi$ which is normalized as to have $\int \phi =1$, if the support is small enough and concentrated near $0$, then the convolution $\widetilde{f}$ is near $f$ (think of a discrete averaging where $\phi$ is a "weight", and the convolution at a point is the average of the function according to that weight centralized on the point. If we put more weight at the point, and less on its near values, then the average shouldn't shake things that much). This is easy to arrange for $C^{\infty}$ functions, since we have bump functions. But here we would like to prove density of polynomials, which are not so malleable. With effect, we can't have a polynomial with compact support and integrating $1$ (we can't have non-trivial polynomials with compact support at all!).
So we try to emulate a compact support by taking a polynomial which is highly concentrated near the origin. The polynomial $p(x)=1-x^2$ is maybe the most trivial example of something concentrated near the origin. An important observation is that since the function is defined on $[0,1]$, what matters to our avering polynomial is its behaviour in $[-1,1]$, so the fact that $(1-x^2)$ explodes outside of $[-1,1]$ is irrelevant. Then we raise it to the power of $n$ since, being on $(-1,1)$, this will concentrate things even more near the origin (you can plot a few powers to try and see this). The estimates that Rudin do are to guarantee that everything goes well and represent the technical difficulty of not having compact support (also as small as desired).
The polynomial $1-x^4$ also satisfies the property that if we raise it to $n$, things will concentrate near the origin. The only difference is that it is less concentrated than $1-x^2$, so that the error you make with the convolution will probably be bigger than if you used $1-x^2$.
Best Answer
A trivial verification shows that the result is true when $f(x)=cx^{k}$ for some non-negative integer $k$ and $c$ is a constant. Hence it holds when $f$ is a polynomial. Now choose a polynomial $p$ such that $|f(x)-p(x)| <\epsilon$ for all $x$ (which is possible by Weierstrass Approximation Theorem). Note that $|\frac {\int x^{n}f(x)} {\int x^{n}dx}-\frac {\int x^{n}p(x)} {\int x^{n}dx}|<\epsilon$. Can you finish?
Stone-Weierstrass Theorem is much stronger than Weierstrass Approximation Theorem. For this problem you only need Weierstrass Approximation Theorem.