[Math] Understanding the Stone-Weierstrass Theorem in Rudin's Principles of Mathematical Analysis

analysis, real-analysis, uniform-convergence, weierstrass-approximation

Below is the statement and proof of the Stone-Weierstrass Theorem in Rudin's Principles of Mathematical Analysis (page 159, Chapter 7).

Question: Suppose Rudin uses $Q_n=c_n(1-x^4)^n$ instead of $Q_n=c_n(1-x^2)^n$. How would the proof of the theorem change?


My thoughts: The proof changes in (48), where we use the lower bound $1/\sqrt{n}$. I think if we use $Q_n=c_n(1-x^4)^n$, our estimate would have to be different, so in (50), $Q_n$ would be bounded by something other than $\sqrt{n}$.

However, I'm not sure if this is correct. I feel as though the proof would have bigger changes if we used $Q_n=c_n(1-x^4)^n$, since my professor gave me this question to contemplate.


[Images: statement and proof of the theorem from Rudin]

Best Answer

It wouldn't change much, since $$(1-x^4)^n \geq 1-nx^4$$ for the same reason (i.e., the derivative of $f(x)=(1-x^4)^n-1+nx^4$ is $f'(x)=-4nx^3(1-x^4)^{n-1}+4nx^3$, which is nonnegative on $(0,1)$), and then you can integrate up to $1/\sqrt[4]{n}$ and get a similar estimate for $c_n$. After that, the only places where $(1-x^2)^n$ appears again are in justifying the uniform convergence of $Q_n$ to $0$ outside any small interval around $0$, which holds by essentially the same argument, and in the last inequality, where it makes no difference.
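To make this concrete, here is the analogue of Rudin's estimate worked out (my own computation, spelling out the suggestion above; I am assuming the usual normalization $\int_{-1}^1 Q_n\,dx = 1$, so that $1/c_n = \int_{-1}^1 (1-x^4)^n\,dx$). Integrating the lower bound up to $1/\sqrt[4]{n}$:

$$\int_{-1}^{1}(1-x^4)^n\,dx \;\ge\; 2\int_{0}^{1/\sqrt[4]{n}}(1-x^4)^n\,dx \;\ge\; 2\int_{0}^{1/\sqrt[4]{n}}(1-nx^4)\,dx \;=\; 2\left(\frac{1}{\sqrt[4]{n}}-\frac{1}{5\sqrt[4]{n}}\right)\;=\;\frac{8}{5\sqrt[4]{n}}\;>\;\frac{1}{\sqrt[4]{n}},$$

so $c_n<\sqrt[4]{n}$, and then for any $\delta>0$,

$$Q_n(x)\;\le\;\sqrt[4]{n}\,(1-\delta^4)^n\qquad(\delta\le|x|\le1),$$

which still tends to $0$ uniformly as $n\to\infty$, playing the same role as Rudin's bound $\sqrt{n}\,(1-\delta^2)^n$.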

However, there is more to it than "it wouldn't change anything". The idea of Rudin's proof is that the convolution of two functions should inherit the best properties of each. A good example of this is convolution with a smooth, compactly supported function, which is a standard way to prove density of the smooth functions in $C^0(I)$ with the uniform norm, for example. So he goes for a convolution with a polynomial.

The point is: if $f$ is a function (continuous, $L^1_{loc}$, or whatever, depending on context), and we convolve it with a function $\phi$ normalized so that $\int \phi =1$, then, provided the support of $\phi$ is small and concentrated near $0$, the convolution $\widetilde{f}$ is close to $f$. (Think of a discrete average where $\phi$ is a "weight", and the convolution at a point is the weighted average of the function centered at that point. If we put most of the weight at the point itself, and little on nearby values, the average shouldn't change things much.) This is easy to arrange with $C^{\infty}$ functions, since we have bump functions. But here we would like to prove density of polynomials, which are not so malleable. Indeed, we can't have a polynomial with compact support that integrates to $1$ (we can't have nontrivial polynomials with compact support at all!).

So we try to emulate compact support by taking a polynomial which is highly concentrated near the origin. The polynomial $p(x)=1-x^2$ is perhaps the simplest example of something concentrated near the origin. An important observation is that since the function is defined on $[0,1]$, what matters for our averaging polynomial is its behaviour on $[-1,1]$, so the fact that $1-x^2$ blows up outside $[-1,1]$ is irrelevant. We then raise it to the power $n$ since, on $(-1,1)$, this concentrates things even more near the origin (you can plot a few powers to see this). The estimates Rudin makes guarantee that everything goes through; they represent the technical cost of not having compact support (or support as small as desired).
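This concentration is easy to check numerically. The sketch below (my own illustration, not from Rudin; it uses a crude midpoint Riemann sum rather than an exact integral) computes the fraction of the kernel's mass lying in $|x|\le\delta$ for increasing $n$; the fraction tends to $1$, which is exactly the "emulated compact support".

```python
# Numeric check that (1 - x^2)^n concentrates near 0:
# for fixed delta, the fraction of the kernel's mass inside |x| <= delta
# approaches 1 as n grows.

def mass(n, a, b, steps=20000):
    """Midpoint Riemann-sum approximation of the integral of (1 - x^2)^n over [a, b]."""
    h = (b - a) / steps
    return sum((1 - (a + (i + 0.5) * h) ** 2) ** n * h for i in range(steps))

delta = 0.1
for n in (1, 10, 100, 1000):
    frac = mass(n, -delta, delta) / mass(n, -1.0, 1.0)
    print(n, round(frac, 4))
```

The printed fractions increase with $n$; by $n=1000$ essentially all of the mass sits inside $[-0.1, 0.1]$.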

The polynomial $1-x^4$ also has the property that raising it to the power $n$ concentrates things near the origin. The only difference is that it is less concentrated than $1-x^2$ (since $1-x^4 \geq 1-x^2$ on $[-1,1]$, the kernel is flatter near $0$), so the error the convolution makes will probably be bigger than if you used $1-x^2$.
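The difference in concentration can also be seen numerically. This sketch (again my own check, using a midpoint Riemann sum) compares, at the same $n$, how much of each normalized kernel's mass lies outside $|x|\le\delta$; the $(1-x^4)^n$ kernel keeps noticeably more mass away from the origin.

```python
# Rough comparison: at the same n, the kernel built from (1 - x^4)^n
# has a larger tail outside |x| > delta than the one built from
# (1 - x^2)^n, i.e. it is less concentrated near 0.

def kernel_mass(p, n, a, b, steps=20000):
    """Midpoint Riemann sum of (1 - x^p)^n over [a, b]."""
    h = (b - a) / steps
    return sum((1 - (a + (i + 0.5) * h) ** p) ** n * h for i in range(steps))

n, delta = 50, 0.3
for p in (2, 4):
    total = kernel_mass(p, n, -1.0, 1.0)
    tail = total - kernel_mass(p, n, -delta, delta)
    print(p, round(tail / total, 6))
```

The tail fraction for $p=4$ comes out orders of magnitude larger than for $p=2$, matching the claim that the convolution error with $1-x^4$ should be bigger.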