[Math] Motivation of Weierstrass-approximation Theorem

real-analysis

Weierstrass Theorem (Classical):

If $f$ is a continuous real or complex function on $[a,b]$, there exists a sequence of polynomials $P_n$ such thay $\lim_{n\to \infty} P_n(x)=f(x)$.

The proof i know (using Berstein Polynomials) is easy but really artificial. I don't see any motivation and how and where ideas come in.

What are motivations for this proof?

And is weierstrass' original proof not constructive?

Best Answer

In order to see why we use Bernstein polynomials, first note that what we essentially want to do is approximate the identity with polynomials, this will be easier to understand if you are familiar with convolutions, and approximations of the Dirac delta $\displaystyle\delta(x)$. We want to do a sort of convolution, or weighted average, something like $(p_{n,k}\ast f)(x)=\sum_{j=0}^np_{n,j}(x-y_j)f(y_j)$ of a sequence of polynomials $p_{n,k}$ of degree $n$ with a continuous function $f$, let's say on $[0,1]$, such that $p_{n,k}\ast f\to f$ as $n\to\infty$.

Being a weighted average, we should have $(p_{n,k}\ast 1)(x)=\sum_{j=0}^n p_{n,k}(y)=1$. So we want to write $1$ as a sum of polynomials of degree $n$. The easiest way to do this is to notice that $1=1-x+x$, hence $\displaystyle 1^n=(1-x+x)^n=\sum_{k=0}^n {n\choose k}(1-x)^{n-k}x^k$. Now we want to insert $f(x)$ in here somehow so that when $n$ is large, $(p_{n,k}\ast f)(x)$ will be approximately equal to $f(x)$. Now if you plot $\displaystyle{n\choose k}(1-x)^{n-k}x^k$, you will see that it is largest when $\frac{k}{n}\approx x$, and for large $n$ $\displaystyle{n\choose k}(1-x)^{n-k}x^k\approx 0$ when $\frac{k}{n}$ is far from $x$. So if we put $\displaystyle(p_{n,k}\ast f)(x)=\sum_{k=0}^n {n\choose k}(1-x)^{n-k}x^k f\Big(\frac{k}{n}\Big)$, then for large $n$ the summands are $\approx 0$ when $\frac{k}{n}$ is far from $x$, so they hardly contribute to the sum, hence for large $n$ $\displaystyle(p_{n,k}\ast f)(x)=\sum_{k=0}^n {n\choose k}(1-x)^{n-k}x^k f\Big(\frac{k}{n}\Big)\approx f(x)\sum_{k=0}^n {n\choose k}(1-x)^{n-k}x^k=f(x)$.

To elaborate on why, for large $n$, $\displaystyle{n\choose k}(1-x)^{n-k}x^k$ is largest when $\frac{k}{n}\approx x$, you may want to read about the random walk if you don't already know it, but I will explain. Let's say you are dizzy, or in some confused state (it's usually said that you are drunk) and you are taking steps left and right, with a probability $x$ of going to the right, $1-x$ of going to the left, with each step. If you take a total of $n$ steps, then the probability of $k$ of those being to the right is $\displaystyle{n\choose k}(1-x)^{n-k}x^k$. Now $\frac{k}{n}$ is just the ratio of steps taken to the right to the number of steps taken in total. Now the probability of taking $k$ steps to the right is going to be highest when the ratio of steps taken to the right to the total number of steps taken is just equal to the probability of taking a step to the right; this is because you should expect that the number of steps taken to the right compared to the total number of steps taken is just the probability of going to the right, ie. when $\frac{k}{n}\approx x$. When the ratio of steps taken to the right to the total number of steps is far from the probability of going right, then $\displaystyle{n\choose k}(1-x)^{n-k}x^k$ will be $\approx 0$. And when you increase $n$ to a very large number, it becomes increasingly unlikely that you deviate from what should happen, this is the content of the law of large numbers, so these approximations get better and better.