In 1905, Lebesgue gave a "proof" of the following theorem:
If $f:\mathbb{R}^2\to\mathbb{R}$ is a Baire function such that for every $x$, there is a unique $y$ such that $f(x,y)=0$, then the thus implicitly defined function is Baire.
He made use of the "trivial fact" that the projection of a Borel set is a Borel set. This turns out to be wrong, but the result is still true. Souslin spotted the mistake, and named continuous images of Borel sets analytic sets. So a mistake of Lebesgue led to the rich theory of analytic sets. Lebesgue
seemingly enjoyed this fact and mentioned it in the foreword to a book, "Leçons sur les Ensembles Analytiques et leurs Applications", of Souslin's teacher Lusin (as referenced in an AMS review of the book).
There's a short-cut in Roth's approach if one only cares to get $o(N)$. Adolf Hildebrand told me so, and here is my shortest writeup.
Notation: Let $r(N),\rho(N)$ be the largest cardinality and density of a subset of $[N]$ that is free of 3-term APs, and let $\rho=\lim \rho(N)$, which must exist by Fekete's subadditive lemma since $r(N+M)\leq r(N)+r(M)$. Let $A\subset [N]$ witness $r(N)$, and let $S(x) = \sum_{a\in A} e( a x)$ (where $e(ax)=\exp(2\pi i a x)$, naturally). Let $T(x)=\sum_{n=1}^N e(nx)$.
Lemmata: Now $r(N)=|A|=I:=\int_0^1 S(x)^2 S(-2x)dx$, since $A$ is 3-free. By Parseval, $|A|=\int_0^1 |S(x)|^2dx$. By expanding $S(x)^2 T(-2x)$ and exchanging integration and summation, $(|A|/2)^2 \leq I_0:=\int_0^1 S(x)^2 T(-2x)$.
Main Engine: As long as $0< M\leq N$, $$\sup_{x\in {\mathbb R}} |S(x)-\rho(M) T(x)| \leq N(\rho(M)-\rho(N)) + 10 M \sqrt{N}.$$ Proof is by circle method; set $E(x):=S(x)-\rho(M)T(x)$. Three steps are
- $|E(a/q)|\leq N(\rho(M)-\rho(N))+2Mq$, the hard one;
- $|E(a/q+\beta)| \leq N(\rho(M)-\rho(N))+2Mq +2\pi|\beta|NMq$;
- $|E(x)| \leq N(\rho(M)-\rho(N))+10M\sqrt{N}$.
Main Lemma: By Main Engine and Lemmata, $\Delta:=|I-\rho(M)I_0|$ is at most $(N(\rho(M)-\rho(N))+10M\sqrt{N})|A|$. By Lemmata, $\Delta$ is at least $|A|(\rho(M)\rho(N)N/4-1)$. Therefore, $$\rho(N)\rho(M)\leq 4(\rho(M)-\rho(N))+50M/\sqrt{N}.$$
Conclusion: Let $N\to\infty$ and then $M\to\infty$ to get $\rho^2\leq 0$.
The step labeled "the hard one" in the Main Engine is tricky, and uses the fact that we can restrict $A$ to arithmetic progressions and still get a 3-free set. The length of the progressions we restrict to is connected to $M$.
I suppose the point is that a write-up can be almost arbitrarily short or extremely long, depending on what the reader can be trusted to fill in.
Best Answer
Hermann Weyl's delightful proof that for irrational $\alpha$ the sequence of values $k\alpha$ mod $1$, $k \in {\bf N}$, is uniformly distributed in $[0,1]$ deserves a mention. It's so simple I can summarize it here. First we check that for any nonzero $n \in {\bf Z}$ we have $$\frac{1 + e^{2\pi i n\alpha} + \cdots + e^{2\pi in(k-1)\alpha}}{k} \to 0$$ as $k \to \infty$. This is just a simple computation since the numerator is a geometric series. For $n = 0$ the displayed fraction reduces to $\frac{k}{k} = 1$. Since $\int_0^1 e^{2\pi i nx} dx = 1$ or $0$ depending on whether $n = 0$ or $n \neq 0$, it follows that $$\frac{1}{k}\sum_{j=0}^{k-1} e^{2\pi i nj\alpha} \to \int_0^1 e^{2\pi inx} dx$$ for all $n \in {\bf Z}$. Setting $x_j = j\alpha$ mod $1$ and taking linear combinations then yields $$\frac{1}{k}\sum_{j=0}^{k=1} f(x_j) \to \int_0^1 f(x) dx$$ for any trigonometric polynomial $f$, and by straightforward approximation arguments we get the same conclusion, first for any continuous function $f$ on $[0,1]$ and then for $f = \chi_{[a,b]}$. But with this $f$ the left side becomes the fraction of values $j\alpha$ mod $1$ for $0 \leq j \leq k-1$ which lie in $[a,b]$ and the right side becomes $b-a$, so this is just the statement of uniform distribution.