There's a short-cut in Roth's approach if one only cares to get $o(N)$. Adolf Hildebrand told me so, and here is my shortest writeup.
Notation: Let $r(N),\rho(N)$ be the largest cardinality and density of a subset of $[N]$ that is free of 3-term APs, and let $\rho=\lim \rho(N)$, which must exist by Fekete's subadditive lemma since $r(N+M)\leq r(N)+r(M)$. Let $A\subset [N]$ witness $r(N)$, and let $S(x) = \sum_{a\in A} e( a x)$ (where $e(ax)=\exp(2\pi i a x)$, naturally). Let $T(x)=\sum_{n=1}^N e(nx)$.
Lemmata: Now $r(N)=|A|=I:=\int_0^1 S(x)^2 S(-2x)dx$, since $A$ is 3-free. By Parseval, $|A|=\int_0^1 |S(x)|^2dx$. By expanding $S(x)^2 T(-2x)$ and exchanging integration and summation, $(|A|/2)^2 \leq I_0:=\int_0^1 S(x)^2 T(-2x)$.
Main Engine: As long as $0< M\leq N$, $$\sup_{x\in {\mathbb R}} |S(x)-\rho(M) T(x)| \leq N(\rho(M)-\rho(N)) + 10 M \sqrt{N}.$$ Proof is by circle method; set $E(x):=S(x)-\rho(M)T(x)$. Three steps are
- $|E(a/q)|\leq N(\rho(M)-\rho(N))+2Mq$, the hard one;
- $|E(a/q+\beta)| \leq N(\rho(M)-\rho(N))+2Mq +2\pi|\beta|NMq$;
- $|E(x)| \leq N(\rho(M)-\rho(N))+10M\sqrt{N}$.
Main Lemma: By Main Engine and Lemmata, $\Delta:=|I-\rho(M)I_0|$ is at most $(N(\rho(M)-\rho(N))+10M\sqrt{N})|A|$. By Lemmata, $\Delta$ is at least $|A|(\rho(M)\rho(N)N/4-1)$. Therefore, $$\rho(N)\rho(M)\leq 4(\rho(M)-\rho(N))+50M/\sqrt{N}.$$
Conclusion: Let $N\to\infty$ and then $M\to\infty$ to get $\rho^2\leq 0$.
The step labeled "the hard one" in the Main Engine is tricky, and uses the fact that we can restrict $A$ to arithmetic progressions and still get a 3-free set. The length of the progressions we restrict to is connected to $M$.
I suppose the point is that a write-up can be almost arbitrarily short or extremely long, depending on what the reader can be trusted to fill in.
Very nice question! This is a (long) comment, not an answer
I found a heuristic suggesting the lower bound is sharp, and wonder if you have some reason to doubt it. The model is that given $n$ and $a_k$, $a_{k+1}$ is uniformly chosen from the set $\{0,\ldots,a_k-1\}$. Alternatively, $a_{k+1}=\lfloor Ua_k\rfloor$, where $U$ is a uniform random variable. So you're asking starting from somewhere in the range 1 to $a$, how many $U$'s do you have to multiply before you get 0 (integer part). You can check quickly that the probability you don't get 0 after multiplying $100\log a$ terms is $O(a^{-K})$ for some reasonably large power.
So even accounting for the a different values of $b$, the probability that any of them lead to a chain of length 100$\log a$, is $O(a^{1-K})$. Since this is summable, by Borel-Cantelli, there is $a_0$ such that for all $a\ge a_0$, there's no chain of length $100\log a$.
Best Answer
Dear Yui,
It's only slightly more than a casual remark. Our inability to find a better example is certainly a big reason for believing that Behrend's bound is correct. Julia Wolf and I slightly rehashed the proof of Behrend's bound
http://arxiv.org/abs/0810.0732
When formulated this way, I think the construction looks both fairly natural and fairly unimprovable.
Also, there are beginning to be hints as to the correct behaviour coming from apparently similar equations such as $x_1 + x_2 + x_3 = 3x_4$. I think that Schoen and others, using work of Sanders, may have improved the bounds for this equation to $N \exp(-\log^c N)$, though I'm not certain about this.
Despite these remarks it is not known whether better bounds for Roth's theorem follow from any other more "natural" conjectures, such as the Polynomial Freiman-Ruzsa conjecture, so any suggestion that Behrend is sharp is somewhat tenuous. Some other people think differently, I believe - that it may not be sharp.