These definitions do not really conflict; the first one is just technically imprecise.
The proper definition would be the "differentiation" definition.
In the first case, $\delta$ isn't what maps $F[\rho]$ to $F[\rho+\delta\rho]-F[\rho]$, but what maps $F[\rho]$ to $F[\rho+\delta\rho]-F[\rho]$ with second-order and higher terms neglected in the Taylor-expansion of $F[\rho+\delta\rho]$. A differentiation is essentially taking the first order coefficient of a Taylor-expansion, so technically these two are the same.
Of course, old-school physics literature makes it sound as if the Taylor-expansion definition is just some sort of "approximation", while the "differentiation" definition seems exact. But both give the linear approximation to the change in the functional in the direction of the function $\delta\rho$, which is exactly what differentiation does.
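As a quick numerical illustration of why the two definitions agree, here is a minimal sketch. It assumes a hypothetical functional $F[\rho]=\int_0^1 \rho(x)^2\,dx$ (my choice, not taken from the question), discretized with a midpoint rule; the function names and the choices of $\rho$ and $\delta\rho$ are likewise made up for the example. The full difference $F[\rho+\delta\rho]-F[\rho]$ and the linear term $\int_0^1 2\rho\,\delta\rho\,dx$ differ only by the second-order remainder $\int_0^1 \delta\rho^2\,dx$, so shrinking $\delta\rho$ by a factor of 10 should shrink the remainder by roughly a factor of 100.

```python
import math

N = 2000                                   # number of midpoint cells on [0, 1]
H = 1.0 / N
XS = [(i + 0.5) * H for i in range(N)]     # midpoints of the cells


def integrate(values):
    """Midpoint-rule integral over [0, 1] of values sampled at XS."""
    return sum(values) * H


def F(rho):
    """Hypothetical example functional F[rho] = integral of rho(x)^2."""
    return integrate([rho(x) ** 2 for x in XS])


def first_variation(rho, drho):
    """Linear term of the Taylor expansion: integral of 2 * rho * drho."""
    return integrate([2.0 * rho(x) * drho(x) for x in XS])


def rho(x):
    """Arbitrary base function, chosen only for illustration."""
    return math.sin(math.pi * x)


def remainder(scale):
    """Full difference minus linear term, for drho = scale * x * (1 - x)."""
    drho = lambda x: scale * x * (1.0 - x)
    shifted = lambda x: rho(x) + drho(x)
    return (F(shifted) - F(rho)) - first_variation(rho, drho)


# The remainder is second order in the size of drho.
for scale in (1e-1, 1e-2, 1e-3):
    print(scale, remainder(scale))
```

For this particular functional the remainder is exactly $\int_0^1 \delta\rho^2\,dx$, so nothing is being "approximated away" in the derivative itself; the neglected terms simply belong to higher orders of the expansion.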
Partial answer:
First things first: High praise for the beautifully formatted question!
To go from 1 to 2, note that equation 1 looks like (as a function of $\epsilon$ only)
$$
g(\epsilon) = C + \epsilon D
$$
where $C$ and $D$ are expressions not involving $\epsilon$. The derivative of $g(\epsilon)$ with respect to $\epsilon$ is evidently $D$.
As for "what does $\frac{d}{d\varepsilon}F(u + \varepsilon\phi) \big|_{\varepsilon = 0}$ mean?", it means exactly the computation I just did above.
In general $\frac{d}{dz} H(z) \big|_{z = c}$ means "take the derivative of $H$ with respect to $z$, and evaluate it at the point $z = c$," which could also be written $H'(c)$. When $H$ is given as a more complex expression rather than having a single name, for instance when instead of $H(z)$ we have $F(k + z p)$, there's no easy "prime" notation to express this (especially when $k$ and $p$ are not real numbers, but something more complicated like functions).
One other answer to that question is an explicit formula: it means
$$
\lim_{h \to 0} \frac{F(u + (0+h)\cdot\phi) - F(u + 0 \cdot\phi)}{h},
$$
pretty much the same as any other derivative you've ever seen.
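To see that limit at work, here is a small sketch that evaluates the difference quotient for shrinking $h$. Everything concrete in it is an assumption made for illustration: the functional $F(u)=\int_0^1 u(x)^3\,dx$, the choices $u(x)=x$ and $\phi(x)=x(1-x)$, and the midpoint-rule discretization. For these choices the derivative works out by hand to $\int_0^1 3u^2\phi\,dx = 3/20$, so the printed quotients should approach that value.

```python
N = 1000                                   # number of midpoint cells on [0, 1]
DX = 1.0 / N
XS = [(i + 0.5) * DX for i in range(N)]    # midpoints of the cells


def F(u):
    """Hypothetical example functional F(u) = integral of u(x)^3 on [0, 1]."""
    return sum(u(x) ** 3 for x in XS) * DX


def u(x):
    """Base function (illustrative choice)."""
    return x


def phi(x):
    """Direction of the perturbation (illustrative choice)."""
    return x * (1.0 - x)


def difference_quotient(h):
    """(F(u + h*phi) - F(u)) / h, the quantity whose limit is taken."""
    perturbed = lambda x: u(x) + h * phi(x)
    return (F(perturbed) - F(u)) / h


# As h shrinks, the quotient settles toward the derivative at epsilon = 0.
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    print(h, difference_quotient(h))
```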
To go from 3 to 4, use integration by parts (with upper and lower limits), namely
$$
\int_a^b F(x)G'(x) dx = F(x)G(x)\big|_a^b - \int_a^b F'(x) G(x) dx
$$
in this case applying the integration-by-parts only to the SECOND integral in (4), and picking
$$
\begin{aligned}
F(x) &= f_{u'}(x,u(x),u'(x))\\
G(x) &= \phi(x)
\end{aligned}
$$
The right-hand side then becomes a sum of the first integral, an evaluation from $a$ to $b$, and another integral, and the authors have combined the two integrals.
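If you want to convince yourself of the integration-by-parts formula numerically, the following sketch checks both sides. The concrete choices are assumptions for illustration only: $F(x)=e^x$ stands in for $f_{u'}$, and $G(x)=\sin(\pi x)$ on $[0,1]$ plays the role of a test function $\phi$ that vanishes at both endpoints, so the boundary term drops out. The `simpson` helper is a plain composite Simpson rule written for this example.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


a, b = 0.0, 1.0

F = math.exp                                     # F(x) = e^x (stand-in for f_{u'})
dF = math.exp                                    # F'(x) = e^x
G = lambda x: math.sin(math.pi * x)              # G(x), vanishes at a and b
dG = lambda x: math.pi * math.cos(math.pi * x)   # G'(x)

# Left-hand side: integral of F * G'
lhs = simpson(lambda x: F(x) * dG(x), a, b)

# Right-hand side: boundary term minus integral of F' * G
boundary = F(b) * G(b) - F(a) * G(a)
rhs = boundary - simpson(lambda x: dF(x) * G(x), a, b)

print("lhs      =", lhs)
print("rhs      =", rhs)
print("boundary =", boundary)
```

Because $G$ vanishes at the endpoints, `boundary` is zero up to rounding, which mirrors what happens when $\phi$ is required to vanish at $a$ and $b$.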
To answer the third bullet, I had to go back and look at the text. In the paragraph just after the word "Proof", I believe the sentence starting "For a given function ..." should begin a new paragraph, or indeed a new section, in which the lemma just proved will be applied.
When you get to equation (2), you have exactly the situation listed in the Lemma: some function (in this case a complicated mess) times $\phi'$ integrates to $0$ for all possible $\phi$. From that, you can conclude directly that the complicated mess must be everywhere zero (which is exactly what the top line of claim (5) says).
(The remaining integration by parts presumably gives you the other two lines of claim (5), but I didn't check this.)
Best Answer
In fact, the mess of definitions of the variation in the literature is unmatched by anything I've seen thus far.
Anyway I believe you cannot multiply by $\epsilon$ since it is a differentially small element. Also you define the Gateaux derivative below there. For me it is more clear to look at a general family of functions and use the general variation (without the affine approach $y=\gamma+h \epsilon$)!
But as far as I can tell, what confuses you is the same thing I struggled with after seeing many definitions: a common approach is to define $$ \delta q := \lim_{\epsilon\to\epsilon_0} \cfrac{\tilde{q}(\epsilon)-\tilde{q}(\epsilon_0)}{\epsilon-\epsilon_0} $$ which, compared with an ordinary function derivative $$ \frac{dy}{dx} := \lim_{x\to x_0} \cfrac{y(x)-y(x_0)}{x-x_0} \;, $$ is lacking the term $$ \delta \epsilon \;.$$
This is the case for the wiki definition (the Gateaux derivative). Your definition above however explicitly deals with this quantity. In the end it's a matter of definition. Another method I know includes the variation of the parameter in the definition (so it never has to be written down): $$ \delta q := \left. \frac{\partial Q(\epsilon)}{\partial \epsilon}\right|_{\epsilon=\epsilon_0} \delta\epsilon $$ where $Q$ is the family of testing functions. I hope this helps somewhat.
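A small sketch of that last definition, for what it's worth. The family $Q(\epsilon)(x)=\sin(x+\epsilon)$ is a hypothetical choice of mine (deliberately not of the affine form $\gamma + h\epsilon$), and the $\epsilon$-derivative is approximated by a central difference. The point is only that $\delta q = \partial_\epsilon Q\,\big|_{\epsilon_0}\,\delta\epsilon$ matches the actual change $Q(\epsilon_0+\delta\epsilon)-Q(\epsilon_0)$ up to terms of second order in $\delta\epsilon$.

```python
import math


def Q(eps, x):
    """Hypothetical one-parameter family of functions, evaluated at x."""
    return math.sin(x + eps)


def variation(x, eps0, deps, h=1e-6):
    """delta q(x) = dQ/d(eps) at eps0, times deps (central difference in eps)."""
    dQ_deps = (Q(eps0 + h, x) - Q(eps0 - h, x)) / (2.0 * h)
    return dQ_deps * deps


eps0, deps, x = 0.0, 1e-3, 0.7             # illustrative values only

delta_q = variation(x, eps0, deps)
actual_change = Q(eps0 + deps, x) - Q(eps0, x)

# The two agree to first order in deps; the gap is of order deps**2.
print("delta q       =", delta_q)
print("actual change =", actual_change)
print("gap           =", abs(delta_q - actual_change))
```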