In general, if you want to approximate $F$ to the first order around some point $u^k$, Taylor's formula says
$$F(u) = F(u^k) + \Bbb d F (u^k) (u - u^k) + \frac 1 2 \Bbb d ^2 F (v) (u - u^k, u - u^k) ,$$
with $v$ some point on the line segment with endpoints $u$ and $u^k$. As you see, three terms show up; let us analyze them one by one.
The first one is clear: just replace $u$ by $u^k$ and you're done.
In the second, you have to compute $\Bbb d F (u^k)$, the derivative at the point $u^k$ of the function
$$u \mapsto \|Au - f \|^2 = \langle Au-f, Au-f \rangle .$$
This is a (scalar) product, so differentiate it like a product, according to Leibniz's formula (yes, this is legitimate!). Remember that the differential of a linear map, at any point, is that linear map itself, so for the inner map $u \mapsto Au - f$ you get
$$\Bbb d (Au-f) (u^k) = A ,$$
(because $\Bbb d f (u^k) = 0$, $f$ being a constant vector with respect to $u$). Next, you have to apply this $\Bbb d (Au-f) (u^k)$ to the vector $u-u^k$, therefore getting $A (u-u^k)$. Putting everything together,
$$\Bbb d (Au-f) (u^k) (u-u^k) = A(u-u^k)$$
and the second term in the above Taylor expansion becomes
$$\langle A(u-u^k), A u^k - f \rangle + \langle A u^k - f, A(u-u^k) \rangle = 2 \langle A(u-u^k), A u^k - f \rangle$$
because the scalar product is symmetric. Now remember that, in general, $\langle Au, v \rangle = \langle u, A^T v \rangle$ (in fact, this is the definition itself of the transposition operation), therefore the above becomes equal to
$$2 \langle u-u^k, A^T (A u^k - f) \rangle = 2 \langle u, A^T (A u^k - f) \rangle \color{red} {- 2 \langle u^k, A^T (A u^k - f) \rangle} .$$
The first of the above two terms is exactly the second one in your formula. The one in red will be absorbed, at the end, in $\frac 1 \delta$.
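If you want to convince yourself numerically of the first-order computation above, here is a minimal sketch in plain Python. The matrix `A`, the vectors `f`, `u_k`, `u` and all helper names are made up for illustration (they are assumptions, not taken from the paper). It checks the adjoint identity $\langle A h, r \rangle = \langle h, A^T r \rangle$ and compares $2 A^T (A u^k - f)$ with a finite-difference gradient of $\|Au - f\|^2$.

```python
# Pure-Python sanity check of the first-order term; all data below is hypothetical.

def matvec(M, x):
    """Matrix-vector product M x, with M given as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def F(A, f, u):
    """F(u) = ||A u - f||^2."""
    r = [a - b for a, b in zip(matvec(A, u), f)]
    return dot(r, r)

A = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]  # illustrative 3x2 matrix
f = [1.0, 0.0, 2.0]
u_k = [0.5, -1.0]
u = [0.7, -0.8]

h = [a - b for a, b in zip(u, u_k)]                       # u - u^k
residual = [a - b for a, b in zip(matvec(A, u_k), f)]     # A u^k - f
grad = [2.0 * g for g in matvec(transpose(A), residual)]  # 2 A^T (A u^k - f)

lhs = 2.0 * dot(matvec(A, h), residual)  # 2 <A(u - u^k), A u^k - f>
rhs = dot(h, grad)                       # 2 <u - u^k, A^T (A u^k - f)>

# Central finite differences of F at u_k, one coordinate at a time.
eps = 1e-6
fd = []
for i in range(len(u_k)):
    up = list(u_k); up[i] += eps
    dn = list(u_k); dn[i] -= eps
    fd.append((F(A, f, up) - F(A, f, dn)) / (2 * eps))

print(lhs, rhs)
print(grad, fd)
```

The two printed pairs should agree up to rounding, which is all the transposition step claims.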
Finally, the third term can be rewritten as
$$\frac 1 2 \Bbb d ^2 F (v) \Big( \frac {u - u^k} {\| u - u^k \|}, \frac {u - u^k} {\| u - u^k \|} \Big) \cdot \| u - u^k \|^2$$
because $\Bbb d ^2 F (v)$ is linear in each argument. Remember that, in general, if $G(x) = \langle Bx, Bx \rangle : \Bbb R ^n \to \Bbb R $ is a quadratic form, and $p,h,k \in \Bbb R ^n$, then $\Bbb d ^2 G (p) (h,k) = 2 \langle Bh, Bk \rangle$, independently of $p$. Since $F(u) = \| Au-f \|^2$ differs from $\langle Au, Au \rangle$ only by terms that are affine in $u$ ($f$ being constant with respect to $u$), those terms drop out of the second derivative, and
$$\frac 1 2 \Bbb d ^2 F (v) \Big( \frac {u - u^k} {\| u - u^k \|}, \frac {u - u^k} {\| u - u^k \|} \Big) = \Big\| A \frac {u - u^k} {\| u - u^k \|} \Big\|^2 .$$
Collecting everything, Taylor's formula looks like
$$\|Au-f\|^2 = \| A u^k -f \|^2 + 2 \langle u, A^T (A u^k - f) \rangle + \Bigg( \color{blue} {\Big\| A \frac {u - u^k} {\| u - u^k \|} \Big\|^2} \color{red} {- 2 \frac {\langle u^k, A^T (A u^k - f) \rangle} {\| u - u^k \|^2} } \Bigg) \| u - u^k \|^2 .$$
Now the authors note (in the paragraph between formulae (2.3) and (2.4)) that for large enough $k$ (i.e. after sufficiently many iterations), the quantity $\| A u^k - f \|$ becomes small, so $A u^k - f$ becomes small (as a vector), so that it and the term in red that contains it may be ignored. Furthermore, for $u$ sufficiently close to $u^k$ you may approximate $\frac {u - u^k} {\| u - u^k \|}$ by some fixed vector $v_k$ (on the unit sphere, but this is not important), so take $\delta = \frac 1 {\| A v_k \|^2}$, insert this back into the above formula and you're done. Note that the approximations made in this last paragraph have transformed the mathematically correct equality in Taylor's formula into an approximate equality, which explains why the authors switch from $=$ to $\approx$.
Best Answer
For about the same reason that you don't write the decimal expansion of rational numbers in full: that takes an infinite number of digits.
The so-called transcendental functions usually take an infinite number of arithmetic operations to be evaluated exactly, whatever the approach.
And functions are also often defined from approximations that can be refined at will by increasing the number of terms (we say that these approximations converge to the function).
For instance, the series
$$1+x+\frac{x^2}2+\frac{x^3}{2\cdot3}+\frac{x^4}{2\cdot3\cdot4}+\cdots$$ is a way to define the exponential function $e^x$, if you consider infinitely many terms.
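To see the "refined at will" idea in action, here is a small sketch in Python that sums the first few terms of this series and compares the result with the library value `math.exp`. The function name `exp_partial_sum` is just an illustrative choice, not a standard routine.

```python
import math

def exp_partial_sum(x, n_terms):
    """Sum of the first n_terms terms of 1 + x + x^2/2! + x^3/3! + ..."""
    total = 0.0
    term = 1.0  # current term x^n / n!, starting at n = 0
    for n in range(n_terms):
        total += term
        term *= x / (n + 1)  # turn x^n/n! into x^(n+1)/(n+1)!
    return total

# More terms give a smaller gap to the library value at x = 1.
for n_terms in (2, 5, 10, 20):
    approx = exp_partial_sum(1.0, n_terms)
    print(n_terms, approx, abs(approx - math.exp(1.0)))
```

The printed error shrinks as the number of terms grows, but no finite number of terms gives the exact value, which is the point made above.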
A secondary argument is that mathematicians often need to discuss the properties of the functions by replacing them with similar ones for which suitable properties are already known.
For the sake of the example, the exponential can be bounded by a crude linear approximation,
$$e^x\ge 1+x$$ and this is enough to prove that the value of the exponential can be as large as you want.
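To spell out that last step (a one-line argument, with $M$ an arbitrary target value introduced here only for illustration): given any $M > 0$, choose $x = M - 1$; then
$$e^{x} \ge 1 + x = M ,$$
so the exponential eventually exceeds every bound, and you never had to evaluate it exactly to see this.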