The discrepancy is caused by taking the maximal derivative in the interval [2.5, 3.0]:
$$k = \max |g'(x)| = |g'(2.5)| = \sqrt{10}/5 = 0.632$$
So you assume that the solution error is multiplied by a factor of $0.632$ at each iteration step, and you would need at least 18 (I got 21) iterations to bring the error below $0.0001$.
However, the derivative is much smaller in the neighborhood of the solution:
$$k = |g'(\text{solution})| = |g'(2.924)| = 0.5$$
If you estimate with this $k$, the error is halved at each iteration, and you need only 14 iterations to decrease it to $0.00008$. This is too optimistic, because at the beginning of the iteration you should use $k = 0.632$ and only later move to $k = 0.5$. But this analysis should explain the discrepancy between the theoretical estimate and the numerical iteration.
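The effect of the two contraction factors on the predicted iteration count can be sketched numerically. This is only an illustration: it assumes the initial error equals the interval width $0.5$, which is an assumption for the demo, not a value from the text, so the counts need not match the ones quoted above exactly.

```python
import math

def iterations_needed(k, e0=0.5, tol=1e-4):
    """Smallest n with e0 * k**n <= tol, assuming the error is
    multiplied by the contraction factor k at every step."""
    return math.ceil(math.log(tol / e0) / math.log(k))

# k from the endpoint of the interval [2.5, 3.0] (pessimistic)
n_pessimistic = iterations_needed(0.632)
# k at the solution (optimistic)
n_optimistic = iterations_needed(0.5)
print(n_pessimistic, n_optimistic)
```

The true iteration count falls between the two predictions, because the effective $k$ drifts from $0.632$ toward $0.5$ as the iterates approach the solution.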
Let $B * y$ denote matrix multiplication performed in finite precision arithmetic. Let $x_{LU}$ be the solution of $A x = b$ obtained by the LU method, and let $x_{inv}$ be the solution computed as $x_{inv} = \text{inv}(A) * b$, where $\text{inv}(A)$ denotes the inverse of $A$ computed from the LU decomposition $A \approx \hat{L}\hat{U}$ by solving $Ax_j = e_j$. Note that $\text{inv}(A)$ is different from $\hat{U}^{-1} * \hat{L}^{-1}$ but is much simpler to analyse.
The forward error for $x_{LU}$ is
$$\tag{1}\|x - x_{LU}\|_\infty \leq c_n u \||A^{-1}| |\hat{L}| |\hat{U}| |x_{LU}| \|_\infty \leq c_n u \||A^{-1}| |\hat{L}| |\hat{U}|\|_\infty \times \|x_{LU} \|_\infty$$
where $c_n$ denotes a constant of order $n$ and $u$ is the machine precision.
A similar forward error bound for $x_{inv}$ is
$$\tag{2} \|x - x_{inv}\|_{\infty} \leq c_n u \||A^{-1}| |\hat{L}| |\hat{U}| |\text{inv}(A)| |b|\|_{\infty} \leq c_n u \||A^{-1}| |\hat{L}| |\hat{U}|\|_\infty \times \||\text{inv}(A)| |b|\|_{\infty}$$
Comparing (1) and (2), and noting that for a sufficiently well-conditioned matrix $A$,
$$\tag{3} \|x_{LU}\|_\infty\approx \|x\|_\infty = \|A^{-1}b\|_\infty \leq \||A^{-1}||b|\|_\infty\approx \||\text{inv}(A)| |b|\|_\infty$$
we see that as long as $\|x\|_\infty \approx \||A^{-1}| |b|\|_\infty$, $x_{LU}$ and $x_{inv}$ have similar forward errors. Notice that this condition holds for generic $b$.
Therefore the forward error bounds (1) and (2) always favor $x_{LU}$ over $x_{inv}$, but usually the forward error for $x_{inv}$ is not much worse.
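This can be checked numerically. The sketch below is only an illustration under assumed data: a random $100\times 100$ matrix (not a matrix from the text), with the "true" solution chosen first and $b$ formed from it, so the measured forward errors are approximate.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
A = rng.standard_normal((n, n))       # illustrative, moderately conditioned matrix
x_true = rng.standard_normal(n)
b = A @ x_true                        # b is consistent with x_true up to rounding

x_lu = np.linalg.solve(A, b)          # LU factorization with partial pivoting
x_inv = np.linalg.inv(A) @ b          # explicit inverse times b

norm = lambda v: np.linalg.norm(v, np.inf)
err_lu = norm(x_lu - x_true) / norm(x_true)
err_inv = norm(x_inv - x_true) / norm(x_true)
# Both forward errors are typically of the same order of magnitude,
# consistent with bounds (1) and (2).
print(err_lu, err_inv)
```

For a generic right-hand side the two errors differ by at most a modest factor, matching the claim above.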
Error bound (1) can be found in Higham's "Accuracy and Stability of Numerical Algorithms". Error bound (2) can be computed from forward error bound for $\text{inv}(A)$:
$$|\text{inv}(A) - A^{-1}| \leq c_n u |A^{-1}| |\hat{L}| |\hat{U}| |\text{inv}(A)|$$
and from the observation that the error introduced by the matrix multiplication $\text{inv}(A) * b$ is always smaller than the error associated with computing $\text{inv}(A)$.
Now suppose that $\bar{x}_{inv} = A^{-1} * b$, i.e. the inverse $A^{-1}$ is computed exactly and the only source of error is the matrix multiplication. Then the residual $b - A\bar{x}_{inv}$ satisfies
$$\tag{4} |b - A \bar{x}_{inv}| \leq c_n u |A| |A^{-1}| |b|$$
while for $x_{LU}$ we have
$$\tag{5}|b - A x_{LU}| \leq c_n u |\hat{L}| |\hat{U}| |x_{LU}|$$
Since it is usually true that $\||\hat{L}| |\hat{U}|\|_\infty \approx \|A\|_\infty$, $\bar{x}_{inv}$ and $x_{LU}$ can have comparable residual norms only when $\|x_{LU}\|_\infty \approx \||A^{-1}| |b|\|_\infty$. On the other hand, from (3) we have (for sufficiently well conditioned $A$) $\|x_{LU}\|_\infty \leq \||A^{-1}| |b|\|_\infty$, so the residuals for $x_{LU}$ are usually smaller than the residuals for $x_{inv}$, even for a well-conditioned matrix $A$. If $A$ has a large condition number, then $\text{inv}(A)$ is inaccurate, giving an additional source of error on top of (4), so we can expect much larger residuals for $x_{inv}$.
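The residual gap for an ill-conditioned matrix can be seen in a small experiment. As an assumed example (not from the text), the sketch uses the $12\times 12$ Hilbert matrix, a standard ill-conditioned test case:

```python
import numpy as np

n = 12
# Hilbert matrix H[i, j] = 1 / (i + j + 1): severely ill-conditioned
A = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1)
b = np.ones(n)

x_lu = np.linalg.solve(A, b)     # backward-stable LU solve
x_inv = np.linalg.inv(A) @ b     # explicit inverse times b

r_lu = np.linalg.norm(b - A @ x_lu, np.inf)
r_inv = np.linalg.norm(b - A @ x_inv, np.inf)
# The LU solve keeps the residual small relative to ||A|| ||x||,
# while the residual of inv(A) @ b is typically much larger.
print(r_lu, r_inv)
```

This reproduces the qualitative conclusion above: for ill-conditioned $A$ the explicit-inverse approach yields noticeably larger residuals.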
Therefore:
The presented $2\times 2$ example is generic; i.e., usually $x_{inv}$ and $x_{LU}$ have similar forward errors (even for an ill-conditioned matrix $A$) unless $\|x\|_\infty \ll \||A^{-1}| |b|\|_\infty$.
The residuals for $x_{inv}$ are usually bigger than those for $x_{LU}$, unless $A$ is well conditioned and $\|x\|_\infty \approx \||A^{-1}| |b|\|_\infty$.
I am not aware of any practical applications where minimizing the backward error is the ultimate goal. However, the backward error and the forward error are closely related by the inequality
$$\text{forward_error} \leq \text{condition_number} \times \text{backward_error}$$
Since the condition number does not depend on the algorithm used to solve a given problem, one may hope that choosing an algorithm with a smaller backward error will lead to a lower forward error as well.
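The inequality can be illustrated for a linear system using the normwise backward error $\eta = \|b - A\hat{x}\| / (\|A\|\,\|\hat{x}\| + \|b\|)$ (the Rigal–Gaches formula). The matrix and right-hand side below are assumed test data, not from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
A = rng.standard_normal((n, n))
x_true = rng.standard_normal(n)
b = A @ x_true

x_hat = np.linalg.solve(A, b)
r = b - A @ x_hat

# Normwise backward error (Rigal-Gaches) in the 2-norm
eta = np.linalg.norm(r) / (np.linalg.norm(A, 2) * np.linalg.norm(x_hat)
                           + np.linalg.norm(b))
forward = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
cond = np.linalg.cond(A)
# forward error <= (roughly) condition number * backward error
print(forward, cond * eta)
```

Here the backward error sits at the level of machine precision, and the observed forward error stays within a modest multiple of $\kappa(A)\,\eta$.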
As long as your computer can hold the exact coefficients of the polynomial, you will be able to get some sort of stability in calculating the roots, even with such a crude method as interval bisection. However, this isn't the point of Wilkinson's polynomial. Rather, what the polynomial demonstrates is that a tiny error in the data for a problem, namely in the coefficients of the polynomial, can lead to a big error in the solution values (the roots). A consequence of this, for example, would be big errors in obtaining the eigenvalues of a matrix by calculating the roots of its characteristic polynomial, where the matrix entries may not be exact even though specified to pretty good precision.
Wilkinson's polynomial is ill conditioned because of the huge ratio of the error in the solution to an error in the data. The ratio you cite is another matter.
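Wilkinson's classic experiment can be reproduced directly: perturb the coefficient of $x^{19}$ of $w(x) = (x-1)(x-2)\cdots(x-20)$ by $-2^{-23}$, as Wilkinson did, and watch the roots scatter. The sketch below uses NumPy's companion-matrix root finder; note that the large coefficients are themselves already rounded in double precision, which is part of the point.

```python
import numpy as np

# Coefficients of w(x) = (x - 1)(x - 2)...(x - 20); some exceed 2**53,
# so they are already rounded in double precision.
coeffs = np.poly(np.arange(1, 21))

# Wilkinson's perturbation: subtract 2**-23 from the x**19 coefficient,
# a relative change of only about 5.7e-10 (the coefficient is -210).
perturbed = coeffs.copy()
perturbed[1] -= 2.0 ** -23

roots = np.roots(coeffs)            # close to 1, 2, ..., 20
roots_pert = np.roots(perturbed)    # several roots become complex

# Largest displacement between the (sorted) root sets
shift = np.max(np.abs(np.sort_complex(roots_pert) - np.sort_complex(roots)))
print(shift)
```

A data perturbation of relative size about $10^{-10}$ moves some roots by $O(1)$ and pushes several of them off the real axis, which is exactly the ill conditioning described above.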