Let $B * y$ denote matrix multiplication performed in finite precision arithmetic. Let $x_{LU}$ be the solution of $A x = b$ obtained by the LU method, and let $x_{inv}$ be the solution computed as $x_{inv} = \text{inv}(A) * b$, where $\text{inv}(A)$ denotes the inverse of $A$ computed from the LU decomposition $A \approx \hat{L}\hat{U}$ by solving $Ax_j = e_j$ for each column. Note that $\text{inv}(A)$ differs from $\hat{U}^{-1} * \hat{L}^{-1}$ but is much simpler to analyse.
The forward error for $x_{LU}$ is
$$\tag{1}\|x - x_{LU}\|_\infty \leq c_n u \||A^{-1}| |\hat{L}| |\hat{U}| |x_{LU}| \|_\infty \leq c_n u \||A^{-1}| |\hat{L}| |\hat{U}|\|_\infty \times \|x_{LU} \|_\infty$$
where $c_n$ denotes a constant of order $n$ and $u$ is the machine precision.
A similar forward error bound for $x_{inv}$ is
$$\tag{2} \|x - x_{inv}\|_{\infty} \leq c_n u \||A^{-1}| |\hat{L}| |\hat{U}| |\text{inv}(A)| |b|\|_{\infty} \leq c_n u \||A^{-1}| |\hat{L}| |\hat{U}|\|_\infty \times \||\text{inv}(A)| |b|\|_{\infty}$$
Comparing (1) and (2), and noting that, for a sufficiently well conditioned matrix $A$,
$$\tag{3} \|x_{LU}\|_\infty\approx \|x\|_\infty = \|A^{-1}b\|_\infty \leq \||A^{-1}||b|\|_\infty\approx \||\text{inv}(A)| |b|\|_\infty$$
we see that as long as $\|x\|_\infty\approx \||A^{-1}||b|\|_\infty$, $x_{LU}$ and $x_{inv}$ have similar forward errors. Notice that this condition holds for a generic right-hand side $b$.
Therefore the forward error bounds (1) and (2) always favour $x_{LU}$ over $x_{inv}$, but usually the forward error of $x_{inv}$ is not much worse.
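This comparison can be illustrated numerically; below is a minimal NumPy sketch (the dimension, seed, and choice of a random matrix, which is typically well conditioned, are illustrative assumptions, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
A = rng.standard_normal((n, n))      # random matrices of this kind are usually well conditioned
x_exact = rng.standard_normal(n)
b = A @ x_exact

x_lu = np.linalg.solve(A, b)         # LU decomposition with partial pivoting
x_inv = np.linalg.inv(A) @ b         # inv(A) computed from the LU factors, then applied to b

err_lu = np.linalg.norm(x_exact - x_lu, np.inf) / np.linalg.norm(x_exact, np.inf)
err_inv = np.linalg.norm(x_exact - x_inv, np.inf) / np.linalg.norm(x_exact, np.inf)
print(f"forward error, LU solve : {err_lu:.2e}")
print(f"forward error, inv(A)*b : {err_inv:.2e}")
# For well conditioned A, both forward errors are near machine precision.
```

For a well conditioned $A$, both errors come out close to machine precision, consistent with the comparison of (1) and (2).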
Error bound (1) can be found in Higham's "Accuracy and Stability of Numerical Algorithms". Error bound (2) follows from the forward error bound for $\text{inv}(A)$:
$$|\text{inv}(A) - A^{-1}| \leq c_n u |A^{-1}| |\hat{L}| |\hat{U}| |\text{inv}(A)|$$
and from the observation that the error introduced by the matrix multiplication $\text{inv}(A) * b$ is always smaller than the error associated with computing $\text{inv}(A)$.
Now suppose that $\bar{x}_{inv} = A^{-1} * b$, i.e., the inverse $A^{-1}$ is computed exactly and the only source of error is the matrix multiplication. Then the residual $b - A\bar{x}_{inv}$ satisfies
$$\tag{4} |b - A \bar{x}_{inv}| \leq c_n u |A| |A^{-1}| |b|$$
For $x_{LU}$ we have
$$\tag{5}|b - A x_{LU}| \leq c_n u |\hat{L}| |\hat{U}| |x_{LU}|$$
Since it is usually true that $\||\hat{L}| |\hat{U}|\|_\infty \approx \|A\|_\infty$, $\bar{x}_{inv}$ and $x_{LU}$ can have comparable residual norms only when $\|x_{LU}\|_\infty \approx \||A^{-1}| |b|\|_\infty$. On the other hand, from (3) we have (for sufficiently well conditioned $A$) $\|x_{LU}\|_\infty \leq \||A^{-1}| |b|\|_\infty$, so the residuals for $x_{LU}$ are usually smaller than the residuals for $x_{inv}$, even for a well conditioned matrix $A$. If $A$ has a large condition number, then $\text{inv}(A)$ is inaccurate and we get an additional source of error in (4), so we can expect much larger residuals for $x_{inv}$.
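The residual gap shows up clearly on a classically ill-conditioned example; a sketch, with a Hilbert matrix as an illustrative (not prescribed) choice:

```python
import numpy as np

n = 10
i = np.arange(n)
A = 1.0 / (i[:, None] + i[None, :] + 1.0)   # Hilbert matrix: cond ~ 1e13 for n = 10
b = A @ np.ones(n)                          # right-hand side whose exact solution is (close to) e

x_lu = np.linalg.solve(A, b)
x_inv = np.linalg.inv(A) @ b

res_lu = np.linalg.norm(b - A @ x_lu, np.inf)
res_inv = np.linalg.norm(b - A @ x_inv, np.inf)
print(f"residual, LU solve : {res_lu:.2e}")
print(f"residual, inv(A)*b : {res_inv:.2e}")
# LU keeps the residual near machine precision; inv(A)*b loses many orders
# of magnitude, in line with bound (4).
```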
Therefore:
The presented $2\times 2$ example is generic, i.e. usually $x_{inv}$ and $x_{LU}$ have similar forward errors (even for an ill conditioned matrix $A$) unless $\|x\|_\infty \ll \||A^{-1}| |b|\|_\infty$.
The residuals for $x_{inv}$ are usually bigger than those for $x_{LU}$ unless $A$ is well conditioned and $\|x\|_\infty \approx \||A^{-1}| |b|\|_\infty$.
I am not aware of any practical applications where minimizing the backward error is the ultimate goal. However, the backward error and the forward error are closely related through the inequality
$$\text{forward_error} \leq \text{condition_number} \times \text{backward_error}$$
Since the condition number does not depend on the algorithm used to solve a given problem, one may hope that choosing an algorithm with a smaller backward error will lead to a smaller forward error as well.
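That relationship can be sanity-checked numerically; a minimal sketch, with the relative residual playing the role of the backward error (the matrix size and seed are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
b = A @ x

x_tilde = np.linalg.solve(A, b)

forward = np.linalg.norm(x - x_tilde) / np.linalg.norm(x)
backward = np.linalg.norm(b - A @ x_tilde) / np.linalg.norm(b)  # relative residual
cond = np.linalg.cond(A)  # 2-norm condition number

print(f"forward error        : {forward:.2e}")
print(f"cond * backward error: {cond * backward:.2e}")
# forward_error <= condition_number * backward_error, usually with slack.
```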
Indeed, the Skeel condition number can be characterized as
$$
k_S(A)=\min\{k_\infty(DA): D\text{ diagonal}\}.
$$
Three observations are useful here:
We can express the matrix $\infty$-norm as
$$
\|X\|_{\infty}=\||X|e\|_{\infty},
$$
where $e=[1,\ldots,1]^T$ is of suitable size. Note that there is a vector norm on the right-hand side.
If $DA$ is the row-equilibrated matrix, we have $|DA|e=e$, where $e=[1,\ldots,1]^T$.
The Skeel condition number is independent of row scaling:
$k_S(DA)=k_S(A)$ for any nonsingular diagonal $D$.
This gives us
$$
\begin{split}
k_\infty(DA)
&=\|(DA)^{-1}\|_\infty\,\|DA\|_\infty
=\||(DA)^{-1}|e\|_\infty\,\||DA|e\|_\infty\\
&=\||(DA)^{-1}|\,|DA|e\|_\infty
=k_S(DA)=k_S(A),
\end{split}
$$
where the second line uses $|DA|e=e$ twice (to drop the factor $\||DA|e\|_\infty=1$ and to insert $e=|DA|e$), and the last equality is the row-scaling invariance of $k_S$.
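The three observations and the resulting identity can be verified in a few lines; a sketch (the $6\times 6$ random matrix and positive diagonal scaling are arbitrary illustrative choices):

```python
import numpy as np

def skeel_cond(A):
    """Skeel condition number k_S(A) = || |A^{-1}| |A| ||_inf."""
    return np.linalg.norm(np.abs(np.linalg.inv(A)) @ np.abs(A), np.inf)

rng = np.random.default_rng(2)
n = 6
A = rng.standard_normal((n, n))

# Row-scaling invariance: k_S(DA) = k_S(A) for a positive diagonal D.
D = np.diag(rng.uniform(0.1, 10.0, n))
print(np.isclose(skeel_cond(D @ A), skeel_cond(A)))

# Row equilibration: pick D so that |DA| e = e (unit row sums of |DA|);
# then the ordinary infinity-norm condition number of DA equals k_S(A).
D_eq = np.diag(1.0 / np.abs(A).sum(axis=1))
DA = D_eq @ A
kappa_inf = np.linalg.norm(np.linalg.inv(DA), np.inf) * np.linalg.norm(DA, np.inf)
print(np.isclose(kappa_inf, skeel_cond(A)))
```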
Best Answer
As you said, this is very problem dependent and it depends on what kind of accuracy in what quantity you actually want.
Often, $A$ and $b$ contain some measured data (problem coefficients, loads, etc.), which usually have a stochastic nature due to measurement errors. The key insight of backward error analysis is that although the computed $\tilde{x}$ is not the solution of $Ax=b$, it is the exact solution of another problem, $\tilde{A}\tilde{x}=\tilde{b}$, with $\tilde{A}$ and $\tilde{b}$, respectively, very close to $A$ and $b$ relative to the measurement errors. So instead of exactly solving the already inexactly constructed problem, one solves a still-acceptable problem within the measurement error tolerances, even though the computed solution could be considered garbage with respect to the original one. And when you have an inexactly stated problem, does it make sense to attempt to solve it exactly (impossible) or to a very high level of accuracy?
This is the concept of the so-called backward error, where you ask for which data you actually have the exact solution. It is a very popular topic in numerical linear algebra and error analysis; it was pioneered by von Neumann and Turing and further developed and popularised by Jim Wilkinson. It turned out to be an important tool for analysing rounding errors in floating point computations, since backward error analysis allows you to obtain bounds on the forward error ($x-\tilde{x}$ in some norm) in two separate parts: an algorithm-dependent part (bounds on the backward error) and a problem-dependent part (the sensitivity of the problem, some sort of condition number).
If the backward error is small enough (of the order of $\epsilon_{\mathrm{mach}}$), you can say that you solved your problem to the greatest accuracy you could hope for on your computer. Why? Because even if you could somehow obtain the "exact" values of the entries of $A$ and $b$, the relative errors due to their storage in finite precision arithmetic are already of the order of the machine precision.
Now what exactly is the backward error in the context of linear (nonsingular, for simplicity) systems? Assume you want to solve $Ax=b$ and obtain some $\tilde{x}=x+\delta x$. You then look for $\tilde{A}=A+\delta A$ and $\tilde{b}=b+\delta b$ such that $\tilde{A}\tilde{x}=\tilde{b}$, that is, your $\tilde{x}$ is the exact solution of a perturbed problem. Of course, we would like the perturbations $\delta A$ and $\delta b$ to be as small as possible. So consider, e.g., the following formulation:
$$
\eta(\tilde{x})=\min\{\tau:\;(A+\delta A)\tilde{x}=b+\delta b,\;\|\delta A\|\leq\tau\|A\|,\;\|\delta b\|\leq\tau\|b\|\}.
$$
It can be shown that the formula for $\eta$ is actually very simple:
$$
\eta(\tilde{x})=\frac{\|b-A\tilde{x}\|}{\|A\|\|\tilde{x}\|+\|b\|}.
$$
(This result was obtained some time ago in a paper by Rigal and Gaches.) If you want a bound on the forward error, then the following holds:
$$
\frac{\|x-\tilde{x}\|}{\|x\|}\leq\frac{\eta(\tilde{x})\kappa(A)}{1-\eta(\tilde{x})\kappa(A)}\left(\frac{\|\delta A\|}{\|A\|}+\frac{\|\delta b\|}{\|b\|}\right),
$$
assuming $\eta(\tilde{x})\kappa(A)<1$. Now this bound looks very similar to the one in the question. Note, however, that the backward error $\eta(\tilde{x})$ can be much lower than the coefficient of $\kappa(A)$ in your inequality, that is, the relative residual norm:
$$
\eta(\tilde{x})\leq\frac{\|b-A\tilde{x}\|}{\|b\|}.
$$
Depending on $A$ and $\tilde{x}$, the backward error $\eta(\tilde{x})$ can be much lower than the right-hand side of this inequality. Note also that the condition $\eta(\tilde{x})\kappa(A)<1$ guarantees that the perturbed system matrix $\tilde{A}$ is nonsingular. In other words, if you would like to be sure you solved some nearby meaningful problem, you would like to have $\eta(\tilde{x})$ at least of the order of $1/\kappa(A)$.
Also, starting from this point, decreasing $\eta$ by an order of magnitude gives you roughly one additional correct significant digit in your solution.
Much more can be said about forward/backward errors; I invite you to have a look at Higham's book if you want to know more details.
Anyway, if you want to solve $Ax=b$, you need to know that your solution won't be exact no matter what algorithm you use, and you always need to ask yourself (or somebody else) what you expect from your solution and in what terms you want to measure the accuracy. This is sometimes hard to assess even in terms of backward errors, since in some cases the classical norms used in rounding error analyses, like the 2-norm or the $\infty$-norm, are not exactly what appears in the problem you want to solve.
The typical example of this issue is solving discretised partial differential equations. Often, neither the forward error norm $\|x-\tilde{x}\|$ nor the residual norm $\|b-A\tilde{x}\|$ has a meaningful role in this context. In practice, one is normally interested in the accuracy in terms of fluxes, that is, the gradients of the discrete solutions. It is not unusual that the errors in the fluxes are much lower than the forward errors in the actual $\tilde{x}$, which usually has the meaning of the coordinates of some discrete functional approximation. Instead of looking at the significant digits of $\tilde{x}$, you want to know whether the error you made by getting $\tilde{x}$ instead of $x$ is comparable to the discretisation error in some proper norm (say, in terms of these gradients). Accuracy analysis of algorithms in such a context is, however, still an open, and probably very hard, problem.
Of course, if you really insist on solving $Ax=b$ (even though the exactness of your $A$ and $b$ is at least questionable), you can use approaches based on arbitrary precision arithmetic. However, this is practical only for some small problems and can be incredibly expensive for larger ones. At some conferences I have seen people working on this (even on, e.g., GPUs), but these kinds of solvers are so costly that they are mostly useless in practice. Not to mention that solving linear problems exactly is at least arguable in any context.