I think you've basically hit the nail on the head in the question, but I'll see if I can add something anyway. I'm going to answer this in a bit of a roundabout way ...
The field of Robust Statistics examines the question of what to do when the Gaussian assumption fails (in the sense that there are outliers):
> it is often assumed that the data errors are normally distributed, at least approximately, or that the central limit theorem can be relied on to produce normally distributed estimates. Unfortunately, when there are outliers in the data, classical methods often have very poor performance.
These ideas have been applied in ML too; for example, in Mika et al. (2001), A Mathematical Programming Approach to the Kernel Fisher Algorithm, they describe how Huber's robust loss can be used with KFDA (along with other loss functions). Of course this is a classification loss, but KFDA is closely related to the Relevance Vector Machine (see section 4 of the Mika paper).
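As a minimal illustrative sketch (not the Mika et al. method, just the general point about robust losses): here scikit-learn's `HuberRegressor` and `LinearRegression` are fit to simulated data containing one gross outlier, and the Huber fit is much less affected.

```python
# Illustrative sketch: ordinary least squares vs Huber's robust loss on
# simulated data with a single gross outlier.
import numpy as np
from sklearn.linear_model import LinearRegression, HuberRegressor

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50).reshape(-1, 1)
y = 2.0 * x.ravel() + 1.0 + rng.normal(scale=0.5, size=50)
y[0] += 50.0                          # one outlier wrecks the Gaussian assumption

ols = LinearRegression().fit(x, y)
huber = HuberRegressor().fit(x, y)    # default epsilon=1.35 downweights the outlier

print("OLS slope:  ", ols.coef_[0])   # pulled towards the outlier
print("Huber slope:", huber.coef_[0]) # stays close to the true value of 2
```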
As implied in the question, there is a close connection between loss functions and Bayesian error models (see here for a discussion).
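To make that connection concrete: minimising a loss gives the maximum-likelihood estimate under the corresponding error model, for instance

$$ \min_\beta \sum_i \left(y_i - x_i^\top\beta\right)^2 \;\leftrightarrow\; \epsilon_i \sim \mathcal{N}(0, \sigma^2), \qquad \min_\beta \sum_i \left|y_i - x_i^\top\beta\right| \;\leftrightarrow\; \epsilon_i \sim \mathrm{Laplace}(0, b), $$

and Huber's loss corresponds to a density that is Gaussian near zero and Laplace-like (heavier-tailed) further out.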
However, it tends to be the case that as soon as you start incorporating "funky" loss functions, optimisation becomes tough (note that this happens in the Bayesian world too). So in many cases people resort to standard loss functions that are easy to optimise, and instead do extra pre-processing to ensure that the data conform to the model.
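For instance, one common pre-processing step of that kind is winsorising (clamping) the most extreme observations so a squared-loss fit isn't dominated by them. A minimal sketch using scipy; the 5% limits here are an arbitrary choice of mine:

```python
# Minimal sketch: clamp the extreme tails (winsorise) before using a standard
# squared-loss fit, instead of switching to a harder-to-optimise robust loss.
import numpy as np
from scipy.stats.mstats import winsorize

rng = np.random.default_rng(1)
y = rng.normal(size=100)
y[:3] += 30.0                                            # a few gross outliers
y_wins = np.asarray(winsorize(y, limits=[0.05, 0.05]))   # clip lowest/highest 5%
print(y.max(), y_wins.max())                             # clipped version has tame tails
```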
The other point that you mention is that the CLT only applies to samples that are IID. This is true, but then the assumptions (and the accompanying analysis) of most algorithms are the same. When you start looking at non-IID data, things get a lot trickier. One example is if there is temporal dependence, in which case typically the approach is to assume that the dependence only spans a certain window, and samples can therefore be considered approximately IID outside of this window (see for example this brilliant but tough paper Chromatic PAC-Bayes Bounds for Non-IID Data: Applications to Ranking and Stationary β-Mixing Processes), after which normal analysis can be applied.
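As a cartoon of that windowing idea (my own toy illustration, nothing to do with the PAC-Bayes machinery in that paper): if you believe the dependence in a series dies out after $w$ steps, keeping only every $w$-th observation gives something close to an IID sample.

```python
# Cartoon of the "dependence only spans a window" assumption: thin an AR(1)
# series so the kept observations are w steps apart and hence nearly independent.
import numpy as np

rng = np.random.default_rng(2)
n, phi, w = 5000, 0.8, 20
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()

thinned = x[::w]                                     # keep every w-th observation
print(np.corrcoef(x[:-1], x[1:])[0, 1])              # strong lag-1 correlation (~0.8)
print(np.corrcoef(thinned[:-1], thinned[1:])[0, 1])  # close to 0 after thinning
```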
So, yes, it comes down in part to convenience, and in part because in the real world, most errors do look (roughly) Gaussian. One should of course always be careful when looking at a new problem to make sure that the assumptions aren't violated.
When you do the Bayesian analysis, are you allowing x1, x2, and x3 to vary/be updated, or are they fixed throughout the analysis?
Most analyses that I have seen, Frequentist or Bayesian, will treat the observed data as fixed and compute the estimates of the coefficients conditional on the observed data. So in that case you can think about the x's as being fixed and not worry about products of normals.
You should also note that while regression models (both Frequentist and Bayesian) assume that the "true" errors are iid Normal, or that the response is independent Normal with mean given by the regression equation, neither assumes that the observed residuals are iid Normal (else why would there be standardized, studentized, etc. residuals?). Focusing too much on the distributional properties of the observed residuals may lead away from enlightenment rather than toward it.
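To see why: with hat matrix $H = X(X'X)^{-1}X'$, the observed residuals are $e = (I - H)\epsilon$, so even when the true errors are iid with variance $\sigma^2$, the residuals are correlated and have variance $\sigma^2(1 - h_{ii})$; that unequal variance is precisely what standardized/studentized residuals correct for. A small simulated sketch (my own illustration, internally studentized residuals computed by hand):

```python
# The true errors eps are iid Normal, but the observed residuals e = (I - H) eps
# have unequal variances sigma^2 * (1 - h_ii); studentizing corrects for this.
import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta + rng.normal(size=n)          # iid N(0, 1) true errors

H = X @ np.linalg.solve(X.T @ X, X.T)      # hat matrix
e = y - H @ y                              # observed residuals
h = np.diag(H)
sigma2_hat = e @ e / (n - k - 1)
r = e / np.sqrt(sigma2_hat * (1 - h))      # internally studentized residuals

print(e.var(), r.var())  # raw residual variance is deflated relative to sigma^2 = 1
```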
Best Answer
The classic linear regression model is:
$$ y_i = \beta_0 + \beta_1 x_{i,1} + \ldots + \beta_k x_{i,k} + \epsilon_i$$
The error term captures everything else that's going on besides a linear relationship with $x_1$ through $x_k$! An entirely equivalent way to write the linear model that may be instructive is:
$$ \epsilon_i = y_i - \left(\beta_0 + \beta_1 x_{i,1} + \ldots + \beta_k x_{i,k}\right) $$
From this, you can get a sense of where linear regression can go wrong. If $\epsilon_i$ has stuff going on such that $\mathrm{E}\left[\epsilon_i \mid X \right] \neq 0$, then strict exogeneity is violated and the regressors and the error term are no longer orthogonal. (Orthogonality of the regressors and the error term is what gives rise to the normal equations, and hence to the OLS estimator $\hat{\mathbf{b}} = \left(X'X\right)^{-1} X'\mathbf{y}$.)
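That orthogonality is easy to verify numerically: by construction the OLS residuals are orthogonal to every column of $X$, whatever the true error distribution is (a small sketch with simulated data):

```python
# The normal equations X'(y - X b_hat) = 0 force the fitted residuals to be
# orthogonal to every regressor, by construction.
import numpy as np

rng = np.random.default_rng(4)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
y = X @ np.array([0.5, 2.0, -1.0]) + rng.normal(size=100)

b_hat = np.linalg.solve(X.T @ X, X.T @ y)  # (X'X)^{-1} X'y
resid = y - X @ b_hat
print(X.T @ resid)                         # numerically ~ the zero vector
```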
Think of the error term as a garbage collection term, a term that collects EVERYTHING ELSE that's going on besides a linear relationship between $y_i$ and your observed regressors $x_1, \ldots, x_k$. What could end up in the error term is limitless. Of course, what's allowed into the error term for OLS to be a consistent estimator isn't limitless :P.
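As a concrete toy illustration of that last caveat (my own simulation, just to make the point): if an omitted variable that is correlated with an included regressor gets swept into $\epsilon_i$, then $\mathrm{E}[\epsilon_i \mid X] \neq 0$ and the OLS slope no longer converges to the true coefficient.

```python
# Toy example of a violated exogeneity assumption: z is omitted and ends up in
# the error term, but z is correlated with x, so the OLS slope for x is biased.
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
z = rng.normal(size=n)
x = 0.7 * z + rng.normal(size=n)           # x correlated with the omitted z
y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])       # regression that omits z
b_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(b_hat[1])  # converges to roughly 2 + 3*cov(x, z)/var(x) ≈ 3.4, not 2
```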