Recall that simple linear regression can be described as
$$Y = m(Y) + \frac{Cov(X,Y)}{Var(X)}(X-m(X))$$
so to calculate it you need the means $m(X), m(Y)$, the variance of $X$, and the covariance between $X$ and $Y$. Mean and variance can be calculated in a single pass using an online algorithm (e.g. Welford's algorithm), and there is an analogous online algorithm for covariance, as described on Wikipedia. So the only thing you need to do is substitute the regular estimators with their online counterparts. See also the Are there algorithms for computing "running" linear or logistic regression parameters? thread.
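For illustration, here is a minimal single-pass sketch in Python; the class and method names are my own, and the updates are the standard Welford-style ones for the running mean, sum of squared deviations, and sum of co-deviations:

```python
# A minimal sketch of single-pass (online) simple linear regression.
# Hypothetical names; the update formulas follow the Welford-style
# online algorithms for variance and covariance.
class OnlineSimpleRegression:
    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0   # running sum of squared deviations of x
        self.c_xy = 0.0   # running sum of co-deviations of x and y

    def update(self, x, y):
        self.n += 1
        dx = x - self.mean_x              # deviation from the *old* mean
        self.mean_x += dx / self.n
        self.mean_y += (y - self.mean_y) / self.n
        # mixing the pre-update deviation with the post-update mean is
        # the standard trick for a numerically stable one-pass update
        self.m2_x += dx * (x - self.mean_x)
        self.c_xy += dx * (y - self.mean_y)

    def coefficients(self):
        slope = self.c_xy / self.m2_x     # = Cov(X, Y) / Var(X)
        intercept = self.mean_y - slope * self.mean_x
        return intercept, slope
```

Feeding each $(x, y)$ pair to `update` once, in any order, yields the same coefficients as the batch estimators, since the running sums are exact up to floating-point error.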
That said, what people would usually do in such cases is use stochastic gradient descent, as it is easy to implement (and already implemented in many packages, e.g. Vowpal Wabbit, TensorFlow, Scikit-learn) and easy to adapt to more complicated cases. Also notice that while it is possible to estimate the model in a single pass, using more than one pass usually gives more accurate results.
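As a toy sketch of the SGD alternative (fixed learning rate, made-up data, names of my own choosing; the packages above handle learning-rate schedules and regularization for you):

```python
# One stochastic gradient step on the squared error of a single point.
def sgd_step(intercept, slope, x, y, lr=0.01):
    error = (intercept + slope * x) - y
    intercept -= lr * error        # gradient of 0.5 * error**2 w.r.t. intercept
    slope -= lr * error * x        # gradient of 0.5 * error**2 w.r.t. slope
    return intercept, slope

intercept, slope = 0.0, 0.0
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]   # toy (x, y) pairs
for _ in range(100):                          # each outer loop is one pass (epoch)
    for x, y in data:
        intercept, slope = sgd_step(intercept, slope, x, y)
```

The multiple epochs in the outer loop are exactly the "more than one pass" point above: each extra pass refines the estimate toward the least-squares solution.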
As for calculating the mean squared error, this is very simple: store the running sum of squared errors, at each iteration increase it by the squared error for the current observation, and then divide by the number of iterations made. Notice that the MSE changes as your model changes, so how to calculate it is also a question of what exactly it has to measure. If you want to capture how good your model is currently while accounting for its past performance (previous iterations), then the above solution works. If you want to measure only the current performance, then you need to re-calculate it at each iteration on the whole dataset. On the other hand, if you want to measure the out-of-sample performance, then probably the best you can do is to calculate it at each step on a hold-out sample, or on the $t+1,\dots,t+k$ observations ahead, as described by Hyndman and Athanasopoulos.
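A minimal sketch of that first, running variant (my own naming):

```python
# Running MSE: keep the sum of squared errors seen so far and divide by
# the number of observations. This mixes in past performance by design.
class RunningMSE:
    def __init__(self):
        self.sse = 0.0   # sum of squared errors
        self.n = 0

    def update(self, y_true, y_pred):
        self.sse += (y_true - y_pred) ** 2
        self.n += 1

    def value(self):
        return self.sse / self.n
```

Calling `update` with each new observation and its prediction (made *before* the model is updated on that observation) gives the prequential flavor of this estimate; the other two variants require either a full re-pass over the data or a hold-out sample, as described above.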
By itself, no. The choice of loss function depends on your data and the nature of the problem you are trying to solve. As you noticed, mean squared error is sensitive to large errors, while something like mean absolute error is less sensitive. Such sensitivity is sometimes a desirable property, while in other cases you need a robust loss function that is insensitive to outliers. You use squared error when you need the loss to be sensitive.
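A quick toy illustration of that sensitivity (the numbers are made up):

```python
# One outlying error of 10 dominates the MSE but barely moves the MAE.
errors = [1.0, -1.0, 0.5, 10.0]
mse = sum(e ** 2 for e in errors) / len(errors)   # 25.5625, driven by 10**2 = 100
mae = sum(abs(e) for e in errors) / len(errors)   # 3.125, much less affected
```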
No matter what loss you choose, overfitting is a property of the whole model, not only of the loss. For example, if you use a simple linear regression model that minimizes the squared error, it has no chance to overfit, because it is not expressive enough. On the other hand, if you instead use something like $k$NN, it can overfit regardless of the loss. Basically, if the model can drag the training error to zero (e.g. a model with enough parameters to memorize the data), it eventually will, no matter what the loss is.
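A toy sketch of that memorization point: a 1-nearest-neighbour regressor drives the training error to exactly zero (assuming distinct $x$ values), whatever loss you evaluate afterwards. The data and function name below are my own:

```python
# 1-NN regression: predict the y of the closest training point, which
# for a training point is the point itself, so the training error is zero.
def knn1_predict(train, x):
    return min(train, key=lambda point: abs(point[0] - x))[1]

train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
train_errors = [y - knn1_predict(train, x) for x, y in train]
assert all(e == 0.0 for e in train_errors)   # zero under MSE, MAE, or any loss
```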
Best Answer
Yes and no. The MSE itself will stay the same: MSE is MSE, and the method of estimation you used does not matter. The only difference is that in the classical approach you get a point estimate, while in the Bayesian approach you get a distribution of likely values, so if you want to compare both approaches using MSE, you need to decide on some kind of point estimate for the Bayesian model as well (e.g. the mean, median, or mode of the posterior distribution).
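A minimal sketch of that last step, assuming you already have posterior predictive draws (the arrays below are hypothetical): collapse the draws to a point estimate first, then apply the usual MSE formula.

```python
import numpy as np

y_true = np.array([2.1, 3.9, 6.2])
# hypothetical posterior predictive draws, shape (n_draws, n_points)
posterior_pred = np.array([[2.0, 4.0, 6.0],
                           [2.2, 3.8, 6.4],
                           [2.1, 4.1, 6.1]])

point_pred = posterior_pred.mean(axis=0)    # posterior mean as the point estimate
mse = np.mean((y_true - point_pred) ** 2)   # same MSE formula as in the classical case
```

Swapping `mean(axis=0)` for `np.median(posterior_pred, axis=0)` gives the posterior-median variant; the MSE formula itself never changes.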