Your "assume also" clause equates two quadratic forms in $\mathbb{R}^n$ (with $\mathrm{y}=(y_1,y_2,\ldots,y_n)$ the variable). Since any quadratic form is completely determined by its values at $1+n+\binom{n+1}{2}$ distinct points, their agreement at all points of $\mathbb{R}^n$ is far more than needed to conclude the two forms are identical, whence their coefficients must be the same.
The coefficients of $y_1^2$ are $1/\sigma^2$ and $1/\nu^2$, whence $\sigma^2=\nu^2$ and so $\sigma=\pm \nu$. We always stipulate that $\sigma$ and $\nu$ are nonnegative, implying $\sigma=\nu$. (The "real" parameter should be considered to be $\sigma^2$ or $1/\sigma^2$ rather than $\sigma$ itself.)
The coefficients of the linear terms in $y_i$ are proportional, with the same constant of proportionality because $\sigma=\nu$, to $a_0 + a_1 x_i$ and $b_0 + b_1 x_i$, whence $a_0 + a_1 x_i = b_0 + b_1 x_i$ for every $i$. Letting $\mathrm{1} = (1,1,\ldots, 1)$ and $\mathrm{x} = (x_1, x_2, \ldots, x_n)$, we conclude
$$(a_0 - b_0)\mathrm{1} + (a_1 - b_1)\mathrm{x} = \mathrm{0}.$$
Thus either

1. $\mathrm{1}$ and $\mathrm{x}$ are linearly independent, which by definition implies both $a_0 = b_0$ and $a_1 = b_1$; or
2. $\mathrm{1}$ and $\mathrm{x}$ are linearly dependent, which means $x_1 = x_2 = \cdots = x_n = x$, say. In that case
   - if $x \ne 0$, the single relation $a_0 - b_0 = -(a_1 - b_1) x$ determines one of $(a_0, a_1, b_0, b_1)$ in terms of the other three, or
   - otherwise ($x = 0$), $a_0=b_0$ while $a_1$ and $b_1$ could have any values.
In case (1) all parameters are uniquely determined: this is the identifiable model. In case (2) only $\sigma = \nu$ is identifiable in every circumstance, and only certain linear combinations of $(a_0,a_1,b_0,b_1)$ can be identified.
Evidently, linear independence of $\mathrm{x}$ and $\mathrm{1}$ is both necessary and sufficient for identifiability.
This criterion easily generalizes to multiple regression, where the ordinary least squares model is identifiable if and only if the design matrix $X$ (whose columns are formed from $\mathrm{1}, \mathrm{x}$, and any other variables in any order) has full rank: that is, there is no linear dependence among its columns.
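As a small numerical illustration (my own sketch, not part of the argument above), the rank condition can be checked directly; the data and the helper name `is_identifiable` below are made up for the example.

```python
import numpy as np

# Hypothetical data: when x is constant, the column of ones and x are
# linearly dependent, so the intercept and slope are not identifiable.
x_constant = np.full(5, 3.0)
x_varying = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

def is_identifiable(*columns):
    """Return True when the design matrix [1, columns...] has full column rank."""
    n = len(columns[0])
    X = np.column_stack([np.ones(n), *columns])
    return np.linalg.matrix_rank(X) == X.shape[1]

print(is_identifiable(x_constant))  # False: 1 and x are linearly dependent
print(is_identifiable(x_varying))   # True: full rank, parameters identifiable
```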
There exists an improper (sometimes called non-informative) prior under which the Bayesian analysis yields exactly the same estimates as classical regression. Using a proper, informative prior gives different estimates, with the estimated slope being an "average" of the OLS estimate and the prior information.
The interpretation also differs: the Bayesian analysis yields a posterior density rather than classical confidence intervals and hypothesis tests. The Bayesian approach can also use a likelihood other than the normal distribution, which gives more flexibility.
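A minimal sketch of this shrinkage effect, under the simplifying assumptions of a known noise variance and a zero-mean normal (conjugate) prior on the coefficients; the data and prior scales below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data generated only for this illustration.
n = 50
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=n)
X = np.column_stack([np.ones(n), x])

sigma2 = 1.0  # assume the noise variance is known, for simplicity

def posterior_mean(tau2):
    """Posterior mean of (intercept, slope) under a N(0, tau2*I) prior
    and a normal likelihood with known variance sigma2 (conjugate case)."""
    precision = X.T @ X / sigma2 + np.eye(2) / tau2
    return np.linalg.solve(precision, X.T @ y / sigma2)

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

print("OLS:        ", beta_ols)
print("tight prior:", posterior_mean(0.1))   # pulled toward the prior mean 0
print("vague prior:", posterior_mean(1e6))   # nearly identical to OLS
```

With the vague prior the posterior mean essentially reproduces OLS, while the tight prior pulls the coefficients toward the prior mean of zero, which is the "averaging" described above.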
Best Answer
You can estimate an optimal lambda as the one that minimizes testing error during cross-validation. Testing error (i.e., mean squared prediction error on a held-out testing set) should decrease as lambda increases from zero, because the training data are overfit less and less, but beyond a certain point it will increase again as the model no longer captures the data adequately. A conservative choice of the optimal lambda is the largest one whose testing error lies within one standard error of the minimum testing error (the "one-standard-error rule").
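Here is a small sketch of that procedure (my own illustration, not code from the answer): ridge regression with a manual K-fold cross-validation over a lambda grid, reporting both the error-minimizing lambda and the one-standard-error choice. All data and grid values are made up.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data for illustration only.
n, p = 100, 10
X = rng.normal(size=(n, p))
beta = np.concatenate([np.array([3.0, -2.0]), np.zeros(p - 2)])
y = X @ beta + rng.normal(scale=2.0, size=n)

lambdas = np.logspace(-3, 3, 30)
k = 5
folds = np.array_split(rng.permutation(n), k)

def ridge_fit(Xtr, ytr, lam):
    """Ridge solution (no intercept, for brevity): (X'X + lam*I)^{-1} X'y."""
    return np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(p), Xtr.T @ ytr)

# cv_mse[i, j] = test MSE of fold j at lambdas[i]
cv_mse = np.empty((len(lambdas), k))
for j, test_idx in enumerate(folds):
    train_idx = np.setdiff1d(np.arange(n), test_idx)
    for i, lam in enumerate(lambdas):
        b = ridge_fit(X[train_idx], y[train_idx], lam)
        resid = y[test_idx] - X[test_idx] @ b
        cv_mse[i, j] = np.mean(resid ** 2)

mean_mse = cv_mse.mean(axis=1)
se_mse = cv_mse.std(axis=1, ddof=1) / np.sqrt(k)

i_min = np.argmin(mean_mse)
# One-standard-error rule: the largest lambda whose mean CV error is within
# one SE of the minimum (the more heavily regularized, conservative choice).
within = mean_mse <= mean_mse[i_min] + se_mse[i_min]
i_1se = int(np.max(np.nonzero(within)[0]))

print("lambda_min =", lambdas[i_min])
print("lambda_1se =", lambdas[i_1se])
```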