Bayesian Regression – Understanding Partially Specified Bayesian Priors

bayesian, posterior, prior, probability, regression

In Bayesian linear regression, for example, we may specify a model as:
$$y_i \sim N(\beta_0 + \beta_1 x_i, \epsilon^2) \\\\
\beta_0 \sim N(0, \tau_0^2) \\\\
\beta_1 \sim N(0, \tau_1^2) \\\\
\epsilon \sim N(0, \sigma^2)
$$

The posterior can be constructed as
$$
P(\beta_0, \beta_1, \epsilon|y) = \frac{P(y|\beta_0, \beta_1, \epsilon)\cdot P(\beta_0)\cdot P(\beta_1)\cdot P(\epsilon)}{P(y)}
$$
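The Bayes'-rule construction above can be checked numerically with a grid approximation. The sketch below is illustrative only: it simplifies the model by fixing $\beta_0 = 0$ and treating $\sigma$ as known, so that only $\beta_1$ is unknown, and compares the gridded posterior (likelihood $\times$ prior, normalized) against the closed-form normal-normal conjugate posterior. The data and prior scale $\tau_1$ are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data with beta0 = 0 and sigma known, so the posterior for beta1 is 1-D
# (a simplification of the full model, made purely for illustration)
sigma = 1.0
x = rng.normal(size=40)
y = 3.0 * x + rng.normal(scale=sigma, size=40)

tau1 = 2.0  # prior sd for beta1: an arbitrary choice for this sketch
grid = np.linspace(0.0, 6.0, 3001)
dg = grid[1] - grid[0]

# Numerator of Bayes' rule evaluated on the grid: log-likelihood + log-prior
loglik = -0.5 * np.sum((y[None, :] - grid[:, None] * x[None, :]) ** 2, axis=1) / sigma**2
logprior = -0.5 * grid**2 / tau1**2
post = np.exp(loglik + logprior - (loglik + logprior).max())
post /= post.sum() * dg  # normalizing here plays the role of dividing by P(y)

# Closed-form conjugate posterior for comparison
prec = np.sum(x**2) / sigma**2 + 1.0 / tau1**2      # posterior precision
mean = (np.sum(x * y) / sigma**2) / prec             # posterior mean

grid_mean = np.sum(grid * post) * dg
print(grid_mean, mean)  # the two posterior means should agree closely
```

The point of the sketch is that the denominator $P(y)$ never needs to be computed explicitly: normalizing the gridded likelihood-times-prior recovers the same posterior as the analytic formula.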

My question is: is it possible to partially specify the priors, e.g. specify a prior distribution for $\beta_1$ only, but not for $\beta_0$?

$$y_i \sim N(\beta_0 + \beta_1 x_i, \epsilon^2) \\\\
\beta_1 \sim N(0, \tau_1^2) \\\\
\epsilon \sim N(0, \sigma^2)
$$

I guess the posterior will be something like:
$$
P(\beta_1, \epsilon|y) = \frac{P(y|\beta_1, \epsilon)\cdot P(\beta_1)\cdot P(\epsilon)}{P(y)}
$$

How do I interpret this posterior? Does this setting make $\beta_0$ behave like a frequentist term (fixed but unknown)? How does this model relate to Ridge regression, where we penalize the slope but not the intercept?

Best Answer

An alternative to the approach suggested by Peter Leopold is to use an improper, flat prior $p(\beta_0) \propto 1$. This is what the Stan probabilistic programming language uses when no prior is specified. It is possible but not recommended: while a flat prior is intuitively "uninformative", no prior is truly uninformative, and improper priors can lead to many subtle problems (e.g. the posterior itself may fail to be proper).
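One consequence of the flat improper prior can be illustrated numerically: with $p(\beta_0) \propto 1$, the posterior is proportional to the likelihood alone, so its mode coincides with the maximum-likelihood (frequentist) estimate of $\beta_0$. The sketch below assumes, for simplicity, that $\beta_1$ is fixed at its true value and $\sigma$ is known; the simulated data and grid bounds are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated data: y = 2 + 3x + noise, with sigma known (a simplifying assumption)
sigma = 1.0
x = rng.normal(size=50)
y = 2.0 + 3.0 * x + rng.normal(scale=sigma, size=50)

# Fix beta1 at its true value so the posterior for beta0 is 1-D
beta1 = 3.0
beta0_grid = np.linspace(0.0, 4.0, 2001)
dg = beta0_grid[1] - beta0_grid[0]

# Log-likelihood of the data for each beta0 on the grid
resid = y[None, :] - (beta0_grid[:, None] + beta1 * x[None, :])
loglik = -0.5 * np.sum(resid**2, axis=1) / sigma**2

# Flat improper prior p(beta0) ∝ 1: the posterior is proportional to the likelihood
post_flat = np.exp(loglik - loglik.max())
post_flat /= post_flat.sum() * dg

# Posterior mode under the flat prior vs. the MLE of beta0
beta0_mode = beta0_grid[np.argmax(post_flat)]
beta0_mle = np.mean(y - beta1 * x)
print(beta0_mode, beta0_mle)  # should agree up to the grid spacing
```

In this sense the flat prior does make $\beta_0$ behave like a frequentist "fixed but unknown" quantity, up to the interpretation of the resulting distribution.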