Solved – Sampling distribution of regression coefficients

bayesian, mathematical-statistics, regression, sampling, self-study

I previously learned about sampling distributions, which give results for the estimator in terms of the unknown parameter. For example, the sampling distributions of $\hat\beta_0$ and $\hat\beta_1$ in the linear regression model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ are

$$
\hat{\beta}_0 \sim \mathcal N \left(\beta_0,~\sigma^2\left(\frac{1}{n}+\frac{\bar{x}^2}{S_{xx}}\right)\right)
$$
and
$$
\hat{\beta}_1 \sim \mathcal N \left(\beta_1,~\frac{\sigma^2}{S_{xx}}\right)
$$

where $S_{xx} = \sum_{i=1}^n x_i^2 - n \bar{x}^2$.
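(As a sanity check, not part of the original question: these two variance formulas can be confirmed with a short simulation. The true parameter values below are arbitrary choices for illustration.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true values, chosen only for illustration.
beta0, beta1, sigma = 2.0, 0.5, 1.0
n = 50
x = np.linspace(0.0, 10.0, n)          # fixed design points
xbar = x.mean()
Sxx = np.sum(x**2) - n * xbar**2       # equals sum((x - xbar)**2)

# Simulate many datasets from the model and refit each one,
# giving draws from the sampling distribution of the estimators.
reps = 20000
Y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=(reps, n))
b1s = ((x - xbar) * (Y - Y.mean(axis=1, keepdims=True))).sum(axis=1) / Sxx
b0s = Y.mean(axis=1) - b1s * xbar

# Empirical variances should be close to the theoretical ones.
print(np.var(b1s), sigma**2 / Sxx)
print(np.var(b0s), sigma**2 * (1/n + xbar**2 / Sxx))
```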

But now I have seen the following in a book:

> Suppose we fit the model by least squares in the usual way. Consider the Bayesian posterior distribution, and choose priors so that this is equivalent to the usual frequentist sampling distribution, that is…

$$
\left( \begin{matrix} \beta_0 \\ \beta_1 \end{matrix} \right)
\sim \mathcal N_2\left[\left(\begin{matrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{matrix} \right),~\hat{\sigma}^2 \left(\begin{matrix} n & \sum_{i=1}^{n}x_i \\ \sum_{i=1}^{n}x_i & \sum_{i=1}^{n}x_i^2 \end{matrix} \right)^{-1}\right]
$$
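(For what it's worth, the numerical connection between this matrix form and the two scalar variances above is easy to check: the diagonal of $\hat\sigma^2 (X^T X)^{-1}$ reproduces them exactly. A sketch with an arbitrary design and an arbitrary noise variance:)

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
x = rng.uniform(0.0, 10.0, n)          # arbitrary design
xbar = x.mean()
Sxx = np.sum(x**2) - n * xbar**2
sigma2 = 1.7                           # any positive value works here

# The 2x2 matrix appearing in the book's expression:
XtX = np.array([[n,       x.sum()],
                [x.sum(), np.sum(x**2)]])
cov = sigma2 * np.linalg.inv(XtX)

# Its diagonal reproduces the two scalar variances.
print(cov[0, 0], sigma2 * (1/n + xbar**2 / Sxx))   # intercept variance
print(cov[1, 1], sigma2 / Sxx)                     # slope variance
```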

This is confusing me because:

  1. Why do the estimates appear on the left hand side (lhs) of the first 2 expressions, and the right hand side (rhs) of the last expression?
  2. Why do the beta hats in the last expression have 1 and 2 subscripts instead of 0 and 1?
  3. Are these just different representations of the same thing? If they are, could someone show me how they are equivalent? If not, could someone explain the difference?
  4. Is it the case that the last expression is the "inversion" of the first two? Is that why the 2×2 matrix in the last expression is inverted and estimates/parameters are switched from rhs$\leftrightarrow$lhs? If so could someone show me how to get from one to the others?

Best Answer

This part primarily relates to your first, third and fourth question:

There's a fundamental difference between Bayesian statistics and frequentist statistics.

Frequentist statistics makes inferences about which fixed parameter values are consistent with the data, where the data are viewed as random, usually via the likelihood. You take $\theta$ (some parameter or parameters) as fixed but unknown and ask which values make the observed data more likely; the properties of sampling from the model, given the parameters, are used to infer where the parameters might be. (A Bayesian might say the frequentist approach is based on 'the frequencies of things that didn't happen'.)

Bayesian statistics looks at the information on parameters in terms of a probability distribution on them, which is updated by data, via the likelihood. Parameters have distributions, so you look at $P(\theta|\underline{x})$.

This results in things which often look similar but where the variables in one look "the wrong way around" viewed through the lens of the other way of thinking about it.

So, fundamentally they're somewhat different things, and the fact that things that are on the LHS of one are on the RHS of the other is no accident.
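(On the fourth question specifically, the algebraic link is easy to exhibit; this is standard textbook algebra rather than anything from the answer itself. Inverting the $2\times 2$ matrix gives

$$
\left(\begin{matrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{matrix}\right)^{-1}
= \frac{1}{n\sum x_i^2 - \left(\sum x_i\right)^2}
\left(\begin{matrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{matrix}\right)
= \frac{1}{n\,S_{xx}}
\left(\begin{matrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{matrix}\right),
$$

since $n\sum x_i^2 - (\sum x_i)^2 = n(\sum x_i^2 - n\bar{x}^2) = n S_{xx}$. Multiplying the diagonal entries by $\sigma^2$ and using $\sum x_i^2 = S_{xx} + n\bar{x}^2$,

$$
\sigma^2\,\frac{\sum x_i^2}{n S_{xx}} = \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right),
\qquad
\sigma^2\,\frac{n}{n S_{xx}} = \frac{\sigma^2}{S_{xx}},
$$

which are exactly the variances in the two univariate sampling distributions. So the matrix form carries the same marginal variances, plus the covariance between the two estimates.)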

If you do some work with both, it soon becomes reasonably clear.

The second question seems to me to relate simply to a typo.

---

> the statement "equivalent to the usual frequentist sampling distribution, that is": I took this to mean that the authors were stating the frequentist sampling distribution. Have I read this wrongly?

There are two things going on there: they've expressed something a bit loosely (people make this particular kind of over-loose statement all the time), and I think you're also interpreting it differently from the intent.

> What exactly does the expression they give mean, then?

Hopefully the discussion below will help clarify the intended sense.

> If you can provide a reference (preferably online, as I don't have good library access) where this expression is derived, I would be grateful.

It follows right from here:

http://en.wikipedia.org/wiki/Bayesian_linear_regression

by taking flat priors on $\beta$ and, I think, a flat prior on $\sigma^2$ as well.

The reason is that the posterior is thereby proportional to the likelihood and the intervals generated from the posteriors on the parameters match the frequentist confidence intervals for the parameters.
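(As a rough illustration of that last point, a sketch with $\sigma$ treated as known to keep the algebra simple: under a flat prior on the coefficients, the posterior for $\beta_1$ is $\mathcal N(\hat\beta_1,\ \sigma^2/S_{xx})$, so a central 95% credible interval computed from posterior draws lands on the frequentist confidence interval.)

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data; sigma is treated as known to keep things simple.
n = 40
beta0, beta1, sigma = 1.0, 0.8, 1.0
x = rng.uniform(0.0, 10.0, n)
y = beta0 + beta1 * x + rng.normal(0.0, sigma, n)

xbar = x.mean()
Sxx = np.sum(x**2) - n * xbar**2
b1 = np.sum((x - xbar) * (y - y.mean())) / Sxx   # least-squares slope
se = sigma / np.sqrt(Sxx)

# Frequentist 95% confidence interval for beta1 (sigma known):
z = 1.96
freq_lo, freq_hi = b1 - z * se, b1 + z * se

# Flat-prior posterior: beta1 | y ~ N(b1, se**2).  Draw from it and
# take the central 95% credible interval.
draws = rng.normal(b1, se, size=200_000)
bayes_lo, bayes_hi = np.quantile(draws, [0.025, 0.975])

print((freq_lo, freq_hi))
print((bayes_lo, bayes_hi))   # essentially the same endpoints
```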

You might find the first few pages here helpful as well.