Solved – When can’t a frequentist sampling distribution be interpreted as a Bayesian posterior in regression settings

bayesian, frequentist, maximum-likelihood, posterior

My actual questions are in the last two paragraphs, but to motivate them:

If I am attempting to estimate the mean of a random variable that follows a Normal distribution with known variance, I've read that putting a uniform prior on the mean results in a posterior distribution that is proportional to the likelihood function. In that situation, the Bayesian credible interval coincides exactly with the frequentist confidence interval, and the Bayesian maximum a posteriori estimate equals the frequentist maximum likelihood estimate.
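A minimal numerical sketch of this equivalence, assuming the known-variance normal model above with made-up data (none of the numbers come from the question): under a flat prior the posterior for the mean is $N(\bar{x}, \sigma^2/n)$, so the 95% credible interval and the 95% confidence interval are the same numbers.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sigma = 2.0                          # known standard deviation
x = rng.normal(loc=5.0, scale=sigma, size=50)
n, xbar = len(x), x.mean()
se = sigma / np.sqrt(n)

# Frequentist 95% confidence interval for the mean
ci = stats.norm.interval(0.95, loc=xbar, scale=se)

# Bayesian posterior under a flat prior: mu | x ~ N(xbar, sigma^2 / n)
cred = stats.norm.ppf([0.025, 0.975], loc=xbar, scale=se)

print(ci, cred)                      # the two intervals coincide
```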

In a simple linear regression setting,

$Y = \textbf{X}\beta+\epsilon, \hspace{1cm} \epsilon\sim N(0,\sigma^2) $

putting a uniform prior on $\beta$ and an inverse-gamma prior on $\sigma^2$ with small parameter values yields a posterior mode $\hat\beta^{MAP}$ that is very close to the frequentist $\hat\beta^{MLE}$, and a credible interval for the posterior distribution of $\beta|X$ that is very close to the confidence interval around the maximum likelihood estimate. They will not be exactly the same, because the prior on $\sigma^2$ exerts a small amount of influence, and because carrying out the posterior estimation via MCMC simulation introduces another source of discrepancy. Still, the Bayesian credible interval around $\hat\beta^{MAP}$ and the frequentist confidence interval around $\hat\beta^{MLE}$ will be quite close to each other, and as the sample size increases they should converge, since the influence of the likelihood grows to dominate that of the prior.
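A minimal sketch of this comparison, assuming simulated data and an $\mathrm{IG}(0.01, 0.01)$ prior on $\sigma^2$ (the data, variable names, and prior values are illustrative, not from the question). It draws directly from the conjugate posterior rather than running MCMC:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])     # intercept + one predictor
beta_true = np.array([1.0, 0.5])
y = X @ beta_true + rng.normal(scale=1.5, size=n)

# Frequentist OLS / MLE and t-based confidence interval for the slope
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
rss = np.sum((y - X @ beta_hat) ** 2)
se = np.sqrt(rss / (n - p) * np.diag(XtX_inv))
ci = beta_hat[1] + np.array([-1, 1]) * stats.t.ppf(0.975, n - p) * se[1]

# Bayesian posterior with a flat prior on beta and IG(a0, b0) prior on sigma^2:
#   sigma^2 | y ~ IG(a0 + (n - p)/2, b0 + rss/2)
#   beta | sigma^2, y ~ N(beta_hat, sigma^2 * (X'X)^{-1})
a0 = b0 = 0.01                                             # "small parameter values"
sigma2_draws = stats.invgamma.rvs(a0 + (n - p) / 2, scale=b0 + rss / 2,
                                  size=20_000, random_state=rng)
beta1_draws = rng.normal(beta_hat[1], np.sqrt(sigma2_draws * XtX_inv[1, 1]))
cred = np.percentile(beta1_draws, [2.5, 97.5])

print(ci)    # frequentist 95% confidence interval for the slope
print(cred)  # 95% credible interval -- nearly identical
```

In the limit $a_0, b_0 \to 0$ (equivalently, the prior $p(\sigma^2) \propto 1/\sigma^2$) the marginal posterior for each coefficient is exactly the $t$ distribution behind the frequentist interval, which is why the two intervals above nearly match.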

But I've read that there are also regression situations where these near-equivalencies do not hold. For example, hierarchical regressions with random effects, or logistic regression — these are situations where, as I understand it, there are no "good" objective or reference priors.

So my general question is this: assuming that I want to make inference about $P(\beta|X)$ and that I don't have prior information I want to incorporate, why can't I proceed with frequentist maximum likelihood estimation in these situations and interpret the resulting coefficient estimates and standard errors as Bayesian MAP estimates and posterior standard deviations, implicitly treating these "posterior" estimates as arising from some "uninformative" prior, without attempting to find the explicit formulation of the prior that would lead to such a posterior? In general, within the realm of regression analysis, when is it okay to proceed along these lines (treating the likelihood like a posterior) and when is it not? And what about frequentist methods that are not likelihood-based, such as quasi-likelihood methods, or ordinary or weighted least squares generally (where the coefficient estimates still have well-defined sampling distributions under the frequentist paradigm)?

Do the answers depend on whether my target of inference is a coefficient point estimate, the probability of a coefficient lying within a particular range, or quantities from the predictive distribution?

Best Answer

This is basically a question about $p$-values and maximum likelihood. Let me quote Cohen (1994) here:

What we want to know is "Given this data what is the probability that $H_0$ is true?" But as most of us know, what it [$p$-value] tells us is "Given that $H_0$ is true, what is the probability of this (or more extreme) data?" These are not the same (...)

So the $p$-value tells us $P(D|H_0)$, while we are interested in $P(H_0|D)$ (see also the discussion of the Fisherian vs the Neyman-Pearson framework).
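A toy illustration of the difference (the coin-flip numbers, the simple alternative hypothesis, and the equal prior probabilities are made up for this sketch, not part of the answer):

```python
from scipy import stats

n, k = 20, 15                                   # 15 heads out of 20 flips
p0, p1 = 0.5, 0.7                               # H0 and a simple alternative H1

# "Frequentist" tail probability under H0: P(>= k heads | H0)
p_value = stats.binom.sf(k - 1, n, p0)

# Bayesian posterior probability of H0 with prior P(H0) = P(H1) = 0.5
lik0 = stats.binom.pmf(k, n, p0)
lik1 = stats.binom.pmf(k, n, p1)
post_h0 = 0.5 * lik0 / (0.5 * lik0 + 0.5 * lik1)

print(p_value, post_h0)                         # different numbers, different questions
```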

Let's forget for a moment about $p$-values. The probability of observing our data given some parameter $\theta$ is the likelihood function

$$ L(\theta | D) = P(D|\theta) $$

Maximizing it is one way of approaching statistical inference. Another way is the Bayesian approach, where we want to learn directly (rather than indirectly) about $P(\theta|D)$ by employing Bayes' theorem and using a prior for $\theta$

$$ \underbrace{P(\theta|D)}_\text{posterior} \propto \underbrace{P(D|\theta)}_\text{likelihood} \times \underbrace{P(\theta)}_\text{prior} $$
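A quick grid sketch of this relationship (illustrative binomial data, not from the answer): the posterior is the renormalized product of likelihood and prior, and with a uniform prior the MAP estimate lands on the MLE.

```python
import numpy as np
from scipy import stats

n, k = 30, 21                                  # 21 successes out of 30 trials
theta = np.linspace(0.001, 0.999, 999)         # grid over the parameter

likelihood = stats.binom.pmf(k, n, theta)      # P(D | theta)
prior = np.ones_like(theta)                    # uniform prior P(theta)
posterior = likelihood * prior
posterior /= posterior.sum()                   # normalize over the grid

mle = theta[np.argmax(likelihood)]             # maximum likelihood estimate
map_ = theta[np.argmax(posterior)]             # maximum a posteriori estimate
print(mle, map_)                               # both equal k/n = 0.7
```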

Now, if you look at the overall picture, you'll see that $p$-values and the likelihood answer different questions than Bayesian estimation does.

So, while maximum likelihood estimates should be the same as MAP Bayesian estimates under uniform priors, you have to remember that they answer a different question.


Cohen, J. (1994). The earth is round (p<.05). American Psychologist, 49, 997-1003.
