Bayesian estimation basics: density and estimation methods

Tags: bayes-theorem, probability, statistics

Reading about Bayesian estimation, several questions arise. By Bayes' theorem, if we drop the normalizing constant, the posterior density depends only on the prior density and the likelihood function:

$$g(\theta \mid X_1, \dots, X_n) \propto p(\theta)\,L(\theta \mid X_1, \dots, X_n)$$

  • The prior density $p(\theta)$ comes from prior information whose parameters differ from those estimated from the data via maximum likelihood. Does the form of the distribution change as well? That is, if an estimator fitted to the sample data follows a normal distribution closely, shouldn't the prior distribution also be normal, or very similar?

  • Why do we obtain the sample estimator using the maximum likelihood method? I understand that, in Bayes' theorem, the posterior density is proportional to the prior density multiplied by the likelihood, but wouldn't it be possible to use other estimators? What occurred to me is that, since the Bayesian approach treats the parameter as a random variable rather than a fixed value, unbiasedness is less relevant and robustness and efficiency are prioritized; but I am not sure this is so.

  • Lastly, when is it convenient to use a Bayesian estimator? I understand that it depends on the experimental approach: if you want an estimator that is refined over the long term (for example, in big-data applications), the Bayesian estimate can be efficient, whereas in one-off studies a frequentist estimator may be better.

I hope that little by little this becomes clearer; any comment is appreciated.

Cheers!

Best Answer

Let's start with a very simple example of Bayesian inference that includes some of the issues you raise. Then you may have a framework for follow-up questions and raising additional issues.

A political consultant is hired to advise one candidate in an upcoming election. From prior experience with other elections and some knowledge of the candidate, the consultant has the prior distribution $\mathsf{Beta}(330, 270)$ for the probability $\theta$ that the candidate will win. That is, the consultant thinks the probability the candidate will win is roughly 0.55 and likely between 0.51 and 0.59. Computation in R:

330/(330+270)
[1] 0.55       # mean of BETA(330, 270)
qbeta(c(.025, .975), 330, 270)
[1] 0.5100824 0.5896018

The prior density satisfies $p(\theta) \propto \theta^{330-1}(1-\theta)^{270-1}.$

[Figure: prior density, $\mathsf{Beta}(330, 270)$]

Choosing the prior distribution is often at least partially a matter of opinion. The consultant might have been just as happy with another similar beta distribution as her prior.

Results of a public opinion poll by a reputable pollster show that $x = 620$ out of $n = 1000$ randomly chosen likely voters favor the candidate. Thus the binomial likelihood is proportional to $L(x \mid \theta) \propto \theta^{620}(1-\theta)^{1000-620}.$

Then by Bayes' Theorem, the posterior density satisfies $$g(\theta \mid x) \propto \theta^{330-1}(1-\theta)^{270-1}\times\theta^{620}(1-\theta)^{1000-620} \\ = \theta^{330 + 620 - 1}(1-\theta)^{270 + 1000 - 620 - 1} \\ = \theta^{950-1}(1-\theta)^{650-1},$$ where we recognize the last expression as proportional to the density function of $\mathsf{Beta}(950, 650).$ Information in this posterior distribution is a melding of information in the prior distribution and in the data.
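As a sanity check, you can verify the conjugate update numerically in R by normalizing prior × likelihood on a grid and comparing it with the $\mathsf{Beta}(950, 650)$ density. This is a minimal sketch, not part of the original computation:

theta <- seq(.001, .999, by = .001)
unnorm <- dbeta(theta, 330, 270) * dbinom(620, 1000, theta)  # prior x likelihood
post <- unnorm / (sum(unnorm) * .001)                        # normalize to a density
max(abs(post - dbeta(theta, 950, 650)))                      # near zero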

In this case, it is easy to find the posterior distribution because the binomial likelihood is 'conjugate to' (mathematically compatible with) the beta density of the prior distribution.

A 95% Bayesian probability interval $(0.570, 0.618)$ for $\theta$ can be found by cutting 2.5% of the probability from each tail of the posterior distribution. Possible point estimates are the mean, median, or mode (in this case, all about 0.594).

qbeta(c(.025, .975), 950, 650)
[1] 0.5695848 0.6176932
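The point estimates mentioned above can be checked directly; this sketch uses the formula $(a-1)/(a+b-2)$ for the mode of $\mathsf{Beta}(a, b)$:

950/(950 + 650)            # posterior mean, 0.59375
qbeta(.5, 950, 650)        # posterior median, about 0.5938
(950 - 1)/(950 + 650 - 2)  # posterior mode, about 0.5939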

Here is a plot of the prior and posterior distributions. The 95% posterior probability interval is shown by dashed lines.

[Figure: prior ($\mathsf{Beta}(330, 270)$) and posterior ($\mathsf{Beta}(950, 650)$) densities; dashed lines mark the 95% posterior interval]
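A figure along these lines can be reproduced with base R graphics; the following is a sketch, not the code used for the original plot:

curve(dbeta(x, 950, 650), 0.4, 0.8, ylab = "Density", xlab = expression(theta))  # posterior
curve(dbeta(x, 330, 270), add = TRUE, lty = 3)                                   # prior
abline(v = qbeta(c(.025, .975), 950, 650), lty = 2)                              # 95% interval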

So the poll data, combined with the prior distribution, show a slightly more favorable standing for the candidate than the prior distribution alone.

Notes: (1) If the prior distribution in this example had been the 'noninformative' Jeffreys prior $\mathsf{Beta}(.5, .5),$ then the 95% Bayesian posterior interval would have been nearly the same (numerically) as a frequentist 95% confidence interval (but Bayesians and frequentists interpret interval estimates somewhat differently).
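For this example, the Jeffreys-prior posterior would be $\mathsf{Beta}(0.5 + 620,\, 0.5 + 380),$ and its 95% interval can be compared with a frequentist interval for $\hat\theta = 620/1000 = 0.62.$ A quick sketch:

qbeta(c(.025, .975), .5 + 620, .5 + 380)  # posterior interval under the Jeffreys prior
prop.test(620, 1000)$conf.int             # frequentist (Wilson score) interval, for comparison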

(2) A conjugate prior distribution for a Poisson likelihood function is a gamma distribution. Similarly, a normal prior on the mean is conjugate to a normal likelihood (with known variance).
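For instance, with a $\mathsf{Gamma}(\alpha, \beta)$ prior (shape $\alpha,$ rate $\beta$) and $n$ Poisson counts $x_i,$ the posterior is $\mathsf{Gamma}(\alpha + \sum x_i,\ \beta + n).$ A minimal R sketch with made-up counts:

a <- 2; b <- 1         # gamma prior: shape a, rate b
x <- c(3, 5, 4, 6, 2)  # hypothetical Poisson counts
qgamma(c(.025, .975), a + sum(x), b + length(x))  # 95% posterior interval for the Poisson mean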

(3) Reference: Suess & Trumbo (2010), Introduction to Probability Simulation and Gibbs Sampling with R, Springer. The example shown above is similar to one found in Chapter 8 of this book.
