Bayesian Priors – Do They Become Irrelevant with Large Sample Size?


When performing Bayesian inference, we combine our likelihood function with the priors we hold about the parameters. Because the log scale is more convenient, we effectively work with $\sum \ln (\text{prior}) + \sum \ln (\text{likelihood})$ (one prior pdf per parameter, one likelihood term per data point), which an MCMC sampler or similar method explores to generate the posterior distributions.
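
As a minimal sketch of that objective (my own illustration; the normal model, priors, and toy data below are assumptions, not part of the question), the unnormalized log-posterior is just the sum of the log prior densities and the per-observation log-likelihoods:

```python
import numpy as np
from scipy import stats

def log_posterior(mu, sigma, data):
    """Unnormalized log-posterior: sum of log priors plus summed log-likelihood."""
    if sigma <= 0:
        return -np.inf
    # assumed priors for illustration: mu ~ Normal(0, 10), sigma ~ HalfNormal(5)
    log_prior = stats.norm(0, 10).logpdf(mu) + stats.halfnorm(scale=5).logpdf(sigma)
    log_lik = np.sum(stats.norm(mu, sigma).logpdf(data))
    return log_prior + log_lik

data = np.random.default_rng(0).normal(1.5, 2.0, size=50)   # toy data
print(log_posterior(1.5, 2.0, data))                        # value an MCMC sampler would evaluate
```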

If we have a lot of data, the likelihood will overwhelm whatever information the prior provides, by simple arithmetic. Ultimately, this is good and by design; we know that the posterior is supposed to converge toward the likelihood as the data grow.

For models with conjugate priors, this can even be shown exactly.
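
As a hedged, concrete illustration of the conjugate case (my own example, with made-up prior parameters and stylized data): with a Beta$(a, b)$ prior and $k$ successes in $n$ binomial trials, the posterior is Beta$(a + k,\, b + n - k)$, and its mean $(a + k)/(a + b + n)$ approaches the MLE $k/n$ as $n$ grows:

```python
a, b = 2.0, 8.0              # assumed Beta(a, b) prior; prior mean 0.2
true_p = 0.7                 # data-generating success probability

for n in [10, 100, 1000, 10000]:
    k = round(true_p * n)                    # stylized data: k successes in n trials
    post_mean = (a + k) / (a + b + n)        # mean of the Beta(a + k, b + n - k) posterior
    print(f"n={n:6d}  MLE={k / n:.3f}  posterior mean={post_mean:.3f}")
```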

Is there a way to decide when priors don't matter for a given likelihood function and some sample size?

Best Answer

It is not that easy. The information in your data overwhelms the prior information not simply when your sample size is large, but when the data provide enough information to overwhelm what the prior says. Uninformative priors are easily persuaded by data, while strongly informative ones may be more resistant. In extreme cases, with ill-defined priors, your data may not be able to overcome the prior at all (e.g. a prior with zero density over some region).

Recall that by Bayes' theorem we use two sources of information in our statistical model: out-of-data prior information, and the information conveyed by the data through the likelihood function:

$$ \color{violet}{\text{posterior}} \propto \color{red}{\text{prior}} \times \color{lightblue}{\text{likelihood}} $$

When using an uninformative prior (or maximum likelihood), we try to bring as little prior information as possible into the model. With informative priors we bring a substantial amount of information into the model. So both the data and the prior inform us which values of the estimated parameters are more plausible, or believable. They can carry different information, and in some cases each of them can overpower the other.

Let me illustrate this with a very basic beta-binomial model (see here for a detailed example). With an "uninformative" prior, a fairly small sample may be enough to overpower it. In the plots below you can see the prior (red curve), likelihood (blue curve), and posterior (violet curve) of the same model at different sample sizes.

[Figure: prior (red), likelihood (blue), and posterior (violet) under a weakly informative prior at increasing sample sizes]

On the other hand, you can have an informative prior that is close to the true value; it, too, would be persuaded by the data fairly easily, though not as easily as a weakly informative one.

[Figure: the same comparison with an informative prior centered near the true value]

The case is very different with an informative prior that is far from what the data say (using the same data as in the first example). In such a case you need a larger sample to overcome the prior.

[Figure: the same comparison with an informative prior far from the likelihood; a larger sample is needed to overcome it]
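
A minimal sketch that mimics these three scenarios (my own reconstruction; the Beta priors, data, and sample sizes below are invented, not the ones behind the figures) computes the beta-binomial posterior analytically and shows how far each posterior mean has moved from its prior mean toward the sample proportion:

```python
# Invented priors for the three cases discussed above.
priors = {
    "weakly informative":         (1, 1),    # flat Beta(1, 1)
    "informative, near truth":    (7, 3),    # prior mean 0.7
    "informative, far from data": (2, 18),   # prior mean 0.1
}
true_p = 0.7

for n in [10, 100, 1000]:
    k = round(true_p * n)                    # made-up data: k successes out of n trials
    print(f"\nn = {n}, sample proportion = {k / n:.2f}")
    for name, (a, b) in priors.items():
        post_mean = (a + k) / (a + b + n)    # mean of the Beta(a + k, b + n - k) posterior
        print(f"  {name:27s} prior mean {a / (a + b):.2f} -> posterior mean {post_mean:.3f}")
```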

So it is not only about sample size, but also about what your data and your prior are. Notice that this is desired behavior: when using informative priors we want to be able to include out-of-data information in our model, and this would be impossible if large samples always discarded the prior.

Because of the complicated relations between posterior, likelihood, and prior, it is always good to look at the posterior distribution and do some posterior predictive checks (Gelman, Meng and Stern, 1996; Gelman and Hill, 2006; Gelman et al., 2004). Moreover, as described by Spiegelhalter (2004), you can use different priors, for example "pessimistic" ones that express doubts about large effects, or "enthusiastic" ones that are optimistic about the estimated effects. Comparing how different priors behave with your data may help you informally assess how much the posterior was influenced by the prior.
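
As a hedged sketch of such a comparison (the "pessimistic" and "enthusiastic" Beta priors and the data below are invented for illustration), you can fit the same data under each prior and compare the posteriors, together with a crude posterior predictive check:

```python
from scipy import stats

k, n = 14, 20                                                # made-up data: 14 successes in 20 trials
priors = {"pessimistic": (2, 10), "enthusiastic": (10, 2)}   # assumed Beta priors

for name, (a, b) in priors.items():
    post = stats.beta(a + k, b + n - k)                      # conjugate posterior
    lo, hi = post.ppf([0.025, 0.975])                        # 95% credible interval
    # crude posterior predictive check: replicate the experiment under each posterior
    p_draws = post.rvs(size=5000, random_state=0)
    k_rep = stats.binom.rvs(n, p_draws, random_state=1)
    print(f"{name:12s} posterior mean {post.mean():.3f}, "
          f"95% CI [{lo:.3f}, {hi:.3f}], P(k_rep >= {k}) = {(k_rep >= k).mean():.2f}")
```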


Spiegelhalter, D. J. (2004). Incorporating Bayesian ideas into health-care evaluation. Statistical Science, 156-174.

Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2004). Bayesian data analysis. Chapman & Hall/CRC.

Gelman, A. and Hill, J. (2006). Data analysis using regression and multilevel/hierarchical models. Cambridge University Press.

Gelman, A., Meng, X. L., and Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 733-760.
