Bayesian Theory – History of Uninformative Prior Theory

Tags: bayesian, history, prior, references, uninformative-prior

I am writing a short theoretical essay for a Bayesian Statistics course (in an Economics M.Sc.) on uninformative priors, and I am trying to understand the main steps in the development of this theory.

So far, my timeline is made of three main steps: Laplace's principle of indifference (1812), Jeffreys' invariant prior (1946), and Bernardo's reference prior (1979).

From my literature review, I've understood that Laplace's principle of indifference was the first tool used to represent lack of prior information, but its failure to be invariant under reparametrization led to its abandonment until the 1940s, when Jeffreys introduced his method, which has the desired invariance property. The marginalization paradoxes that arose in the 1970s from the careless use of improper priors then pushed Bernardo to develop his reference prior theory to address the issue.
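For concreteness, the key formula in the second step is Jeffreys' rule (standard definitions, summarized here from my references rather than quoted): the prior is taken proportional to the square root of the Fisher information,

$$ \pi(\theta) \propto \sqrt{I(\theta)}, \qquad I(\theta) = -\operatorname{E}_\theta\!\left[\frac{\partial^2 \log f(x \mid \theta)}{\partial \theta^2}\right], $$

which is invariant under reparametrization: if $\phi = g(\theta)$ is smooth and monotone, then $I(\phi) = I(\theta)\,(d\theta/d\phi)^2$, so applying the same rule to $\phi$ yields exactly the prior obtained by transforming $\pi(\theta)$. For a Bernoulli parameter this gives the Beta$(1/2, 1/2)$ prior $\pi(\theta) \propto \theta^{-1/2}(1-\theta)^{-1/2}$.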

Reading the literature, every author cites different contributions: Jaynes' maximum entropy, Box and Tiao's data-translated likelihood, Zellner, …

In your opinion, what are the crucial steps I am missing?

EDIT: I am adding my (main) references, in case anyone needs them:

1) The Selection of Prior Distributions by Formal Rules, Kass and Wasserman

2) A Catalogue of Noninformative Priors, Yang and Berger

3) Noninformative Bayesian Priors: Interpretation and Problems with Construction and Applications

EDIT 2: Sorry for the two-year delay, but you can find my essay here.

Best Answer

What you seem to be missing is the early history. You can check the paper by Fienberg (2006), When Did Bayesian Inference Become "Bayesian"?. First, he notes that Thomas Bayes was the first to suggest using a uniform prior:

In current statistical language, Bayes' paper introduces a uniform prior distribution on the binomial parameter, $\theta$, reasoning by analogy with a "billiard table" and drawing on the form of the marginal distribution of the binomial random variable, and not on the principle of "insufficient reason," as many others have claimed.
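In modern notation (a standard reconstruction on my part, not a formula from Fienberg's paper), Bayes' choice amounts to the Beta$(1,1)$ prior, i.e. $\pi(\theta) = 1$ on $[0,1]$, so that after observing $x$ successes in $n$ trials the posterior is

$$ \theta \mid x \sim \mathrm{Beta}(x + 1,\; n - x + 1). $$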

Pierre Simon Laplace was the next person to discuss it:

Laplace also articulated, more clearly than Bayes, his argument for the choice of a uniform prior distribution, arguing that the posterior distribution of the parameter $\theta$ should be proportional to what we now call the likelihood of the data, i.e.,

$$ f(\theta\mid x_1,x_2,\dots,x_n) \propto f(x_1,x_2,\dots,x_n\mid\theta) $$

We now understand that this implies that the prior distribution for $\theta$ is uniform, although in general, of course, the prior may not exist.
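The implication is a one-line consequence of Bayes' theorem: in general

$$ f(\theta \mid x_1, x_2, \dots, x_n) \propto f(x_1, x_2, \dots, x_n \mid \theta)\, \pi(\theta), $$

so the posterior can be proportional to the likelihood alone only if $\pi(\theta)$ is constant, that is, only if the prior is (possibly improperly) uniform.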

Moreover, Carl Friedrich Gauss also made use of an uninformative prior, as noted by David and Edwards (2001) in their book Annotated Readings in the History of Statistics:

Gauss uses an ad hoc Bayesian-type argument to show that the posterior density of $h$ is proportional to the likelihood (in modern terminology):

$$ f(h \mid x) \propto f(x \mid h) $$

where he has assumed $h$ to be uniformly distributed over $[0, \infty)$. Gauss mentions neither Bayes nor Laplace, although the latter had popularized this approach since Laplace (1774).
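Note that, in modern terms, Gauss's flat prior on $[0, \infty)$ is improper, since

$$ \int_0^\infty \pi(h)\, dh = \int_0^\infty c\, dh = \infty $$

for any constant $c > 0$; it is exactly this kind of prior whose careless use later produced the marginalization paradoxes that motivated Bernardo.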

As Fienberg (2006) notes, "inverse probability" (and, with it, the use of uniform priors) was popular among statisticians at the turn of the century:

[...] Thus, in retrospect, it shouldn't be surprising to see inverse probability as the method of choice of the great English statisticians of the turn of the century, such as Edgeworth and Pearson. For example, Edgeworth (49) gave one of the earliest derivations of what we now know as Student's $t$-distribution, the posterior distribution of the mean $\mu$ of a normal distribution given uniform prior distributions on $\mu$ and $h =\sigma^{-1}$ [...]

The early history of the Bayesian approach is also reviewed by Stigler (1986) in his book The History of Statistics: The Measurement of Uncertainty Before 1900.

In your short review you also do not seem to mention Ronald Aylmer Fisher (again quoting Fienberg, 2006):

Fisher moved away from the inverse methods and towards his own approach to inference he called the "likelihood," a concept he claimed was distinct from probability. But Fisher's progression in this regard was slow. Stigler (164) has pointed out that, in an unpublished manuscript dating from 1916, Fisher didn't distinguish between likelihood and inverse probability with a flat prior, even though when he later made the distinction he claimed to have understood it at this time.

Jaynes (1986) provided his own short review paper, Bayesian Methods: General Background. An Introductory Tutorial, which you could check, but it does not focus on uninformative priors. Moreover, as noted by AdamO, you should definitely read The Epic Story of Maximum Likelihood by Stigler (2007).

It is also worth mentioning that there is no such thing as a truly "uninformative" prior, which is why many authors prefer to talk about "vague" or "weakly informative" priors.
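To see why, here is a minimal Python sketch (my own illustration, not taken from any of the cited papers): two classical default priors for a binomial proportion, Laplace's Beta$(1,1)$ and Jeffreys' Beta$(1/2,1/2)$, applied to the same hypothetical data, give noticeably different posteriors.

```python
from scipy import stats

# Hypothetical data: x successes out of n Bernoulli trials
# (numbers chosen purely for illustration).
n, x = 10, 1

# Two classical "uninformative" priors for a binomial proportion:
# Laplace's flat prior Beta(1, 1) and Jeffreys' prior Beta(1/2, 1/2).
priors = {
    "Laplace  Beta(1, 1)":     (1.0, 1.0),
    "Jeffreys Beta(1/2, 1/2)": (0.5, 0.5),
}

for name, (a, b) in priors.items():
    # Conjugacy: a Beta(a, b) prior combined with a binomial
    # likelihood gives a Beta(a + x, b + n - x) posterior.
    post = stats.beta(a + x, b + n - x)
    lo, hi = post.ppf(0.025), post.ppf(0.975)
    print(f"{name}: posterior mean {post.mean():.3f}, "
          f"95% credible interval ({lo:.3f}, {hi:.3f})")
```

Both are "default" choices, yet on a small sample the posterior summaries differ, which is precisely why the choice among formal rules still matters.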

A theoretical review is provided by Kass and Wasserman (1996) in The Selection of Prior Distributions by Formal Rules, who go into greater detail about choosing priors, with an extended discussion of the use of uninformative priors.
