Solved – Choosing between uninformative beta priors

bayesian · beta-distribution · prior · uninformative-prior

I am looking for an uninformative prior for the beta distribution to work with a binomial process (hit/miss). At first I thought about using $\alpha=1, \beta=1$, which generates a uniform PDF, or the Jeffreys prior $\alpha=0.5, \beta=0.5$. But I am actually looking for a prior that has the minimum effect on the posterior, which led me to the improper prior $\alpha=0, \beta=0$. The problem here is that my posterior distribution is only proper if I have at least one hit and one miss. To overcome this, I thought about using a very small constant, like $\alpha=0.0001, \beta=0.0001$, just to ensure that the posterior $\alpha$ and $\beta$ are $>0$.

Does anyone know whether this approach is acceptable? I can see the numerical effects of changing these priors, but could someone give me an interpretation of putting small constants like this as priors?
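For concreteness, here is a minimal sketch (in Python with SciPy; the $\varepsilon$ value and the data are my arbitrary choices) of the case I am worried about, where every observation is a hit:

```python
# Sketch of the tiny-constant prior: with the improper Haldane prior (0, 0)
# the posterior is undefined when all trials are hits, but alpha = beta = eps
# keeps the posterior Beta(alpha + y, beta + n - y) proper.
from scipy import stats

eps = 1e-4        # small constant approximating the Haldane prior
y, n = 10, 10     # ten hits, zero misses

posterior = stats.beta(eps + y, eps + (n - y))
print(posterior.mean())          # ~0.99999, essentially the MLE y/n = 1
print(posterior.interval(0.95))  # a proper 95% credible interval
```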

Best Answer

First of all, there is no such thing as an uninformative prior. Below you can see the posterior distributions resulting from five different "uninformative" priors (described below the plot) given different data. As you can clearly see, the choice of "uninformative" prior affects the posterior distribution, especially in cases where the data themselves do not provide much information.

[Figure: posteriors from five "uninformative" priors given different data]

"Uninformative" priors for beta distribution share the property that $\alpha = \beta$, what leads to symmetric distribution, and $\alpha \le 1, \beta \le 1$, the common choices: are uniform (Bayes-Laplace) prior ($\alpha = \beta = 1$), Jeffreys prior ($\alpha = \beta = 1/2$), "Neutral" prior ($\alpha = \beta = 1/3$) proposed by Kerman (2011), Haldane prior ($\alpha = \beta = 0$), or it's approximation ($\alpha = \beta = \varepsilon$ with $\varepsilon > 0$) (see also the great Wikipedia article).

The parameters of the beta prior are commonly interpreted as "pseudocounts" of successes ($\alpha$) and failures ($\beta$), since the posterior distribution of the beta-binomial model after observing $y$ successes in $n$ trials is

$$ \theta \mid y \sim \mathcal{B}(\alpha + y, \beta + n - y) $$

so the higher $\alpha$ and $\beta$ are, the more they influence the posterior. Choosing $\alpha=\beta=1$ therefore amounts to assuming that you "saw" one success and one failure in advance (which may or may not be much, depending on $n$).
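For example (a sketch only; the counts are made up), observing 3 successes in 10 trials under a uniform prior shows how the pseudocounts pull the posterior mean away from the maximum likelihood estimate:

```python
# Pseudocount reading of the conjugate update: Beta(1, 1) acts like one
# prior success and one prior failure added to the observed counts.
from scipy import stats

alpha, beta = 1, 1   # uniform prior
y, n = 3, 10         # observed: 3 successes in 10 trials

post = stats.beta(alpha + y, beta + n - y)   # Beta(4, 8)
print(post.mean())   # 4/12 = 0.333..., shrunk away from the MLE y/n = 0.3
```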

At first sight, the Haldane prior seems to be the most "uninformative", since it leads to a posterior mean that is exactly equal to the maximum likelihood estimate:

$$ \frac{\alpha + y}{\alpha + \beta + n} = \frac{y}{n} \quad \text{when } \alpha = \beta = 0. $$

However, it leads to an improper posterior distribution when $y=0$ or $y=n$, which led Kerman (2011) to suggest the "neutral" prior, whose posterior median is as close as possible to the maximum likelihood estimate while still being a proper distribution.
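A small sketch (using the $\varepsilon$-approximation to the Haldane prior; $n$ and the $y$ values are arbitrary) illustrates both points:

```python
# With the (approximate) Haldane prior the posterior mean recovers the MLE
# y/n, but the exact Haldane posterior Beta(y, n - y) is improper at y = 0
# or y = n, which is what motivates Kerman's "neutral" prior.
from scipy import stats

eps = 1e-6
n = 10
for y in [0, 3, 10]:
    post = stats.beta(eps + y, eps + n - y)
    print(y, round(post.mean(), 6))   # ~0.0, 0.3, ~1.0 -- essentially y/n
```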

There are a number of arguments for and against each of the "uninformative" priors (see Kerman, 2011; Tuyl et al., 2008). For example, as discussed by Tuyl et al.:

. . . care needs to be taken with parameter values below $1$, both for noninformative and informative priors, as such priors concentrate their mass close to $0$ and/or $1$ and can suppress the importance of the observed data.

On the other hand, using a uniform prior for a small dataset may be very influential (think of it in terms of pseudocounts). You can find much more information and discussion of this topic in multiple papers and handbooks.

So sorry, but there is no single "best", "most uninformative", or "one-size-fits-all" prior. Each of them brings some information into the model.

Kerman, J. (2011). Neutral noninformative and informative conjugate beta and gamma prior distributions. Electronic Journal of Statistics, 5, 1450-1470.

Tuyl, F., Gerlach, R. and Mengersen, K. (2008). A comparison of Bayes-Laplace, Jeffreys, and other priors. The American Statistician, 62(1), 40-44.
