Probability – Origin and Properties of the Beta Distribution

Tags: beta-distribution, density-function, history, mathematical-statistics, probability

As I'm sure everyone here knows already, the PDF of the Beta distribution $X \sim B(a,b)$ is given by

$f(x) = \frac{1}{B(a,b)}x^{a-1}(1-x)^{b-1}, \qquad 0 < x < 1$
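
For concreteness, here is a quick numerical sanity check of the formula (a sketch only, using SciPy; the values of $a$ and $b$ are arbitrary examples): the expression integrates to one over $(0,1)$ and agrees with a standard library implementation.

```python
import numpy as np
from scipy.special import beta as beta_fn
from scipy.stats import beta as beta_dist
from scipy.integrate import quad

a, b = 2.0, 5.0  # arbitrary example shape parameters

# The density written exactly as in the formula above.
def f(x):
    return x**(a - 1) * (1 - x)**(b - 1) / beta_fn(a, b)

# It integrates to one over (0, 1) ...
total, _ = quad(f, 0, 1)
print(total)  # ~1.0

# ... and matches SciPy's implementation pointwise.
x = np.linspace(0.01, 0.99, 5)
print(np.allclose(f(x), beta_dist.pdf(x, a, b)))  # True
```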

I've been hunting all over the place for an explanation of the origins of this formula, but I can't find one. Every article I've found on the Beta distribution seems to give this formula, illustrate a few of its shapes, and then go straight on to discussing its moments and so on.

I don't like using mathematical formulae I can't derive and explain. For other distributions (e.g. the gamma or the binomial) there's a clear derivation I can learn and use. But I can't find anything like that for the Beta distribution.

So my question is: what are the origins of this formula? How can it be derived from first principles in whatever context it was originally developed?

[To clarify, I'm not asking about how to use the Beta distribution in Bayesian statistics, or what it means intuitively in practice (I've read the baseball example). I just want to know how to derive the PDF. There was a previous question that asked something similar, but it was marked (I think incorrectly) as a duplicate of another question that did not address the issue, so I haven't been able to find any help on here so far.]

EDIT 2017-05-06: Thanks everyone for the questions. I think a good explanation of what I want comes from one of the answers I got when I asked this of some of my course instructors:

"I guess people could derive the normal density as a limit of a sum of n things divided by sqrt(n), and you can derive the poisson density from the idea of events occurring at a constant rate. Similarly, in order to derive the beta density, you would have to have some kind of idea of what makes something a beta distribution independantly from, and logically prior to, the density."

So the "ab initio" idea in the comments is probably closest to what I'm looking for. I am not a mathematician, but I feel most comfortable using mathematics that I can derive. If the origins are too advanced for me to handle, so be it, but if not I would like to understand them.

Best Answer

As a former physicist, I can see how this density could have been derived. This is how physicists proceed:

When they encounter a finite integral of a positive function, such as the beta function: $$B(x,y) = \int_0^1 t^{x-1}(1-t)^{y-1}\,dt,$$ they instinctively define a density: $$f(s|x,y)=\frac{s^{x-1}(1-s)^{y-1}}{\int_0^1 t^{x-1}(1-t)^{y-1}\,dt}=\frac{s^{x-1}(1-s)^{y-1}}{B(x,y)},$$ where $0<s<1$.

They do this to all kinds of integrals so often that it happens reflexively, without even thinking. They call this procedure "normalization" or some similar name. Notice how, by definition, the density trivially has all the properties you want it to have: it is always positive and it integrates to one.
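
As a minimal sketch of this "normalization" recipe (assuming SciPy for the quadrature; $x$ and $y$ are just the exponents from the integral above), note that you never even need a closed form for $B(x,y)$: numerically integrating the positive integrand and dividing by the result already produces a valid density.

```python
from scipy.integrate import quad

x, y = 3.0, 2.0  # exponents in the integrand; any positive values work

# The positive integrand from the beta function.
integrand = lambda t: t**(x - 1) * (1 - t)**(y - 1)

# "Normalization": divide by the (numerically computed) finite integral.
B_xy, _ = quad(integrand, 0, 1)
density = lambda s: integrand(s) / B_xy

# By construction the result is non-negative and integrates to one.
print(quad(density, 0, 1)[0])  # ~1.0
print(density(0.5) >= 0)       # True
```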

The density $f(s|x,y)$ that I gave above is that of the Beta distribution.

UPDATE

@whuber asks what is so special about the Beta distribution, given that the above logic could be applied to an infinite number of suitable integrals (as I noted in my answer above).

The special part comes from the binomial distribution. I'll write its PMF using notation similar to my Beta density above, rather than the usual notation for its parameters and variable: $$ f'(x,y|s) = \binom{y+x}{x} s^x(1-s)^{y}$$
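
To make the similarity concrete, here is a small sketch (assuming SciPy) showing that for fixed $x$ and $y$, the binomial PMF viewed as a function of $s$ is proportional to a Beta$(x+1,\,y+1)$ density; the constant of proportionality works out to $x+y+1$.

```python
import numpy as np
from scipy.stats import binom, beta

x, y = 4, 6   # successes and failures, as in the notation above
n = x + y     # total number of trials

s = np.linspace(0.01, 0.99, 50)

# The binomial PMF, viewed as a function of the success probability s ...
likelihood = binom.pmf(x, n, s)

# ... is proportional to a Beta(x+1, y+1) density in s
# (the constant of proportionality is n + 1).
print(np.allclose((n + 1) * likelihood, beta.pdf(s, x + 1, y + 1)))  # True
```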

Here $x$ and $y$ are the numbers of successes and failures, and $s$ is the probability of success. You can see how similar this is to the numerator of the Beta density. In fact, if you look for the conjugate prior for the binomial distribution, it will be the Beta distribution. This is not surprising, because the domain of the Beta distribution is 0 to 1, and that is exactly what you integrate over in Bayes' theorem: you integrate over the parameter $s$, which in this case is the probability of success, as shown below: $$\hat f(s|X)=\frac{f'(X|s)f(s)}{\int_0^1 f'(X|s)f(s)\,ds},$$ where $f(s)$ is the (density of the) prior probability of success under the chosen Beta distribution, and $f'(X|s)$ is the probability of this data set (i.e. the observed successes and failures) given a probability of success $s$.
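
A small sketch of this conjugacy (assuming SciPy, with an arbitrary example prior Beta$(a,b)$ and hypothetical data of $x$ successes and $y$ failures): computing the posterior directly from the formula above, with the denominator evaluated by numerical integration, reproduces the well-known Beta$(a+x,\,b+y)$ density.

```python
import numpy as np
from scipy.stats import binom, beta
from scipy.integrate import quad

a, b = 2.0, 3.0   # prior Beta(a, b); arbitrary example values
x, y = 7, 5       # observed successes and failures

prior = lambda s: beta.pdf(s, a, b)
likelihood = lambda s: binom.pmf(x, x + y, s)

# Denominator of Bayes' theorem: integrate the likelihood against the prior.
evidence, _ = quad(lambda s: likelihood(s) * prior(s), 0, 1)
posterior = lambda s: likelihood(s) * prior(s) / evidence

# The posterior coincides with the conjugate result Beta(a + x, b + y).
s = np.linspace(0.01, 0.99, 50)
print(np.allclose(posterior(s), beta.pdf(s, a + x, b + y)))  # True
```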