Beta Distribution – Understanding Intuition for Beta Distribution with Alpha and/or Beta Less Than 1

bayesian, beta-distribution, intuition, jeffreys-prior

I am curious for myself, but also trying to explain this to others.

The beta distribution is often used as a Bayesian conjugate prior for a binomial likelihood. It is often explained with the example that $\left(\alpha-1\right)$ is analogous to the number of successes and $\left(\beta-1\right)$ is analogous to the number of failures.


As expected, a beta distribution with $\alpha=\beta=1$ is equivalent to a uniform distribution.

But the beta distribution's parameters can take values less than 1 (any positive number). In the limiting case $\alpha=\beta=0$, the distribution degenerates to point masses at only 0 and 1. I can still intuit this: it represents a case like flipping a coin – not the probability of heads or tails, but rather the outcomes themselves: there are only 2 possibilities, 0 or 1 (or heads or tails).
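As a quick illustration of this limiting behavior (my own sketch, not part of the question): draws from a beta distribution with both parameters small, say $\mathsf{Beta}(0.1, 0.1)$, already pile up near the endpoints 0 and 1.

```r
# Sketch: with alpha = beta = 0.1, most of the mass sits near 0 and 1,
# approaching the two-point limit described above.
set.seed(1)
x = rbeta(1e5, 0.1, 0.1)
mean(x < 0.05 | x > 0.95)  # a large fraction of draws lie near the endpoints
```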


But I cannot find a good way to explain or think about any $\alpha$ or $\beta$ value between 0 and 1. I can calculate it, but not really grok it.


Bonus points for anyone who can help explain the difference between a conjugate prior that, to me, seems like it should provide no information – a beta distribution with $\alpha=\beta=1$ – and what is actually used as a no-information prior, the Jeffreys prior, which uses $\alpha=\beta=0.5$.
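For concreteness (an illustrative sketch of my own, not part of the question): conjugate updating simply adds the observed counts to the prior parameters, so the two priors lead to slightly different posteriors from the same data.

```r
# Sketch: posterior after k successes in n trials is Beta(alpha + k, beta + n - k).
# Compare a flat Beta(1, 1) prior with the Jeffreys Beta(0.5, 0.5) prior.
k = 3; n = 10
post.flat = c(1 + k, 1 + n - k)      # posterior Beta(4, 8) under the flat prior
post.jeff = c(0.5 + k, 0.5 + n - k)  # posterior Beta(3.5, 7.5) under Jeffreys
post.flat[1] / sum(post.flat)        # posterior mean 4/12 = 0.333...
post.jeff[1] / sum(post.jeff)        # posterior mean 3.5/11 = 0.318...
```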

Addendum

Looks like I need to be clearer. I am looking to understand, conceptually, what natural phenomenon might be represented by a beta distribution with $\alpha=\beta=\frac{1}{2}$.

For instance,

  • Binomial distribution with n=10 and k=4 "means": some phenomenon with a binary response experienced 4 "successes" in 10 attempts.
  • Poisson distribution with k=2 and $\lambda=4.5$ means: some phenomenon that "typically" happens 4.5 times per hour (or whatever unit of time) only happened twice in the interval.

Or even with positive integer beta distributions, I can say:

  • Beta distribution with $\alpha=4$ and $\beta=7$ means: some phenomenon with a binary response had 3 successes and 6 failures in 9 attempts.
    • (I know this one is a bit inaccurate, since beta distributions are continuous and provide a probability density instead of mass, but this is often how it is conceptually viewed or explained, and why it is used as a conjugate prior.)

What sort of similar construct or meaning could I create for the beta distribution with $\alpha=\beta=\frac{1}{2}$?


I am not looking for a plot. As I said earlier, I know how to work with a beta distribution mathematically (plot it, calculate it, etc.) I am just trying to get some natural intuition.

Best Answer

Here is a frivolous example that may have some intuitive value.

In US Major League Baseball each team plays 162 games per season. Suppose a team is equally likely to win or lose each of its games. What proportion of the time will such a team have more wins than losses? (In order to have symmetry, if a team's wins and losses are tied at any point, we say it is ahead if it was ahead just before the tie occurred, otherwise behind.)

Suppose we look at a team's win-loss record as the season progresses. For our team, whose wins and losses are as if determined by tosses of a fair coin, you might think it would most likely be ahead about half the time throughout a season. Actually, half the time is the least likely proportion of time to be ahead.

The "bathtub shaped" histogram below shows the approximate distribution of the proportion of time during a season that such a team is ahead. The curve is the PDF of $\mathsf{Beta}(.5,.5).$ The histogram is based on 20,000 simulated 162-game seasons for a team where wins and losses are like independent tosses of a fair coin, simulated in R as follows:

set.seed(1212);  m = 20000;  n = 162;  prop.ahead = numeric(m)
for (i in 1:m)
 {
 x = sample(c(-1, 1), n, replace = TRUE)  # +1 for a win, -1 for a loss
 cum = cumsum(x)                          # running win-loss differential
 ahead = (c(0, cum) + c(cum, 0))[1:n]     # adjustment for ties
 prop.ahead[i] = mean(ahead >= 0)         # proportion of season spent ahead
 }

cut = seq(0, 1, by = .1); hdr = "Proportion of 162-Game Season when Team Leads"
hist(prop.ahead, breaks = cut, prob = TRUE, col = "skyblue2", xlab = "Proportion", main = hdr)
curve(dbeta(x, .5, .5), add = TRUE, col = "blue", lwd = 2)

[Figure: histogram of simulated proportions with the $\mathsf{Beta}(.5,.5)$ density curve overlaid]

Note: Feller (Vol. 1) discusses such a process. The CDF of $\mathsf{Beta}(.5,.5)$ is $F(x) = \frac{2}{\pi}\arcsin(\sqrt{x}),$ a constant multiple of an arcsine function, so Feller calls it an 'Arcsine Law'.
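This identity is easy to check numerically (my addition, not part of the original answer): the $\mathsf{Beta}(.5,.5)$ CDF agrees with $\frac{2}{\pi}\arcsin(\sqrt{x})$ across the unit interval.

```r
# The arcsine law: pbeta(x, 0.5, 0.5) equals (2/pi) * asin(sqrt(x)).
x = seq(0.01, 0.99, by = 0.01)
max(abs(pbeta(x, 0.5, 0.5) - (2/pi) * asin(sqrt(x))))  # essentially zero
```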
