I am curious for myself, but also trying to explain this to others.
The beta distribution is often used as a Bayesian conjugate prior for a binomial likelihood. It is often explained with the example that $\left(\alpha-1\right)$ is analogous to the number of successes and $\left(\beta-1\right)$ is like the number of fails.
As expected, a beta distribution with $\alpha=\beta=1$ is equivalent to a uniform distribution.
But the beta distribution's parameters can be less than 1 (any positive number). In the limiting case, as $\alpha$ and $\beta$ both approach 0, the distribution puts all its mass at 0 and 1. I can still intuit this: it represents a case like flipping a coin, where what is described is not the probability of heads or tails but the outcomes themselves: there are only 2 possibilities, 0 or 1 (or heads or tails).
But I cannot find a good way to explain or think about any $\alpha$ or $\beta$ value between 0 and 1. I can calculate it, but not really grok it.
Bonus points for anyone who can help explain the difference between the prior that seems to me like it should provide no information, a beta distribution with $\alpha=\beta=1$, and the prior that is often actually used as a noninformative prior, the Jeffreys prior, which uses $\alpha=\beta=0.5$.
Addendum
Looks like I need to be clearer. I am looking to understand, conceptually, what natural phenomenon might be represented by a beta distribution with $\alpha=\beta=\frac{1}{2}$.
For instance,
- Binomial distribution with n=10 and k=4 "means": some phenomenon with a binary response experienced 4 "successes" in 10 attempts.
- Poisson distribution with k=2 and $\lambda=4.5$ means: some phenomenon that "typically" happens 4.5 times per hour (or whatever unit of time) only happened twice in the interval.
Or even with positive integer beta distributions, I can say:
- Beta distribution with $\alpha=4$ and $\beta=7$ means: some phenomenon with a binary response had 3 successes and 6 fails in 9 attempts.
- (I know this one is a bit inaccurate, since beta distributions are continuous and provide a probability density instead of mass, but this is often how it is conceptually viewed or explained, and why it is used as a conjugate prior.)
What sort of similar construct or meaning could I create for the beta distribution with $\alpha=\beta=\frac{1}{2}$?
I am not looking for a plot. As I said earlier, I know how to work with a beta distribution mathematically (plot it, calculate it, etc.). I am just trying to get some natural intuition.
Best Answer
Here is a frivolous example that may have some intuitive value.
In US Major League Baseball each team plays 162 games per season. Suppose a team is equally likely to win or lose each of its games. What proportion of the time will such a team have more wins than losses? (In order to have symmetry, if a team's wins and losses are tied at any point, we say it is ahead if it was ahead just before the tie occurred, otherwise behind.)
Suppose we look at a team's win-loss record as the season progresses. For our team, whose wins and losses are as if determined by tosses of a fair coin, you might think it would most likely be ahead about half the time throughout a season. Actually, half the time is the least likely proportion of time for being ahead.
The "bathtub shaped" histogram below shows the approximate distribution of the proportion of time during a season that such a team is ahead. The curve is the PDF of $\mathsf{Beta}(.5,.5).$ The histogram is based on 20,000 simulated 162-game seasons for a team where wins and losses are like independent tosses of a fair coin, simulated in R as follows:
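The R listing is not reproduced here. As a stand-in, here is a minimal Python sketch of the same kind of simulation, using only the standard library. The function names `fraction_ahead` and `simulate`, the seed, and the ten-bin text summary are my own assumptions for illustration, not the original code.

```python
import random


def fraction_ahead(n_games, rng):
    """Fraction of games after which the team counts as 'ahead'."""
    diff = 0           # wins minus losses so far
    ahead = False      # current status, carried through ties
    games_ahead = 0
    for _ in range(n_games):
        # Each game is a fair coin toss: +1 for a win, -1 for a loss.
        diff += 1 if rng.random() < 0.5 else -1
        if diff > 0:
            ahead = True
        elif diff < 0:
            ahead = False
        # diff == 0: keep the status from just before the tie.
        games_ahead += ahead
    return games_ahead / n_games


def simulate(n_seasons=20_000, n_games=162, seed=1):
    """Simulate many seasons; return the fraction of time ahead in each."""
    rng = random.Random(seed)
    return [fraction_ahead(n_games, rng) for _ in range(n_seasons)]


if __name__ == "__main__":
    props = simulate()
    # Crude text histogram: share of seasons falling in each tenth of [0, 1].
    counts = [0] * 10
    for p in props:
        counts[min(int(p * 10), 9)] += 1
    for i, c in enumerate(counts):
        share = c / len(props)
        print(f"{i / 10:.1f}-{(i + 1) / 10:.1f}: {share:.3f} " + "#" * round(share * 100))
```

If the sketch behaves as intended, the outer bins should hold the largest shares and the middle bins the smallest, which is the bathtub shape described above.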
Note: Feller (Vol. 1) discusses such a process. The CDF of $\mathsf{Beta}(.5,.5)$ is a constant multiple of an arcsine function, so Feller calls it an 'Arcsine Law'.
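To spell out the arcsine connection, these are the standard closed forms for the density and CDF of $\mathsf{Beta}(.5,.5)$, written out here for convenience:

$$f(x) = \frac{1}{\pi\sqrt{x(1-x)}}, \qquad F(x) = \frac{2}{\pi}\arcsin\sqrt{x}, \qquad 0 < x < 1.$$

The density is smallest at $x = 1/2$, where $f(1/2) = 2/\pi \approx 0.64$, and it grows without bound as $x$ approaches 0 or 1. As a rough check on the intuition, $F(0.1) = \frac{2}{\pi}\arcsin\sqrt{0.1} \approx 0.205$, so under the beta approximation roughly one season in five has the team ahead less than 10% of the time.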