Solved – Help me understand Bayesian prior and posterior distributions

bayesian, distributions, posterior, prior

In a group of students, 2 out of 18 are left-handed. Find the posterior distribution of the proportion of left-handed students in the population assuming an uninformative prior. Summarize the results. According to the literature, 5-20% of people are left-handed. Take this information into account in your prior and calculate the new posterior.

I know the beta distribution should be used here. Should I start with $\alpha$ and $\beta$ values of 1? The equation I found in the material for the posterior is

$$\pi(r \mid Y) \propto r^{Y+\alpha-1} \times (1 - r)^{N-Y+\beta-1}$$

$Y=2$, $N=18$

Why is $r$ in the equation? ($r$ denoting the proportion of left-handed people.) It is unknown, so how can it be in this equation? To me it seems ridiculous to calculate $r$ given $Y$ and then use that $r$ in the equation that gives $r$. Well, with the sample value $r=2/18$ the result was $0.0019$. What should I deduce from that?

The equation giving the expected value of $r$ given known $Y$ and $N$ worked better and gave me $0.15$, which sounds about right. The equation is $E(r \mid X, N, \alpha, \beta) = (\alpha + X)/(\alpha + \beta + N)$, with the value $1$ assigned to both $\alpha$ and $\beta$. What values should I give $\alpha$ and $\beta$ to take the prior information into account?

Some tips would be much appreciated. A general lecture on prior and posterior distributions wouldn't hurt either (I have a vague understanding of what they are, but only vague). Also bear in mind that I'm not a very advanced statistician (actually I'm a political scientist by my main trade), so advanced mathematics will probably fly over my head.

Best Answer

Let me first explain what a conjugate prior is. I will then explain the Bayesian analyses using your specific example. Bayesian statistics involve the following steps:

  1. Define the prior distribution that incorporates your subjective beliefs about a parameter (in your example the parameter of interest is the proportion of left-handers). The prior can be "uninformative" or "informative" (but there is no prior that has no information, see the discussion here).
  2. Gather data.
  3. Update your prior distribution with the data using Bayes' theorem to obtain a posterior distribution. The posterior distribution is a probability distribution that represents your updated beliefs about the parameter after having seen the data.
  4. Analyze the posterior distribution and summarize it (mean, median, sd, quantiles, ...).

The basis of all Bayesian statistics is Bayes' theorem, which is

$$ \mathrm{posterior} \propto \mathrm{prior} \times \mathrm{likelihood} $$

In your case, the likelihood is binomial. If the prior and the posterior distribution are in the same family, the prior and posterior are called conjugate distributions. The beta distribution is a conjugate prior because the posterior is also a beta distribution. We say that the beta distribution is the conjugate family for the binomial likelihood. Conjugate analyses are convenient but rarely occur in real-world problems. In most cases, the posterior distribution has to be found numerically via MCMC (using Stan, WinBUGS, OpenBUGS, JAGS, PyMC or some other program).
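To see the conjugacy concretely, multiply the beta prior density by the binomial likelihood; this is just Bayes' theorem applied to your example, with $r$ the proportion of left-handers and $Y$ left-handers observed out of $N$:

$$\pi(r \mid Y) \propto \underbrace{r^{\alpha-1}(1-r)^{\beta-1}}_{\mathrm{Beta}(\alpha,\,\beta)\ \text{prior}} \times \underbrace{r^{Y}(1-r)^{N-Y}}_{\text{binomial likelihood}} = r^{Y+\alpha-1}(1-r)^{N-Y+\beta-1}$$

which is the kernel of a $\mathrm{Beta}(Y+\alpha,\ N-Y+\beta)$ distribution. This is exactly the equation you quoted from your material, with the $\alpha$ and $\beta$ restored in the exponents.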

If the prior probability distribution does not integrate to 1, it is called an improper prior; if it does integrate to 1, it is called a proper prior. In most cases, an improper prior does not pose a major problem for Bayesian analyses. The posterior distribution must be proper, though, i.e. the posterior must integrate to 1.

These rules of thumb follow directly from the nature of the Bayesian analysis procedure:

  • If the prior is uninformative, the posterior is very much determined by the data (the posterior is data-driven)
  • If the prior is informative, the posterior is a mixture of the prior and the data
  • The more informative the prior, the more data you need to "change" your beliefs, so to speak, because the posterior is then largely driven by the prior information
  • If you have a lot of data, the data will dominate the posterior distribution (they will overwhelm the prior)

An excellent overview of some possible "informative" and "uninformative" priors for the beta distribution can be found in this post.

Say your prior beta is $\mathrm{Beta}(\pi_{LH}| \alpha, \beta)$ where $\pi_{LH}$ is the proportion of left-handers. To specify the prior parameters $\alpha$ and $\beta$, it is useful to know the mean and variance of the beta distribution (for example, if you want your prior to have a certain mean and variance). The mean is $\bar{\pi}_{LH}=\alpha/(\alpha + \beta)$. Thus, whenever $\alpha =\beta$, the mean is $0.5$. The variance of the beta distribution is $\frac{\alpha\beta}{(\alpha + \beta)^{2}(\alpha + \beta + 1)}$. Now, the convenient thing is that you can think of $\alpha$ and $\beta$ as previously observed (pseudo-)data, namely $\alpha$ left-handers and $\beta$ right-handers out of a (pseudo-)sample of size $n_{eq}=\alpha + \beta$. The $\mathrm{Beta}(\pi_{LH} |\alpha=1, \beta=1)$ distribution is the uniform distribution (all values of $\pi_{LH}$ are equally probable) and is equivalent to having observed two people, one left-handed and one right-handed.
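As a quick numerical sanity check on these formulas, here is a minimal sketch in base R that computes the mean and variance of a beta prior from $\alpha$ and $\beta$ and compares them with simulated draws:

alpha <- 1; beta <- 1                  # uniform prior: 1 pseudo left-hander, 1 pseudo right-hander

prior_mean <- alpha / (alpha + beta)                                  # 0.5
prior_var  <- alpha * beta / ((alpha + beta)^2 * (alpha + beta + 1))  # 1/12

draws <- rbeta(1e5, alpha, beta)       # simulate from the prior
c(prior_mean, mean(draws))             # analytic vs. simulated mean
c(prior_var, var(draws))               # analytic vs. simulated variance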

The posterior beta distribution is simply $\mathrm{Beta}(z + \alpha, N - z +\beta)$ where $N$ is the size of the sample and $z$ is the number of left-handers in the sample. The posterior mean of $\pi_{LH}$ is therefore $(z + \alpha)/(N + \alpha + \beta)$. So to find the parameters of the posterior beta distribution, we simply add $z$ left-handers to $\alpha$ and $N-z$ right-handers to $\beta$. The posterior variance is $\frac{(z+\alpha)(N-z+\beta)}{(N+\alpha+\beta)^{2}(N + \alpha + \beta + 1)}$. Note that a highly informative prior also leads to a smaller variance of the posterior distribution (the graphs below illustrate the point nicely).

In your case, $z=2$ and $N=18$ and your prior is the uniform which is uninformative, so $\alpha = \beta = 1$. Your posterior distribution is therefore $\mathrm{Beta}(3, 17)$. The posterior mean is $\bar{\pi}_{LH}=3/(3+17)=0.15$. Here is a graph that shows the prior, the likelihood of the data and the posterior (a short R sketch follows the figure):

[Figure: the prior, the likelihood of the data, and the posterior distribution with a uniform prior]
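In R, this posterior update is just an addition of counts; a minimal sketch in base R:

z <- 2; N <- 18                        # 2 left-handers out of 18 students
alpha <- 1; beta <- 1                  # uniform (uninformative) prior

post_alpha <- alpha + z                # 3
post_beta  <- beta + N - z             # 17

post_alpha / (post_alpha + post_beta)  # posterior mean: 0.15
qbeta(c(0.025, 0.975), post_alpha, post_beta)  # equal-tailed 95% credible interval (not the HDI discussed below)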

You see that because your prior distribution is uninformative, your posterior distribution is entirely driven by the data. Also plotted is the highest density interval (HDI) for the posterior distribution. Imagine that you put your posterior distribution in a 2D basin and start to fill in water until 95% of the distribution is above the waterline. The points where the waterline intersects the posterior distribution constitute the 95%-HDI. Every point inside the HDI has a higher probability than any point outside it. Also, the HDI always includes the peak of the posterior distribution (i.e. the mode). The HDI is different from an equal-tailed 95% credible interval, where 2.5% from each tail of the posterior is excluded (see here).
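The HDI is not built into base R, but for a unimodal posterior like this one a simple grid search implements the "waterline" idea directly; a rough sketch, not production code:

# grid approximation of the 95% HDI for the Beta(3, 17) posterior
grid <- seq(0, 1, length.out = 100000)
dens <- dbeta(grid, 3, 17)

ord  <- order(dens, decreasing = TRUE)               # lower the waterline from the peak...
keep <- ord[cumsum(dens[ord]) / sum(dens) <= 0.95]   # ...until 95% of the mass is above it

range(grid[keep])                                    # endpoints of the 95% HDI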

For your second task, you're asked to incorporate the information that 5-20% of the population are left-handers. There are several ways of doing that. The easiest is to say that the prior beta distribution should have a mean of $0.125$, the midpoint of $0.05$ and $0.2$. But how to choose $\alpha$ and $\beta$ of the prior beta distribution? You want the mean of your prior distribution to be $0.125$, based on a pseudo-sample of equivalent sample size $n_{eq}$. More generally, if you want your prior to have mean $m$ with pseudo-sample size $n_{eq}$, the corresponding $\alpha$ and $\beta$ values are $\alpha = mn_{eq}$ and $\beta = (1-m)n_{eq}$. All that is left to do is to choose the pseudo-sample size $n_{eq}$, which determines how confident you are about your prior information. Let's say you are very sure about your prior information and set $n_{eq}=1000$. The parameters of your prior distribution are therefore $\alpha = 0.125\cdot 1000 = 125$ and $\beta = (1 - 0.125)\cdot 1000 = 875$. The posterior distribution is $\mathrm{Beta}(127, 891)$, with a mean of about $0.125$, practically the same as the prior mean. The prior information is dominating the posterior (see the following graph, and the code sketch after it):

[Figure: the prior, the likelihood of the data, and the posterior distribution with a strong informative prior]
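A small R helper makes the mean-plus-pseudo-sample-size recipe reusable; the function name beta_from_mean is just an illustrative choice, not from any package:

# prior (alpha, beta) from a desired mean m and pseudo-sample size n_eq
beta_from_mean <- function(m, n_eq) c(alpha = m * n_eq, beta = (1 - m) * n_eq)

prior <- beta_from_mean(m = 0.125, n_eq = 1000)  # strong prior: alpha = 125, beta = 875
post  <- prior + c(2, 18 - 2)                    # add the observed counts
post                                             # Beta(127, 891)
post["alpha"] / sum(post)                        # posterior mean ~ 0.125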

If you are less sure about the prior information, you could set the $n_{eq}$ of your pseudo-sample to, say, $10$, which yields $\alpha=1.25$ and $\beta=8.75$ for your prior beta distribution. The posterior distribution is $\mathrm{Beta}(3.25, 24.75)$ with a mean of about $0.116$. The posterior mean is now near the mean of your data ($0.111$) because the data overwhelm the prior. Here is the graph showing the situation:

[Figure: the prior, the likelihood of the data, and the posterior distribution with a beta prior corresponding to a pseudo-sample size of 10]
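Reusing the illustrative beta_from_mean helper from above, the weaker prior is one call away:

prior <- beta_from_mean(m = 0.125, n_eq = 10)  # weak prior: alpha = 1.25, beta = 8.75
post  <- prior + c(2, 16)                      # Beta(3.25, 24.75)
post["alpha"] / sum(post)                      # posterior mean ~ 0.116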

A more advanced method of incorporating the prior information would be to say that the $0.025$ quantile of your prior beta distribution should be about $0.05$ and the $0.975$ quantile should be about $0.2$. This is equivalent to saying that you are 95% sure that the proportion of left-handers in the population lies between 5% and 20%. The function beta.select in the R package LearnBayes calculates the $\alpha$ and $\beta$ values of a beta distribution corresponding to such quantiles. The code is

library(LearnBayes)

quantile1 <- list(p = 0.025, x = 0.05)  # the 2.5% quantile should be 0.05
quantile2 <- list(p = 0.975, x = 0.2)   # the 97.5% quantile should be 0.2
beta.select(quantile1, quantile2)       # returns the matching (alpha, beta) values

[1]  7.61 59.13

It seems that a beta distribution with parameters $\alpha = 7.61$ and $\beta=59.13$ has the desired properties. The prior mean is $7.61/(7.61 + 59.13)\approx 0.114$, which is near the mean of your data ($0.111$). Again, this prior distribution incorporates the information of a pseudo-sample with an equivalent sample size of about $n_{eq}\approx 7.61+59.13 \approx 66.74$. The posterior distribution is $\mathrm{Beta}(9.61, 75.13)$ with a mean of $0.113$, which is comparable with the mean of the previous analysis using the highly informative $\mathrm{Beta}(125, 875)$ prior. Here is the corresponding graph:

[Figure: the prior, the likelihood of the data, and the posterior distribution with a prior whose 0.025 and 0.975 quantiles are 0.05 and 0.2]
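As a check, you can confirm in base R that this prior has (approximately) the requested quantiles, and then update it with the data as before:

qbeta(c(0.025, 0.975), 7.61, 59.13)  # ~ c(0.05, 0.20), as requested
post <- c(7.61, 59.13) + c(2, 16)    # Beta(9.61, 75.13)
post[1] / sum(post)                  # posterior mean ~ 0.113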

See also this reference for a short but, imho, good overview of Bayesian reasoning and simple analysis. A longer introduction to conjugate analyses, especially for binomial data, can be found here. A general introduction to Bayesian thinking can be found here. More slides concerning aspects of Bayesian statistics are here.
