Beta Distribution Parameters – How to Select Alpha and Beta Based on Mode and Credible Interval

bayesianbeta distributionpriorr

I want to a select both alpha and beta values using both mode and a 95% credible interval simultaneously. I can do select alpha and beta, using either a specified 95% credible interval, or a mode in conjunction with beta, but using both mode and 95% credible interval would be helpful to further refine my beta distribution.

I am after a beta distribution with a 95% credible interval from 0.01 to 0.15, and a mode of 0.05, to set up a prior distribution.

I can select parameters using mode the 95% CI with this R code:

library(LearnBayes)
quantile1=list(p=.025, x=0.01)     # 2.5% quantile should be 0.01
quantile2=list(p=.975, x=0.15)      # 97.5% quantile should be 0.15
beta.select(quantile1, quantile2)
[1]  2.44 38.21

I can select alpha, for a given mode and beta, using the following R code, by reorganizing the equation mode = (α – 1) / (α + β – 2):

# Select beta and mode - does not work if alpha + beta = 2
beta <- 38.21
mod <- 0.05
alpha <- beta * mod / (1 - mod) + (-2 * mod + 1) / (1 - mod)
alpha
[1] 2.958421

Note the change in value of alpha, as the mode with alpha = 2.44 and beta = 38.21 is not 0.05 (it's 0.037257439).

Best Answer

This is an optimisation problem to match a distribution to 3 parameters: 2 quantiles and the mode.

In fact, even the function you used earlier returns estimates from an optimisation. If you calculate the quantile values from those $\alpha, \beta$ parameters it gave you, you'll see they don't match up exactly:

> qbeta(p=c(0.025, 0.975), shape1=2.44, shape2=38.21)
[1] 0.01004775 0.14983899

We'll define an objective function that calculates the squared error between the known and optimised quantile values and the mode, for a given set of parameters $\alpha,\beta$ of the distribution (encoded in the params vector):

objective.function <- function(params) {
  alpha <- params[1]
  beta <- params[2]

  intended.quantiles <- c(0.01, 0.15)
  calculated.quantiles <- qbeta(p=c(0.025, 0.975), shape1=alpha, shape2=beta)
  squared.error.quantiles <- sum((intended.quantiles - calculated.quantiles)^2)

  intended.mode <- 0.05
  calculated.mode <- calculate.mode(alpha, beta)
  squared.error.mode <- (intended.mode - calculated.mode)^2

  return(squared.error.quantiles + squared.error.mode)
}

calculate.mode <- function(alpha, beta) {
  return((alpha-1) / (alpha+beta-2))
}

You already have some good starting values for $\alpha, \beta$, so let's get those ready:

starting.params <- c(2.44, 38.21)

Incidentally, this is what the PDF of a Beta distribution parameterised with these starting estimates looks like. Red lines are actual quantiles & mode, and blue lines are what you are trying to match. As you suggest, the quantiles look fine but the mode is off:

Now we use the nlm optimisation function to estimate optimal values $\alpha^*, \beta^*$, starting for those initial values. The algorithm converges:

nlm.result <- nlm(f = objective.function, p = starting.params)
optimal.alpha <- nlm.result$estimate[1]
optimal.beta <- nlm.result$estimate[2]

So the optimised estimates are:

$\alpha^* = 3.174725$

$\beta^* = 44.94454$

And the quantiles and mode corresponding to these optimal parameters are:

> qbeta(p=c(0.025, 0.975), shape1=optimal.alpha, shape2=optimal.beta)
[1] 0.01499578 0.15042877
> calculate.mode(optimal.alpha, optimal.beta)
[1] 0.04715437

Recreating the plot of the Beta PDF, this time with the new parameters, we can see that the mode is much better matched, at the expense of the lower $2.5\%$ quantile:

Possible enhancements:

This is a quick implementation that may not deal well with extreme values and numerical issues. Have a look at this answer for better code in the case of 2 quantiles only.
Investigate constrained optimisation packages, where the problem would be formulated as estimating 2 parameters ($\alpha, \beta$) subject to the constraint $\frac{\alpha - 1}{\alpha + \beta - 2} = 0.05$

Related Solutions

Beta Distribution – How to Numerically Approximate Values for a Beta Distribution with Large Alpha & Beta?

A Normal approximation works extremely well, especially in the tails. Use a mean of $\alpha/(\alpha+\beta)$ and a variance of $\frac{\alpha\beta}{(\alpha+\beta)^{2} (1+\alpha+\beta)}$. For example, the absolute relative error in the tail probability in a tough situation (where skewness might be of concern) such as $\alpha = 10^6, \beta = 10^8$ peaks around $0.00026$ and is less than $0.00006$ when you're more than 1 SD from the mean. (This is not because beta is so large: with $\alpha = \beta = 10^6$, the absolute relative errors are bounded by $0.0000001$.) Thus, this approximation is excellent for essentially any purpose involving 99% intervals.

In light of the edits to the question, note that one does not compute beta integrals by actually integrating the integrand: of course you'll get underflows (although they don't really matter, because they don't contribute appreciably to the integral). There are many, many ways to compute the integral or approximate it, as documented in Johnson & Kotz (Distributions in Statistics). An online calculator is found at http://www.danielsoper.com/statcalc/calc37.aspx. You actually need the inverse of this integral. Some methods to compute the inverse are documented on the Mathematica site at http://functions.wolfram.com/GammaBetaErf/InverseBetaRegularized/. Code is provided in Numerical Recipes. A really nice online calculator is the Wolfram Alpha site (www.wolframalpha.com ): enter inverse beta regularized (.005, 1000000, 1000001) for the left endpoint and inverse beta regularized (.995, 1000000, 1000001) for the right endpoint ($\alpha=1000000, \beta=1000001$, 99% interval).

Solved – How to compute lower edge of a credible interval for a beta distribution in python

Friend of mine answered this: I want the inverse cumulative distribution, i.e. the place on the X axis where the cumulative distribution is at 0.05.

In SciPy, the easiest thing is btdtri.

print btdtri(5257, 579112, 0.05)

gives

0.0087938241764

Best Answer

Related Solutions

Beta Distribution – How to Numerically Approximate Values for a Beta Distribution with Large Alpha & Beta?

Solved – How to compute lower edge of a credible interval for a beta distribution in python

Related Question