Beta Distribution Parameters – How to Select Alpha and Beta Based on Mode and Credible Interval

bayesianbeta distributionpriorr

I want to a select both alpha and beta values using both mode and a 95% credible interval simultaneously. I can do select alpha and beta, using either a specified 95% credible interval, or a mode in conjunction with beta, but using both mode and 95% credible interval would be helpful to further refine my beta distribution.

I am after a beta distribution with a 95% credible interval from 0.01 to 0.15, and a mode of 0.05, to set up a prior distribution.

I can select parameters using mode the 95% CI with this R code:

library(LearnBayes)
quantile1=list(p=.025, x=0.01)     # 2.5% quantile should be 0.01
quantile2=list(p=.975, x=0.15)      # 97.5% quantile should be 0.15
beta.select(quantile1, quantile2)
[1]  2.44 38.21

I can select alpha, for a given mode and beta, using the following R code, by reorganizing the equation mode = (α – 1) / (α + β – 2):

# Select beta and mode - does not work if alpha + beta = 2
beta <- 38.21
mod <- 0.05
alpha <- beta * mod / (1 - mod) + (-2 * mod + 1) / (1 - mod)
alpha
[1] 2.958421

Note the change in value of alpha, as the mode with alpha = 2.44 and beta = 38.21 is not 0.05 (it's 0.037257439).

Best Answer

This is an optimisation problem to match a distribution to 3 parameters: 2 quantiles and the mode.

In fact, even the function you used earlier returns estimates from an optimisation. If you calculate the quantile values from those $\alpha, \beta$ parameters it gave you, you'll see they don't match up exactly:

> qbeta(p=c(0.025, 0.975), shape1=2.44, shape2=38.21)
[1] 0.01004775 0.14983899

We'll define an objective function that calculates the squared error between the known and optimised quantile values and the mode, for a given set of parameters $\alpha,\beta$ of the distribution (encoded in the params vector):

objective.function <- function(params) {
  alpha <- params[1]
  beta <- params[2]

  intended.quantiles <- c(0.01, 0.15)
  calculated.quantiles <- qbeta(p=c(0.025, 0.975), shape1=alpha, shape2=beta)
  squared.error.quantiles <- sum((intended.quantiles - calculated.quantiles)^2)

  intended.mode <- 0.05
  calculated.mode <- calculate.mode(alpha, beta)
  squared.error.mode <- (intended.mode - calculated.mode)^2

  return(squared.error.quantiles + squared.error.mode)
}

calculate.mode <- function(alpha, beta) {
  return((alpha-1) / (alpha+beta-2))
}

You already have some good starting values for $\alpha, \beta$, so let's get those ready:

starting.params <- c(2.44, 38.21)

Incidentally, this is what the PDF of a Beta distribution parameterised with these starting estimates looks like. Red lines are actual quantiles & mode, and blue lines are what you are trying to match. As you suggest, the quantiles look fine but the mode is off:

enter image description here

Now we use the nlm optimisation function to estimate optimal values $\alpha^*, \beta^*$, starting for those initial values. The algorithm converges:

nlm.result <- nlm(f = objective.function, p = starting.params)
optimal.alpha <- nlm.result$estimate[1]
optimal.beta <- nlm.result$estimate[2]

So the optimised estimates are:

$\alpha^* = 3.174725$

$\beta^* = 44.94454$

And the quantiles and mode corresponding to these optimal parameters are:

> qbeta(p=c(0.025, 0.975), shape1=optimal.alpha, shape2=optimal.beta)
[1] 0.01499578 0.15042877
> calculate.mode(optimal.alpha, optimal.beta)
[1] 0.04715437

Recreating the plot of the Beta PDF, this time with the new parameters, we can see that the mode is much better matched, at the expense of the lower $2.5\%$ quantile:

enter image description here

Possible enhancements:

  • This is a quick implementation that may not deal well with extreme values and numerical issues. Have a look at this answer for better code in the case of 2 quantiles only.
  • Investigate constrained optimisation packages, where the problem would be formulated as estimating 2 parameters ($\alpha, \beta$) subject to the constraint $\frac{\alpha - 1}{\alpha + \beta - 2} = 0.05$