Solved – Credible set for beta distribution

beta distributioncredible-interval

How can I approximate beta distribution $\alpha=x+1$ and $\beta = 14-x$ to normal distribution? Or, could you please tell me how to calculate HPD credible set for beta distribution?

Best Answer

A level $\alpha$ "Highest Posterior Density" (HPD) interval for a (posterior) distribution $F$ with (continuous) density $f$ and mode $\mu$ is an interval $I=[x,y]$ containing $\mu$ for which

$1-\alpha$ of the probability is in the interval: $F(I) = F(y) - F(x) = 1-\alpha$.
The densities are the same at either end: $f(x) = f(y)$.

Among the various strategies to find $I$, one that stands out as generally effective is the following.

Choose a reasonable starting value for $x$, such as $F^{-1}(\alpha/2)$.
Define the "$\alpha$ offset" of $x$ to be the point $y$ for which $[x,y]$ has probability $1-\alpha$. Thus

$$y = F^{-1}(F(x) + 1 - \alpha)$$

provided $F(x) \le \alpha$.
Search for $x$ in the interval $(-\infty, F^{-1}(\alpha))$ at which $f(x) = f(y)$. The unimodality of $F$ and the continuity of $f$ guarantee such an $x$ exists and is unique.

The search (3) can be carried out in practice by minimizing $(f(y)-f(x))^2$ plus a penalty term in case the probability of $[x,y]$ is not exactly $1-\alpha$. (The penalty term is useful in case the search procedure provides a candidate value of $x$ for which $F(x)$ exceeds $\alpha$, in which case a valid offset $y$ cannot be found.)

Applying such a general procedure would be a better idea for a Beta distribution compared to using a Normal approximation, because Betas tend to be skewed (unless their parameters $a$ and $b$ are relatively similar).

For example, the orange region in the figure covers a $1-0.05$ HPD interval for a Beta$(11,4)$ distribution whose density is graphed. The dashed gray line shows the common value of the density at the endpoints.

Here is the R code that performed the calculation. It is written to be very general: if you can supply functions to compute $F$, $f$, and $F^{-1}$, it will work. (An example for Normal distributions has been commented out.)

offset <- function(x, alpha=0.05, F=pbeta, F.Inv=qbeta, ...) {
  q <- F(x, ...)
  y <- F.Inv(min(q + 1-alpha, 1), ...)
}
f <- function(x, alpha=0.05, F=pbeta, F.Inv=qbeta, f.dist=dbeta, ...) {
  y <- offset(x, alpha, F=F, F.Inv=F.Inv, ...)
  mapply(function(u,v) diff(f.dist(c(u,v), ...))^2, x, y) +
    (diff(F(c(x,y), ...)) - (1-alpha))^2
}
#
# Specify the problem.
#
alpha <- 0.05    # Level of the CI (between 0 and 1)
a <- 11; b <- 4  # Parameters
F <- pbeta       # Distribution function
F.Inv <- qbeta   # Inverse distribution function
f.dist <- dbeta  # Density function
x.min <- 0       # Minimum to consider
x.max <- 1       # Maximum to consider
# F <- pnorm
# F.Inv <- qnorm
# f.dist <- dnorm
# x.min <- -2
# x.max <- 5
#
# Find the solution.
#
x.0 <- qbeta(alpha/2, a, b)
x.lim <- qbeta(alpha, a, b)
sol <- optimize(function(x) f(x, alpha, F=F, F.Inv=F.Inv, f.dist=f.dist, a, b), 
                interval=c(x.min, x.lim), tol=1e-10)
x <- sol$minimum
y <- offset(x, alpha, F=F, F.Inv=F.Inv, a, b)
#
# Plot the solution.
#
u <- seq(x, y, length.out=1001)
v <- f.dist(u, a, b)
u <- c(u[1], u, u[length(u)])
v <- c(0, v, 0)
curve(f.dist(x, a, b), xlim=c(x.min, x.max), ylim=c(0,max(v)*1.2), n=1001, xlab="X", ylab="Density")
polygon(u, v, col="#f8e0a0", border=NA)
abline(h = f.dist(x, a, b), col="Gray", lty=2, lwd=2)
curve(f.dist(x, a, b), add=TRUE, lwd=2)

Related Solutions

Beta Distribution – How to Numerically Approximate Values for a Beta Distribution with Large Alpha & Beta?

A Normal approximation works extremely well, especially in the tails. Use a mean of $\alpha/(\alpha+\beta)$ and a variance of $\frac{\alpha\beta}{(\alpha+\beta)^{2} (1+\alpha+\beta)}$. For example, the absolute relative error in the tail probability in a tough situation (where skewness might be of concern) such as $\alpha = 10^6, \beta = 10^8$ peaks around $0.00026$ and is less than $0.00006$ when you're more than 1 SD from the mean. (This is not because beta is so large: with $\alpha = \beta = 10^6$, the absolute relative errors are bounded by $0.0000001$.) Thus, this approximation is excellent for essentially any purpose involving 99% intervals.

In light of the edits to the question, note that one does not compute beta integrals by actually integrating the integrand: of course you'll get underflows (although they don't really matter, because they don't contribute appreciably to the integral). There are many, many ways to compute the integral or approximate it, as documented in Johnson & Kotz (Distributions in Statistics). An online calculator is found at http://www.danielsoper.com/statcalc/calc37.aspx. You actually need the inverse of this integral. Some methods to compute the inverse are documented on the Mathematica site at http://functions.wolfram.com/GammaBetaErf/InverseBetaRegularized/. Code is provided in Numerical Recipes. A really nice online calculator is the Wolfram Alpha site (www.wolframalpha.com ): enter inverse beta regularized (.005, 1000000, 1000001) for the left endpoint and inverse beta regularized (.995, 1000000, 1000001) for the right endpoint ($\alpha=1000000, \beta=1000001$, 99% interval).

Solved – How to compute lower edge of a credible interval for a beta distribution in python

Friend of mine answered this: I want the inverse cumulative distribution, i.e. the place on the X axis where the cumulative distribution is at 0.05.

In SciPy, the easiest thing is btdtri.

print btdtri(5257, 579112, 0.05)

gives

0.0087938241764

Best Answer

Related Solutions

Beta Distribution – How to Numerically Approximate Values for a Beta Distribution with Large Alpha & Beta?

Solved – How to compute lower edge of a credible interval for a beta distribution in python

Related Question