Troubleshooting Convergence in PyMC3 Beta-Binomial Models

Tags: bayesian, beta distribution, binomial distribution, pymc

Something is not behaving as expected in PyMC. I'm fitting a simple Beta-Binomial conjugate prior model and trying to recover known parameters.

Control data

from scipy.stats import beta
import numpy as np
import seaborn as sns
# Brute-force search for integer Beta(a, b) parameters whose sampled mean is closest to x
def find_ab(x, search=range(2, 22)):
    avg, score, best_a, best_b = 1000, 1000, 1, 1
    for i in search:
        for j in search:
            samples = beta.rvs(i, j, size=1000)
            new_avg = np.mean(samples)
            new_score = abs(new_avg - x)
            if new_score < score:
                best_a, best_b = i, j
                avg = new_avg
                score = new_score
    return best_a, best_b, avg

ctrl = 0.61
a_ctrl, b_ctrl, avg_ctrl = find_ab(ctrl)
print(f"a={a_ctrl}, b={b_ctrl}, avg={avg_ctrl}")
samples_ctrl = beta.rvs(a_ctrl, b_ctrl, size=10_000)
sns.kdeplot(samples_ctrl)
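For what it's worth, the mean of a Beta(a, b) distribution is a / (a + b), so the search result can also be checked without any sampling, reusing a_ctrl and b_ctrl from above:

# Analytical mean of the chosen prior; should land close to the 0.61 target
print(a_ctrl / (a_ctrl + b_ctrl))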

[Figure: KDE plot of the control samples]

Test data

test = ctrl + 0.12
a_test, b_test, avg_test = find_ab(test)
print(f"a={a_test}, b={b_test}, avg={avg_test}")
samples_test = beta.rvs(a_test, b_test, size=10_000)
sns.kdeplot(samples_test)

[Figure: KDE plot of the test samples]

As you can see, the control distribution is centered on 0.61 and the test distribution on 0.73. I'll generate samples with NumPy: independent Bernoulli trials (10,000 for the control group and 8,000 for the test group), using roughly 0.61 and 0.73 as the respective values of p.

ris_ctrl = np.random.binomial(n=1,p=avg_ctrl,size=10_000)
ris_test = np.random.binomial(n=1,p=avg_test,size=8_000)
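A quick sanity check on the simulated arrays (reusing ris_ctrl and ris_test from above):

# Empirical success rates; should be close to 0.61 and 0.73 respectively
print(ris_ctrl.mean(), ris_test.mean())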

And the modeling in PyMC3

import pymc as pm
with pm.Model() as model_ctrl:
    theta_ctrl = pm.Beta('theta', a_ctrl, b_ctrl)
    lik_ctrl = pm.Binomial('likelihood', p=theta_ctrl, n=len(ris_ctrl), observed=ris_ctrl)
    trace_ctrl = pm.sample(chains=4, draws=4000)
posterior_ctrl = trace_ctrl.posterior['theta'].stack(sample=("chain", "draw")).values
pm.plot_trace(trace_ctrl, compact=False)

with pm.Model() as model_test:
    theta_test = pm.Beta('theta', a_test, b_test)
    lik_test = pm.Binomial('likelihood', p=theta_test, n=len(ris_test), observed=ris_test)
    trace_test = pm.sample(chains=4, draws=4000)
posterior_test = trace_test.posterior['theta'].stack(sample=("chain", "draw")).values
pm.plot_trace(trace_test, compact=False)

[Figure: trace plot of the control posterior]

[Figure: trace plot of the test posterior]

But when I look at the trace plots, the control and test posteriors are centered on 0.00061 and 0.00091, respectively. I have no idea why the means are orders of magnitude lower than the parameters they are supposed to recover.

What is going on here? Is this an issue with how I've defined the prior and likelihood, or an issue with PyMC3?

I know that the Beta prior is conjugate to the Binomial likelihood, so sampling is unnecessary here; that is precisely why I chose this prior-likelihood pair as a test case.
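For reference, with $n = 1$ the model is just $N$ Bernoulli observations $x_1, \dots, x_N$ with a $\mathrm{Beta}(a, b)$ prior, and the closed-form posterior the sampler should reproduce is

$$\theta \mid x_{1:N} \sim \mathrm{Beta}\left(a + \textstyle\sum_{i=1}^{N} x_i,\; b + N - \textstyle\sum_{i=1}^{N} x_i\right),$$

whose mean is close to the empirical proportion of ones once $N$ is large.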

Best Answer

You specify the number of trials parameter, $n$, inconsistently.

First you simulate Bernoulli random variables with np.random.binomial(n=1, p=p, size=size). Then in the likelihood you have pm.Binomial("likelihood", p=theta, n=len(sample), observed=sample).

Replace n=len(sample) with n=1 to get the expected result.
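For concreteness, here is a minimal sketch of the corrected control model, reusing the variable names from the question (the test model changes in the same way):

import pymc as pm

with pm.Model() as model_ctrl:
    theta_ctrl = pm.Beta('theta', a_ctrl, b_ctrl)
    # each observed value is a single 0/1 trial, so the number of trials is 1
    lik_ctrl = pm.Binomial('likelihood', p=theta_ctrl, n=1, observed=ris_ctrl)
    trace_ctrl = pm.sample(chains=4, draws=4000)

Equivalently, pm.Bernoulli('likelihood', p=theta_ctrl, observed=ris_ctrl) expresses the same single-trial likelihood directly.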

You can do a back-of-the-envelope calculation to understand why the posteriors collapse toward zero: with n=len(sample), the model treats each 0/1 observation in the control data as a count out of 10,000 trials, i.e. roughly 0.61 × 10,000 successes out of 10,000 × 10,000 total trials, so the posterior mean is pulled down to about 0.61 / 10,000. The same reasoning gives about 0.73 / 8,000 for the test data, which is why the traces sit orders of magnitude below the true proportions.
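To make that concrete, here is a quick check of the closed-form posterior under the mis-specified likelihood; the prior values below are illustrative, and any small (a, b) returned by find_ab behaves essentially the same:

# Each of the N 0/1 observations is treated as a count out of n = N trials,
# so the model effectively sees N * N total trials with only ~p * N successes.
N = 10_000              # number of control observations
p = 0.61                # true success probability
a, b = 14, 9            # illustrative Beta prior parameters

successes = p * N
total_trials = N * N
post_mean = (a + successes) / (a + b + total_trials)
print(post_mean)        # ~6e-05, i.e. roughly p / N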