Solved – Finding the Poisson rate parameter with PyMC3

markov-chain-montecarlopymc

I'm trying to compute the rate parameter of fake set of poisson data, where I set the parameter.

When I run PyMC the posterior distribution always peaks around the true rate parameter, but never seems to hit it.

For example, in this scenario, the true rate parameter is 10 but the posterior peaks at 10.05.

enter image description here

What have I done wrong? Here's the code:

true_rate = 10
n_obs = 10,000
poisson_accidents = np.random.poisson(true_rate, 10000)

# fit a pymc3 model to get the rate
basic_model = pm.Model()

with basic_model:
    lambda_prior = pm.Uniform('lambda_prior',0,20)

    #likelihood
    Y_obs = pm.Poisson('Y_obs',mu = lambda_prior, observed = poisson_accidents)


with basic_model:
    trace = pm.sample(2000,cores = 1,chains = 10,tune = 5000)
    pm.traceplot(trace)
    plt.show()

EDIT

I have increased the number of iterations as suggested:

with basic_model:
    trace = pm.sample(10000,cores = 1,chains = 10,tune = 5000)
    pm.traceplot(trace)
    plt.show()

This still results in a peak that is not centered at 10:

enter image description here

EDIT 2, 15K samples, peaking at 9.93

enter image description here

Am I just being persnickety, or is there something wrong?

Best Answer

First of all, you have far too many chains for a problem like this. Second of all, your tuning parameter is far too high. Something along the lines of

trace = pm.sample(2000,chains = 4, tune = 1000)

Should be sufficient for a problem like this.

Remember that there is sampling error in the data. The mean of the generated data is rarely ever exactly 10. Since you are using a uniform prior, then the posterior mode is the maximum likelihood estimator (which for the poisson is the mean of the observations, if I recall correctly). There might be a little bias because you are using a prior which is not supported on the parameter's domain, but with this many observations, I would expect the bias (if it exists) to be small.

I see nothing wrong here, the behaviour is exactly as to be expected. HMC is not omniscient, and so expecting the posterior mode to be exactly the true parameter value is holding a very high standard. Instead, I would plot HPD intervals (with pm.plot_posterior(trace)) and determine if the true parameter lies within the interval.