Solved – pymc3: acceptance probabilities and divergences after tuning

Tags: logistic, pymc, regression

I coded two models in pymc3 that I thought were quite simple.

Logistic Regression

The first is a logistic regression in an experiment that models correct and wrong answers for specific tasks in a 2×2 factorial design. Here's what it looks like:

import pymc3 as pm

with pm.Model() as m1:
    b_spe = pm.Normal('b_spe', 0, 5)
    b_noi = pm.Normal('b_noi', 0, 5)
    b_sn = pm.Normal('b_sn', 0, 5)
    sigma_sub = pm.HalfCauchy('sigma_sub', 1)
    sigma_run = pm.HalfCauchy('sigma_run', 1)

    a = pm.Normal('a', 0, 5)

    a_sub = pm.Normal('a_sub', 0, sigma_sub, shape=len(d['sub'].unique()))
    a_run = pm.Normal('a_run', 0, sigma_run, shape=len(d['run'].unique()))    

    p = pm.invlogit(a + a_sub[d["sub"].values] + a_run[d["run"].values] + b_spe*d.speech 
                    + b_noi * d.noise + b_sn * d.speech * d.noise)
    hits = pm.Binomial('hits', p=p, n=d.total, observed=d.hit)

    trace_m1 = pm.sample(5000, tune=5000, njobs=4, target_accept=.99)

After sampling I get the following warning:

Sampling 4 chains: 100%|██████████| 40000/40000 [03:38<00:00, 37.86draws/s]
The acceptance probability does not match the target. It is 0.8942185942514685, but should be close to 0.8. Try to increase the number of tuning steps.
The number of effective samples is smaller than 25% for some parameters.

Here are the traceplots:
[Figure: traceplots of the logistic regression]

Rhat(a) = 1.0002822268704348

The traceplots look fine, at least to me, and r-hat is very close to 1. Is there a problem with the model? What can I do to fix it?
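For context on what that r-hat value means: it compares the between-chain and within-chain variance, and values near 1 indicate the chains agree. A minimal numpy sketch of the classic (non-split) Gelman–Rubin statistic, for illustration only — pymc3 computes a more refined version internally:

```python
import numpy as np

def gelman_rubin(chains):
    """Classic (non-split) R-hat for one scalar parameter.

    chains: array of shape (n_chains, n_draws)
    """
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    # between-chain variance, scaled by the number of draws
    B = n * chain_means.var(ddof=1)
    # average within-chain variance
    W = chains.var(axis=1, ddof=1).mean()
    # pooled posterior-variance estimate
    var_hat = (n - 1) / n * W + B / n
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(0)
well_mixed = rng.normal(size=(4, 5000))
print(gelman_rubin(well_mixed))  # close to 1.0 for well-mixed chains
```

Chains stuck around different values drive B (and hence r-hat) up, which is why r-hat close to 1 is necessary, though not sufficient, for trusting the traces.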

Linear Regression

This model is related to the experiment above, but here I use fMRI data, which I model with Student's t-distributions to account for the longer tails. It's still a 2×2 factorial design, and I also model the covariance between each subject's intercepts and slopes using an LKJ prior on the Cholesky factor. Furthermore, the model uses a non-centered parameterization, inspired by the models of chapter 13.23 from Statistical Rethinking.
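For readers unfamiliar with the non-centered trick: instead of drawing a group-level effect directly as `Normal(0, sigma)`, you draw a standard-normal `z` and scale it by `sigma`, which breaks the funnel-shaped dependence between the scale and the effects that NUTS struggles with. A quick numpy check (illustrative only, not part of the model below) that the two parameterizations describe the same distribution:

```python
import numpy as np

rng = np.random.default_rng(42)
sigma_sub = 1.5
n = 200_000

# centered: draw the subject effects directly from Normal(0, sigma_sub)
centered = rng.normal(0.0, sigma_sub, size=n)

# non-centered: draw standard-normal z, then scale by sigma_sub
z = rng.normal(0.0, 1.0, size=n)
non_centered = sigma_sub * z

# both should have (approximately) the same mean and standard deviation
print(centered.std(), non_centered.std())  # both close to 1.5
```

The distributions are identical, but the non-centered geometry is much easier for the sampler when `sigma_sub` is small.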

import pymc3 as pm
import theano.tensor as tt

with pm.Model() as m2:
    sd_dist = pm.HalfCauchy.dist(beta=2)
    pchol_sub = pm.LKJCholeskyCov('pchol_sub', eta=4, n=n_dim, sd_dist=sd_dist)
    chol = pm.expand_packed_triangular(n_dim, pchol_sub, lower=True)    

    a = pm.Normal('a', 0, 1)

    sigma_sub = pm.HalfCauchy('sigma_sub', 1)
    a_sub = pm.Normal('a_sub', 0, sigma_sub, shape=n_subs)  

    # includes beta for spe, noi, sn
    b_spe = pm.StudentT('b_spe', nu=3, mu=0, lam=1)
    b_noi = pm.StudentT('b_noi', nu=3, mu=0, lam=1)
    b_sn = pm.StudentT('b_sn', nu=3, mu=0, lam=1)

    # per subject
    b_s_sub = tt.dot(chol, pm.StudentT('b_s_sub', nu=3, mu=0, lam=1, shape=(1, n_subs)))
    b_n_sub = tt.dot(chol, pm.StudentT('b_n_sub', nu=3, mu=0, lam=1, shape=(1, n_subs)))
    b_sn_sub = tt.dot(chol, pm.StudentT('b_sn_sub', nu=3, mu=0, lam=1, shape=(1, n_subs)))

    A = a + a_sub[d['sub'].values]
    B_s = b_spe + b_s_sub[0, d['sub'].values]
    B_n = b_noi + b_n_sub[0, d['sub'].values]
    B_sn = b_sn + b_sn_sub[0, d['sub'].values]

    sigma = pm.HalfCauchy('sigma', 2)
    lam = sigma ** -2
    nu = pm.Exponential('ν_minus_one', 1/29.) + 1
    mu = pm.Deterministic('mu', A + B_s * d.speech + B_n * d.noise + B_sn * d.speech * d.noise)
    psc = pm.StudentT('psc', nu=nu, mu=mu, lam=lam, observed=d.psc_c)

    trace_m2 = pm.sample(5000, tune=5000, njobs=4, target_accept=0.99)

Here are the warnings/errors I received:

Sampling 4 chains: 100%|██████████| 40000/40000 [04:55<00:00, 135.45draws/s]
There were 2 divergences after tuning. Increase target_accept or reparameterize.
ERROR:pymc3:There were 2 divergences after tuning. Increase target_accept or reparameterize.
The acceptance probability does not match the target. It is 0.7183184493954338, but should be close to 0.8. Try to increase the number of tuning steps.

There are other repeated warnings that say pretty much the same thing; I get them for each chain.

Below are the trace plots.

[Figure: traceplots of the linear regression]

If I zoom in on draws 2000–3000, the traces look funky:

[Figure: zoomed-in traceplots]

Here's an example pair plot of sigma_sub against a_sub_7; the red dots are the divergent draws. I was inspired by the post by Thomas Wiecki.

[Figure: pair plot for the linear regression]

Rhat(a) = 1.0014072110125143

Again, the traces look fine to me (except for the zoomed-in part) and r-hat is very close to 1. What am I doing wrong?

The data for this model is quite noisy, and with only 17 subjects the sample size is not huge.

Would really appreciate some help here 🙂
Thanks!

Edit:
Using pm.sample(..., nuts_kwargs=dict(target_accept=0.95)) got rid of the divergences.

Best Answer

You might have better luck on our discourse: https://discourse.pymc.io/

A couple of notes: you need to use pm.sample(..., nuts_kwargs=dict(target_accept=0.95)) for target_accept to actually be applied (the next release changes this behavior so that passing it directly, as you tried, works). That might already solve your problem.

If not, these problems don't look too terrible, so you're probably fine. If you still want to investigate further, I would look at where exactly the divergences appear. ArviZ has some good plots for this; see for example https://arviz-devs.github.io/arviz/examples/plot_parallel.html or https://arviz-devs.github.io/arviz/examples/plot_pair.html. Basically, you want to find the dimensions where the divergences cluster; that tells you where the problem is.
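To make "find where the divergences cluster" concrete, here is a rough numpy sketch (a hypothetical helper, not part of pymc3's or ArviZ's API): given posterior draws and a boolean divergence mask, compare the divergent and non-divergent draws per dimension and flag the dimension with the largest gap:

```python
import numpy as np

def divergence_shift(samples, diverging):
    """Per-dimension gap between divergent and non-divergent draws,
    in units of the overall posterior standard deviation.

    samples: array of shape (n_draws, n_dims)
    diverging: boolean array of shape (n_draws,)
    """
    div = samples[diverging]
    ok = samples[~diverging]
    return np.abs(div.mean(axis=0) - ok.mean(axis=0)) / samples.std(axis=0)

# toy example: pretend divergences concentrate at small values of dimension 0
rng = np.random.default_rng(1)
samples = rng.normal(size=(1000, 3))
diverging = samples[:, 0] < -1.5  # stand-in for the sampler's divergence flags
shift = divergence_shift(samples, diverging)
print(shift.argmax())  # dimension 0 stands out
```

With a real trace you would build `samples` from the posterior draws and `diverging` from the sampler statistics; a large shift in, say, `sigma_sub` together with the matching `a_sub` entries points at the funnel that a non-centered parameterization fixes.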