Solved – PyMC3 – Sampling from a categorical distribution

categorical-data, pymc

I've been experimenting with PyMC3 – I've used it for building regression models before, but I want to better understand how to deal with categorical data.

However, I think I'm misunderstanding how the Categorical distribution is meant to be used in PyMC. To test the distribution, I'm using it to simulate a biased coin. When I run the following code:

```python
import numpy as np
import pymc3

with pymc3.Model() as model:
    # A single probability, intended as P(heads) = 0.25 for a biased coin
    category = pymc3.Categorical(name='category',
                                 p=np.array([0.25]))
    trace = pymc3.sample(20, step=pymc3.Metropolis())
print(trace['category'])
```

I expect the trace to consist of numbers from the set {0, 1}, where the values are sampled from a Bernoulli distribution with p = 0.25.
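For reference, the behavior expected here can be sketched without PyMC3: a Bernoulli(p) variable is a categorical variable over {0, 1} whose probability vector has one entry per category, [1 - p, p], summing to 1. The minimal sketch below uses only Python's standard library; the helper names `bernoulli_as_categorical` and `sample_categorical` are hypothetical, chosen for illustration, and are not part of PyMC3.

```python
import random

def bernoulli_as_categorical(p):
    """Return the full categorical probability vector for a Bernoulli(p) variable.

    Index 0 holds P(value = 0) and index 1 holds P(value = 1).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return [1.0 - p, p]

def sample_categorical(probs, n, seed=None):
    """Draw n category indices in 0..len(probs)-1 with the given probabilities."""
    # A valid categorical vector is non-negative and sums to 1
    if abs(sum(probs) - 1.0) > 1e-9 or any(q < 0 for q in probs):
        raise ValueError("probs must be non-negative and sum to 1")
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=n)

probs = bernoulli_as_categorical(0.25)   # [0.75, 0.25]
draws = sample_categorical(probs, 10_000, seed=0)
print(set(draws))                        # values come only from {0, 1}
print(sum(draws) / len(draws))           # close to 0.25
```

In this sketch, a one-element vector such as [0.25] is rejected, because it does not sum to 1 and describes only a single category.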

However, the code above prints the following:

```
[ 0 -1 -2 -2 -2 -3 -4 -4 -4 -5 -5 -6 -7 -7 -6 -8 -8 -7 -6 -6]
```

It seems like I am misunderstanding something, as these numbers are not even in the support of the distribution that I am attempting to simulate.

Am I mistaken about the format that p takes? Am I accessing the results incorrectly? Could someone help me understand what's going on here? Thanks in advance for the help!

Best Answer

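Use the BinaryMetropolis step method and pass p as a full probability vector, for example p=np.array([0.25, 0.75]), and it should work. Categorical expects one probability per category, with the entries summing to 1; a one-element vector like np.array([0.25]) does not describe a two-outcome coin. Note that p[k] is the probability of category k, so p=np.array([0.25, 0.75]) gives P(category = 1) = 0.75; to match a Bernoulli with p = 0.25, use p=np.array([0.75, 0.25]).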
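The bit-flip idea behind a binary Metropolis step can be illustrated with a toy sampler. This is a minimal, standard-library-only sketch, not PyMC3's BinaryMetropolis implementation: from the current state x it proposes the other value, 1 - x, and accepts with probability min(1, P(proposal) / P(current)). The target vector [0.75, 0.25] (so that P(1) = 0.25) and the function name `binary_metropolis` are assumptions made for illustration.

```python
import random

def binary_metropolis(probs, n_samples, start=0, seed=None):
    """Toy Metropolis sampler over {0, 1} using bit-flip proposals.

    probs[k] is the target probability of value k; both entries must be positive.
    """
    if len(probs) != 2 or min(probs) <= 0:
        raise ValueError("probs must hold two positive probabilities")
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(n_samples):
        proposal = 1 - x  # flip the bit; this proposal is symmetric
        accept_prob = min(1.0, probs[proposal] / probs[x])
        if rng.random() < accept_prob:
            x = proposal
        samples.append(x)
    return samples

chain = binary_metropolis([0.75, 0.25], 20_000, seed=1)
print(set(chain))               # only 0 and 1 ever appear
print(sum(chain) / len(chain))  # close to 0.25
```

Because proposals only flip between the two values, every state the chain visits lies in {0, 1}. As with any MCMC chain, consecutive samples are autocorrelated, so the long-run frequency, not any short run, approximates the target probability.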