I've been experimenting with PyMC3 – I've used it for building regression models before, but I want to better understand how to deal with categorical data.
However, I think I'm misunderstanding how the Categorical distribution is meant to be used in PyMC. To test it, I'm using it to simulate a biased coin. When I run the following code:
```
import numpy as np
import pymc3

with pymc3.Model() as model:
    category = pymc3.Categorical(name='category',
                                 p=np.array([0.25]))
    trace = pymc3.sample(20, step=pymc3.Metropolis())

print(trace['category'])
```
I expect the trace to consist of numbers from the set {0, 1}, where the values are sampled from a Bernoulli distribution with p = 0.25.
However, the code above prints the following:
[ 0 -1 -2 -2 -2 -3 -4 -4 -4 -5 -5 -6 -7 -7 -6 -8 -8 -7 -6 -6]
It seems like I am misunderstanding something, as these numbers are not even in the support of the distribution that I am attempting to simulate.
Am I mistaken about the format that `p` takes? Am I accessing the results incorrectly? Help me understand what's going on here. Thanks in advance for the help!
Best Answer
Use the `BinaryMetropolis` step method with `p=np.array([0.25, 0.75])` and it should work. `p` needs one probability per category, so a two-outcome coin needs two entries.
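As a sanity check on what to expect from the trace, here is a minimal sketch that draws from the same two-category distribution using only the Python standard library, so it runs without PyMC3. The helper name `sample_categorical` and the sum-to-1 check are assumptions made for illustration; they are not part of PyMC3's API.

```python
import random
from collections import Counter


def sample_categorical(p, n, seed=0):
    """Draw n category indices, where index i is chosen with probability p[i].

    Hypothetical helper for illustration only; not a PyMC3 function.
    """
    # Illustrative check: one probability per category, summing to 1.
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must list every category and sum to 1")
    rng = random.Random(seed)
    return rng.choices(range(len(p)), weights=p, k=n)


p = [0.25, 0.75]  # P(category == 0) = 0.25, P(category == 1) = 0.75
draws = sample_categorical(p, n=10_000)

counts = Counter(draws)
freqs = {k: counts[k] / len(draws) for k in sorted(counts)}
print(freqs)  # only keys 0 and 1 appear, with frequencies near 0.25 and 0.75
```

Every draw is an index into `p`, so the values stay in {0, 1}. Note that with `p = [0.25, 0.75]` it is outcome 0 that has probability 0.25; to match a Bernoulli distribution with p = 0.25 (where outcome 1 has probability 0.25), the vector would be `[0.75, 0.25]`.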