I have successfully used the following PyMC3 model to estimate the changing response probability in a binary choice task:
import numpy as np
import pymc3 as pm
import theano.tensor as t
def _tinvlogit(x):
    return t.exp(x) / (1 + t.exp(x))
observations = np.array([1,1,1,1,1,0,1,1,1,1,1,1,0,1,1,0,1,1,1,1,0,0,1,
1,1,1,0,1,0,1,1,1,1,1,1,1,0,1,1,1,0,1,1,1,1,0,
0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,1,1,0,1,0,0,0,
0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,
0,0,0,0,0,0,0,0,0])
Nobs = len(observations)
chance = 0.5
with pm.Model() as model:
    tau1 = pm.Gamma('tau1', alpha=5, beta=5)
    init = np.log(chance / (1. - chance))
    init = pm.Normal.dist(mu=init, sd=1)
    x1 = pm.distributions.timeseries.GaussianRandomWalk('x1',
                                                        tau=tau1,
                                                        init=init,
                                                        shape=Nobs)
    p = pm.Deterministic('p', _tinvlogit(x1))
    n = pm.Bernoulli('n', p=p, observed=observations)
    traces = pm.sample(5000, step=pm.HamiltonianMC(), progressbar=False, njobs=4)
And now I would like to extend the model to multiple discrete responses. As a first step, I thought I would switch from Bernoulli to Categorical for the binary task:
    p = pm.Deterministic('p', _tinvlogit(t.concatenate([x1, [0.]])))
    n = pm.Categorical('n', p=p, observed=observations)
However, with this change all x1 and p samples come out constant, at 0. and 0.5 respectively. What is the correct way to build the model for a multiple-choice task?
Best Answer
Another potential solution is to split the multiple-choice response into a set of binary choices via one-hot encoding; the get_dummies function in the pandas library can do this. Once the responses are in binary form, you can pick a distribution for each choice, for example a Bernoulli distribution.
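A minimal sketch of the encoding step, using a small hypothetical set of three-category observations (the array here is made up for illustration; your own observations array would take its place):

```python
import numpy as np
import pandas as pd

# Hypothetical multi-choice observations: three response categories 0, 1, 2.
choices = np.array([0, 2, 1, 1, 0, 2, 2, 0, 1])

# One-hot encode: each column is a binary indicator for one category,
# suitable for modelling with a separate Bernoulli likelihood per column.
onehot = pd.get_dummies(choices)

print(onehot.shape)  # (9, 3): one column per category
```

Each column of `onehot` is then a 0/1 series of the same form as the original binary observations, so the working Bernoulli random-walk model above can be applied to each column in turn.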