Solved – PyMC3: how to model evolving probabilities of a categorical variable

pymc

I have successfully used the following PyMC3 model to estimate the changing response probability in a binary choice task:

import numpy as np
import pymc3 as pm
import theano.tensor as t

def _tinvlogit(x):
    # inverse logit (sigmoid): maps a real-valued walk onto (0, 1)
    return t.exp(x) / (1 + t.exp(x))

observations = np.array([1,1,1,1,1,0,1,1,1,1,1,1,0,1,1,0,1,1,1,1,0,0,1,
                         1,1,1,0,1,0,1,1,1,1,1,1,1,0,1,1,1,0,1,1,1,1,0,
                         0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,1,1,0,1,0,0,0,
                         0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,
                         0,0,0,0,0,0,0,0,0])

Nobs = len(observations)
chance = 0.5

with pm.Model() as model:

    tau1 = pm.Gamma('tau1', alpha=5, beta=5)

    init = np.log(chance / (1. - chance))
    init = pm.Normal.dist(mu=init, sd=1)

    x1 = pm.distributions.timeseries.GaussianRandomWalk('x1',
                                                         tau=tau1,
                                                         init=init,
                                                         shape=Nobs)

    p = pm.Deterministic('p', _tinvlogit(x1))
    n = pm.Bernoulli('n', p=p, observed=observations)

    traces = pm.sample(5000, step=pm.HamiltonianMC(), progressbar=False, njobs=4)

And now I would like to extend the model to multiple discrete responses. As a first step, I thought I would switch from Bernoulli to Categorical for the binary task:

p = pm.Deterministic('p', _tinvlogit(t.concatenate([x1, [0.]])))
n = pm.Categorical('n', p=p, observed=observations)

However, the result is that all x1 and p samples are constant at 0.0 and 0.5, respectively. What is the correct way to build the model for a multiple-choice task?

Best Answer

Another potential solution is to split the multiple choice into binary choices via one-hot encoding. The get_dummies function in the pandas library can help with this. Once you have the multiple-choice responses in binary form, you can pick a distribution for each choice, for example a Bernoulli distribution, as in the sketch below.
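
A minimal sketch of this idea, assuming hypothetical three-category responses coded 0, 1, 2 and reusing the random-walk prior from the question (the variable names, data, and priors here are illustrative assumptions, not part of the original answer):

import numpy as np
import pandas as pd
import pymc3 as pm
import theano.tensor as t

def _tinvlogit(x):
    # inverse logit (sigmoid): maps a real-valued walk onto (0, 1)
    return t.exp(x) / (1 + t.exp(x))

# hypothetical multiple-choice responses coded 0, 1, 2
observations = np.array([0, 1, 1, 2, 0, 2, 1, 0, 2, 2, 1, 0])

# one-hot encode: one 0/1 column per choice
onehot = pd.get_dummies(observations).values.astype('int64')
Nobs, Nchoices = onehot.shape

with pm.Model() as model:
    for k in range(Nchoices):
        # independent random walk and Bernoulli likelihood for each choice
        tau = pm.Gamma('tau_%d' % k, alpha=5, beta=5)
        x = pm.distributions.timeseries.GaussianRandomWalk('x_%d' % k,
                                                            tau=tau,
                                                            shape=Nobs)
        p = pm.Deterministic('p_%d' % k, _tinvlogit(x))
        pm.Bernoulli('n_%d' % k, p=p, observed=onehot[:, k])

    traces = pm.sample(2000, progressbar=False)

Note that this treats the choices as independent Bernoulli series, so the per-trial probabilities are not constrained to sum to one; if that constraint matters, the p_k traces can be renormalised after sampling, or the model rewritten with a softmax over the random walks.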