Solved – Regression Mixture in PyMC3

bayesian, pymc

I'm attempting a problem where I have a mixture of regression coefficients. Not sure if my math or my coding is bad, but I'm getting wrong estimates for the coefficients, which should be 5 and -5. I originally tried this with three regression lines and had even more issues, but for now I would be content to make it work with two. I'm getting more like 1.5 and -1.5 for my betas, with a sigma parameter around 5; the true values are not even in the credible regions.


Here's the fake data:
%matplotlib inline 
import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm
import seaborn as sns

np.random.seed(123)
alpha = 0
sigma = 1
beta = [-5]
beta2 = [5]
size = 250

# Predictor variable
X1_1 = np.random.randn(size)

# Simulate outcome variable -- cluster 1
Y1 = alpha + beta[0]*X1_1 + np.random.normal(loc=0, scale=sigma, size=size)

# Predictor variable
X1_2 = np.random.randn(size)
# Simulate outcome variable -- cluster 2
Y2 = alpha + beta2[0]*X1_2 + np.random.normal(loc=0, scale=sigma, size=size)

X1 = np.append(X1_1, X1_2)
Y = np.append(Y1, Y2)
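
As a quick sanity check (not part of the model), plotting the simulated data colored by the true cluster labels shows the two crossing lines:

# The first `size` points come from cluster 1 (slope -5), the rest from cluster 2 (slope 5)
true_cat = np.repeat([0, 1], size)
plt.scatter(X1, Y, c=true_cat, cmap='coolwarm', alpha=0.5)
plt.xlabel('X1'); plt.ylabel('Y')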

And here's the model:

basic_model = pm.Model()

with basic_model:    
    p = pm.Uniform('p', 0, 1) #Proportion in each mixture

    alpha  = pm.Normal('alpha', mu=0, sd=10) #Intercept
    beta_1 = pm.Normal('beta_1', mu=0, sd=100, shape=2)  #Betas.  Two of them.
    sigma  = pm.Uniform('sigma', 0, 20)  #Noise

    category = pm.Bernoulli('category', p=p, shape=size*2)  #Classification of each observation

    b1 = pm.Deterministic('b1', beta_1[category])  #Choose beta based on category

    mu = alpha + b1*X1 # Expected value of outcome

    # Likelihood 
    Y_obs = pm.Normal('Y_obs', mu=mu, sd=sigma, observed=Y)
with basic_model:
    step1 = pm.Metropolis([p, alpha, beta_1, sigma])  # Continuous parameters
    step2 = pm.BinaryMetropolis([category])           # Discrete cluster labels
    trace = pm.sample(20000, step=[step1, step2], progressbar=True)
pm.traceplot(trace)
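
A quick way to read off the point estimates is to summarize the trace after discarding burn-in (a sketch using pymc3's summary; the burn-in length here is an arbitrary choice):

# Summarize the continuous parameters, discarding the first half as burn-in
pm.summary(trace[10000:], varnames=['p', 'alpha', 'beta_1', 'sigma'])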

In this plot, I'm expecting dark points for one mixture and light points for the other, but the classification doesn't have the certainty I was expecting for most of the points:

p_cat = np.apply_along_axis(np.mean, 0, trace['category'])
fig, axes = plt.subplots(1,1, figsize=(10,4))
axes.scatter(X1, Y, c=p_cat)

axes.set_ylabel('Y'); axes.set_xlabel('X1'); 


EDIT: I've attempted the same model in pymc (version 2) as follows:

import pymc as mc

p = mc.Uniform('p', 0, 1, value=.5)  # Proportion in each mixture

# Note: pymc (v2) parameterizes Normals by precision tau = 1/sd**2
alpha  = mc.Normal('alpha', mu=0, tau=1.0/10**2, value=0)  # Intercept (sd=10, as above)
beta_1 = mc.Normal('beta_1', mu=0, tau=1.0/100**2, size=2, value=[0, 0])  # Betas. Two of them. (sd=100)
sigma  = mc.Uniform('sigma', 0, 20)  # Noise

category = mc.Bernoulli('category', p=p, size=500)  # Classification of each observation (2*size points)

@mc.deterministic
def b1(beta_1=beta_1, category=category):
    return np.choose(category, beta_1)  # Choose beta based on category

@mc.deterministic
def mu(alpha=alpha, b1=b1):
    return alpha + b1*X1  # Expected value of outcome

@mc.deterministic
def tau(sigma=sigma):
    return 1.0/sigma**2  # Precision is 1/variance, not 1/sd

# Likelihood
Y_obs = mc.Normal('Y_obs', mu=mu, tau=tau, observed=True, value=Y)
model = mc.Model([p,alpha, beta_1, sigma, category, Y_obs])
mcmc = mc.MCMC(model)
mcmc.sample(10000)
p_cat = np.apply_along_axis(np.mean, 0, mcmc.trace('category')[:])
fig, axes = plt.subplots(1,1, figsize=(10,4))
axes.scatter(X1, Y, c=p_cat, alpha=1, cmap='coolwarm')

axes.set_ylabel('Y'); axes.set_xlabel('X1'); 
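
A quick check of the recovered slopes (a sketch; burn-in discarding omitted for brevity):

# Posterior means of the two slopes; they should land near -5 and 5 (in either order)
print(mcmc.trace('beta_1')[:].mean(axis=0))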

This gets the correct results, so now I'm confused about what's different between the two models. I get an error when I try to use np.choose in pymc3, so the problem could be in how the coefficient values are looked up.
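
For what it's worth, the same lookup can be written in pymc3 with theano operations instead of np.choose; a minimal sketch using tt.switch for the two-category case, as a drop-in replacement for the b1 line inside basic_model:

import theano.tensor as tt

# Select beta_1[0] where category == 0 and beta_1[1] where category == 1
# (equivalent to np.choose(category, beta_1) for two components)
b1 = pm.Deterministic('b1', tt.switch(tt.eq(category, 0), beta_1[0], beta_1[1]))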


Best Answer

An alternative is to use a marginalized mixture model (see also this SO answer): instead of sampling a discrete category for every observation, the category is summed out, so each observation's likelihood becomes a weighted sum of the two Normals. With no discrete variables left, the model can be sampled with NUTS (initialized via ADVI), and it converges within 6000 samples.

import theano.tensor as tt
ncls = 2
with pm.Model() as basic_model:
    w = pm.Dirichlet('w', np.ones(ncls))
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=100, shape=ncls)
    sigma  = pm.Uniform('sigma', 0, 20)

    mu = tt.stack([alpha + beta[0]*X1,
                   alpha + beta[1]*X1], axis=1)

    y_obs = pm.NormalMixture('y_obs', w, mu, sd=sigma, observed=Y)  # sd, not tau: sigma is a standard deviation here

with basic_model:
    trace = pm.sample(5000, n_init=10000, tune=1000)[1000:]
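
To inspect the fit, the usual traceplot/summary applies (a sketch). Note that mixture components are only identified up to label switching, so beta[0] and beta[1] may come out as (5, -5) or (-5, 5):

pm.traceplot(trace, varnames=['w', 'alpha', 'beta', 'sigma'])
pm.summary(trace, varnames=['w', 'alpha', 'beta', 'sigma'])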