Solved – Probability function from two random variables in PyMC3

bayesian, inference, pymc, python

I'm new to Bayesian inference and PyMC3, and I am trying to predict a chance of failure, very similar to the Challenger Space Shuttle Disaster example in this notebook:

http://nbviewer.jupyter.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/Ch2_MorePyMC_PyMC3.ipynb

In this example the chance of failure is modeled with respect to the outside temperature.


As a probability function, the logistic function is used.
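For reference, the logistic function maps any real number into (0, 1), which is what makes it usable as a probability. A minimal sketch (the helper name is mine, not from the notebook):

```python
import numpy as np

def logistic(x):
    """Logistic (sigmoid) function: maps any real x into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Note the model below uses 1/(1 + exp(beta*t + alpha)), i.e.
# logistic(-(beta*t + alpha)): the failure probability falls as
# beta*t + alpha grows.
```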

The pymc3 model is as follows:

import pymc3 as pm
import theano.tensor as tt

temperature = challenger_data[:, 0]
D = challenger_data[:, 1]  # defect or not?

with pm.Model() as model:
    beta = pm.Normal("beta", mu=0, tau=0.001, testval=0)
    alpha = pm.Normal("alpha", mu=0, tau=0.001, testval=0)
    p = pm.Deterministic("p", 1.0/(1. + tt.exp(beta*temperature + alpha)))
    observed = pm.Bernoulli("bernoulli_obs", p, observed=D)

    # Mysterious code to be explained in Chapter 3
    start = pm.find_MAP()
    step = pm.Metropolis()
    trace = pm.sample(120000, step=step, start=start)
    burned_trace = trace[100000::2]

Now I want to make the probability function depend on two stochastic variables, for instance outside temperature and O-ring size.

I believe I should draw from separate alpha and beta distributions, but I cannot see which probability function I should use.

Best Answer

At the core, this is a logistic regression model. First, there are many improvements in pymc3 3.2 that should make this model more efficient, like using the (default) NUTS sampler:

import pymc3 as pm
import theano.tensor as tt

temperature = challenger_data[:, 0]
D = challenger_data[:, 1]  # defect or not?

with pm.Model() as model:
    beta = pm.Normal("beta", mu=0, tau=0.001)
    alpha = pm.Normal("alpha", mu=0, tau=0.001)
    p = pm.Deterministic("p", tt.nnet.sigmoid(beta*temperature + alpha))
    observed = pm.Bernoulli("bernoulli_obs", p, observed=D)

    trace = pm.sample()

Essentially, you then want to introduce more covariates. You do this just as in a linear regression; e.g., if you have an orings covariate:

import pymc3 as pm
import theano.tensor as tt

temperature = challenger_data[:, 0]
D = challenger_data[:, 1]  # defect or not?
# orings: array of O-ring sizes, same length as temperature

with pm.Model() as model:
    beta_temp = pm.Normal("beta_temp", mu=0, sd=1)
    beta_orings = pm.Normal("beta_orings", mu=0, sd=1)
    alpha = pm.Normal("alpha", mu=0, sd=1)
    p = pm.Deterministic("p", tt.nnet.sigmoid(beta_temp*temperature + beta_orings*orings + alpha))
    observed = pm.Bernoulli("bernoulli_obs", p, observed=D)

    trace = pm.sample()

Note that I also changed the priors to be more informative, which is generally a good idea (but it assumes your covariates are z-scored, so it might not be appropriate here). There is also more convenient syntax in the glm submodule; see http://docs.pymc.io/notebooks/GLM-logistic.html
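The z-scoring mentioned above can be sketched as follows (a minimal helper of my own, not from the answer):

```python
import numpy as np

def z_score(x):
    """Center a covariate at 0 with unit standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

# After z-scoring each covariate, Normal(mu=0, sd=1) priors on the
# coefficients are on a sensible scale regardless of the covariate's
# original units (degrees, millimetres, etc.).
```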

Finally, you will get a better hit rate for questions like this on http://discourse.pymc.io/.
