I'm using PyMC to sample the posterior distribution and I've run into a roadblock with using priors from samples, not models. My situation is as follows:
- I have some empirical data for a parameter $z$ from which I compute a probability distribution $p(z)$. There is no known model/parametrization for the distribution of $z$. All I have are empirical values. It is known that $z$ is bounded between 0 and 30.
- I have some new observations $x$ and I compute the likelihood $p(x|z)$, also empirically.
- I wish to find the posterior distribution updated with the new data above. (This posterior becomes the new prior for the next set of observations. Rinse and repeat.)
For example, consider this:
import numpy as np
import matplotlib.pyplot as plt

# Ignore the fact that I'm using a mixture model. For all practical
# purposes, I do not know how this is generated.
old_data = np.array([3 * np.random.randn(1000) + 20,
                     3 * np.random.randn(1000) + 5,
                     4 * np.random.randn(1000) + 10]).flatten()
new_data = np.array([4 * np.random.randn(50) + 8,
                     2 * np.random.randn(50) + 17]).flatten()

# Keep only values in the known support [0, 30].
old_data = old_data[(old_data >= 0) & (old_data <= 30)]
new_data = new_data[(new_data >= 0) & (new_data <= 30)]

plt.hist(old_data, bins=range(0, 31), density=True, alpha=0.5)
plt.hist(new_data, bins=range(0, 31), density=True, alpha=0.5)
plt.legend(['p(z)', 'p(x|z)'])
plt.show()
Coming to PyMC, all the existing sources I've found (for instance, this chapter from Bayesian Methods for Hackers) use a normal distribution for the observations, e.g. observations = pm.Normal("obs", center_i, tau_i, value=data, observed=True), with uniform and normal prior distributions for the precisions and means, respectively.
I'm not sure how to tell PyMC that my prior is this empirical distribution here and not a model. I've also tried using the @Stochastic decorator with observed=True, but I don't think I fully understand it. Besides, I still can't figure out a way to avoid specifying a model.
Am I fundamentally misunderstanding the purpose of an MCMC library? If so, how should I proceed?
All I really want is to update my prior belief with the new observations, but I don't think the solution is as simple as multiplying the two histograms (and normalizing).
Best Answer
If you already have a prior $p(\theta)$ and a likelihood $p(x|\theta)$, then you can easily find the posterior $p(\theta|x)$ by multiplying these and normalizing: $$p(\theta|x)=\frac{p(\theta)p(x|\theta)}{p(x)}\propto p(\theta)p(x|\theta)$$
https://en.wikipedia.org/wiki/Posterior_probability
The following code demonstrates estimating a posterior represented as a histogram, so it can be used as the next prior:
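(A minimal sketch using plain NumPy, no PyMC. The unit-width bins over the support [0, 30] and the sample-generation code, reused from the question, are assumptions; any empirical samples would do.)

```python
import numpy as np

np.random.seed(0)

# Empirical samples for the prior p(z) and the likelihood p(x|z),
# generated as in the question.
old_data = np.concatenate([3 * np.random.randn(1000) + 20,
                           3 * np.random.randn(1000) + 5,
                           4 * np.random.randn(1000) + 10])
new_data = np.concatenate([4 * np.random.randn(50) + 8,
                           2 * np.random.randn(50) + 17])

bins = np.arange(0, 31)  # unit-width bins over the known support [0, 30]

# Histogram estimates of the prior and the likelihood on a common grid.
prior, _ = np.histogram(old_data, bins=bins, density=True)
likelihood, _ = np.histogram(new_data, bins=bins, density=True)

# Bin-wise product of prior and likelihood, then normalize so the
# posterior histogram sums to 1 (bin width is 1 here).
posterior = prior * likelihood
posterior /= posterior.sum()

# `posterior` is now a normalized histogram over the same bins and can
# serve as the prior for the next batch of observations.
```

Note that both histograms must be evaluated on the same bin grid for the bin-wise product to make sense, and the normalization constant $p(x)$ never needs to be computed explicitly: dividing by the sum handles it.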
If, however, you want to combine your empirical prior with some MCMC models, I suggest you take a look at PyMC's Potential; one of its main applications is "soft data". Please update your question if you need an answer targeted towards that.