I'm going through the Price Is Right example in chapter 5 of Probabilistic Programming & Bayesian Methods for Hackers.
It reads:
Example: Optimizing for the Showcase on The Price is Right
Bless you if you are ever chosen as a contestant on the Price is
Right, for here we will show you how to optimize your final price on
the Showcase. For those who forget the rules:
- Two contestants compete in The Showcase.
- Each contestant is shown a unique suite of prizes.
- After the viewing, the contestants are asked to bid on the price for their unique suite of prizes.
- If a bid price is over the actual price, the bid's owner is disqualified from winning.
- If a bid price is under the true price by less than \$250, the winner is awarded both prizes.
The difficulty in the game is balancing your uncertainty in the
prices, keeping your bid low enough so as to not bid over, and trying
to bid close to the price.

Suppose we have recorded the Showcases from previous The Price is
Right episodes and have prior beliefs about what distribution the true
price follows. For simplicity, suppose it follows a Normal:

$$\text{True Price} \sim \text{Normal}(\mu_p, \sigma_p )$$
In a later chapter, we will actually use real Price is Right Showcase
data to form the historical prior, but this requires some advanced
PyMC3 use so we will not use it here. For now, we will assume $\mu_p = 35\,000$ and $\sigma_p = 7500$.

We need a model of how we should be playing the Showcase. For each
prize in the prize suite, we have an idea of what it might cost, but
this guess could differ significantly from the true price. (Couple
this with increased pressure being onstage and you can see why some
bids are so wildly off.) Let's suppose your beliefs about the prices
of prizes also follow Normal distributions:

$$ \text{Prize}_i \sim \text{Normal}(\mu_i, \sigma_i ),\;\; i=1,2$$
This is really why Bayesian analysis is great: we can specify what we
think a fair price is through the $\mu_i$ parameter, and express the
uncertainty of our guess in the $\sigma_i$ parameter.

We'll assume two prizes per suite for brevity, but this can be
extended to any number. The true price of the prize suite is then
given by $\text{Prize}_1 + \text{Prize}_2 + \epsilon$, where
$\epsilon$ is some error term.

We are interested in the updated $\text{True Price}$ given we have
observed both prizes and have belief distributions about them. We can
perform this using PyMC3.

Let's make some values concrete. Suppose there are two prizes in the
observed prize suite:
- A trip to wonderful Toronto, Canada!
- A lovely new snowblower!
We have some guesses about the true prices of these objects, but we
are also pretty uncertain about them. I can express this uncertainty
through the parameters of the Normals:

$$\begin{align}\text{snowblower} &\sim \text{Normal}(3\,000, 500)\\ \text{Toronto} &\sim \text{Normal}(12\,000, 3\,000)\end{align}$$
For example, I believe that the true price of the trip to Toronto is
12 000 dollars, and that there is a 68.2% chance the price falls within 1
standard deviation of this, i.e. my confidence is that there is
a 68.2% chance the trip is in [9 000, 15 000].
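The 68.2% figure is the probability that a Normal variable lands within one standard deviation of its mean (about 68.27%). A small sketch using only the Python standard library can check it for both beliefs; `prob_within` is a hypothetical helper name, and the Normal CDF is written via `math.erf`.

```python
import math


def prob_within(mu, sd, lo, hi):
    """P(lo <= X <= hi) for X ~ Normal(mu, sd), using the error function."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))
    return cdf(hi) - cdf(lo)


# Toronto belief: Normal(12 000, 3 000); [9 000, 15 000] is +/- 1 standard deviation
p_toronto = prob_within(12_000, 3_000, 9_000, 15_000)
# Snowblower belief: Normal(3 000, 500); its one-sigma band has the same probability
p_snowblower = prob_within(3_000, 500, 2_500, 3_500)
print(f"{p_toronto:.4f} {p_snowblower:.4f}")  # 0.6827 0.6827
```

The probability does not depend on $\mu_i$ or $\sigma_i$, only on how many standard deviations the interval spans.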
The code that was provided is the following:
```python
import pymc3 as pm

data_mu = [3e3, 12e3]   # belief means: snowblower, Toronto trip
data_std = [5e2, 3e3]   # belief standard deviations
mu_prior = 35e3         # prior mean of the suite's true price
std_prior = 75e2        # prior standard deviation of the suite's true price

with pm.Model() as model:
    true_price = pm.Normal("true_price", mu=mu_prior, sd=std_prior)

    prize_1 = pm.Normal("first_prize", mu=data_mu[0], sd=data_std[0])
    prize_2 = pm.Normal("second_prize", mu=data_mu[1], sd=data_std[1])
    price_estimate = prize_1 + prize_2

    logp = pm.Normal.dist(mu=price_estimate, sd=(3e3)).logp(true_price)
    error = pm.Potential("error", logp)

    trace = pm.sample(50000, step=pm.Metropolis())

burned_trace = trace[10000:]   # discard the first 10 000 draws as burn-in
price_trace = burned_trace["true_price"]
```
I don't understand:

- How does `true_price` fit in with `price_estimate`?
- Where did `sd=(3e3)` come from?
- What is a `pm.Potential` object?
Any help would be greatly appreciated. Thanks!
Best Answer
We use `pm.Potential` here primarily to get around the definition of a likelihood. A `pm.Potential` simply adds an arbitrary term to the model's joint log-probability. We ordinarily use it to constrain our likelihood in the manner described in the PyMC docs, but in this example we never end up defining a true likelihood (which would require the inclusion of observations). As such, all the samples that we draw are based on the priors together with how we defined the potential.

Our `price_estimate` and `true_price` are related to each other in the potential by essentially treating `true_price` as the observed values. When we say:

```python
logp = pm.Normal.dist(mu=price_estimate, sd=(3e3)).logp(true_price)
```

we are evaluating a normal distribution with mean `price_estimate` and standard deviation `3e3` at the values provided by `true_price` (our mock observations). This simulates a likelihood; combined with the priors, it defines the posterior that the sampler draws from. In terms of the book's model, this term encodes $\text{True Price} = \text{Prize}_1 + \text{Prize}_2 + \epsilon$ with $\epsilon \sim \text{Normal}(0, 3000)$, so `3e3` is the standard deviation of the error term $\epsilon$.

As for the validity of `3e3` as the standard deviation, I think it is reasonable, given that it is the larger of the standard deviations that we used to define the components of our `price_estimate` here: `data_std = [5e2, 3e3]`.
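To make the role of the potential concrete, here is a dependency-free sketch in plain Python (not PyMC3, and not PyMC3's tuned Metropolis step) that writes out the same unnormalized log-posterior as the model in the question: the three Normal priors plus the potential's $\log \text{Normal}(\text{true\_price} \mid \text{prize}_1 + \text{prize}_2, 3000)$. It samples with a basic random-walk Metropolis; the step sizes, seed, and helper names (`normal_logpdf`, `log_joint`, `metropolis`, `closed_form_posterior`) are assumptions for illustration. Because every piece is Gaussian, the prizes can also be integrated out analytically: $\text{Prize}_1 + \text{Prize}_2 + \epsilon \sim \text{Normal}(15\,000, \sqrt{500^2 + 3000^2 + 3000^2})$, which combined with the $\text{Normal}(35\,000, 7500)$ prior gives a closed-form posterior to check the sampler against.

```python
import math
import random

# Values taken from the question's PyMC3 model
MU_PRIOR, SD_PRIOR = 35e3, 75e2   # prior on true_price
DATA_MU = [3e3, 12e3]             # belief means for the two prizes
DATA_SD = [5e2, 3e3]              # belief standard deviations
SD_ERROR = 3e3                    # sd passed to pm.Normal.dist inside the potential


def normal_logpdf(x, mu, sd):
    """Log-density of Normal(mu, sd) at x."""
    z = (x - mu) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def log_joint(true_price, prize_1, prize_2):
    """Unnormalized log-posterior: three Normal priors plus the potential term."""
    lp = normal_logpdf(true_price, MU_PRIOR, SD_PRIOR)
    lp += normal_logpdf(prize_1, DATA_MU[0], DATA_SD[0])
    lp += normal_logpdf(prize_2, DATA_MU[1], DATA_SD[1])
    # The "error" potential: log Normal(true_price | prize_1 + prize_2, SD_ERROR)
    lp += normal_logpdf(true_price, prize_1 + prize_2, SD_ERROR)
    return lp


def metropolis(n_steps, burn, seed=0, step=(2000.0, 400.0, 2000.0)):
    """Random-walk Metropolis over (true_price, prize_1, prize_2).

    Returns the true_price values from the steps after the first `burn`.
    """
    rng = random.Random(seed)
    state = [MU_PRIOR, DATA_MU[0], DATA_MU[1]]
    current_lp = log_joint(*state)
    kept = []
    for i in range(n_steps):
        proposal = [x + rng.gauss(0.0, s) for x, s in zip(state, step)]
        proposal_lp = log_joint(*proposal)
        # Accept with probability min(1, exp(proposal_lp - current_lp));
        # 1 - random() lies in (0, 1], so the log is always defined
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            state, current_lp = proposal, proposal_lp
        if i >= burn:
            kept.append(state[0])
    return kept


def closed_form_posterior():
    """Exact posterior mean and sd of true_price, with the prizes integrated out."""
    # prize_1 + prize_2 + error is Normal with summed means and summed variances
    mu_lik = sum(DATA_MU)
    var_lik = sum(s * s for s in DATA_SD) + SD_ERROR ** 2
    var_prior = SD_PRIOR ** 2
    # Product of two Normal densities in true_price: precisions add
    var_post = 1.0 / (1.0 / var_prior + 1.0 / var_lik)
    mu_post = var_post * (MU_PRIOR / var_prior + mu_lik / var_lik)
    return mu_post, math.sqrt(var_post)


exact_mean, exact_sd = closed_form_posterior()
draws = metropolis(100_000, burn=10_000)
mc_mean = sum(draws) / len(draws)
mc_sd = math.sqrt(sum((d - mc_mean) ** 2 for d in draws) / len(draws))
print(f"closed form: mean={exact_mean:.0f}, sd={exact_sd:.0f}")  # mean=19899, sd=3712
print(f"Metropolis:  mean={mc_mean:.0f}, sd={mc_sd:.0f}")
```

The exact posterior of `true_price` has mean of about 19 899 and standard deviation of about 3 712, well below the 35 000 prior mean: the potential pulls the true price toward the 15 000 implied by the prize beliefs, and the sampler's estimates should land close to those values.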
I kept "error" as the name of the variable because that's how Cam named the function when he used the `@pm.potential` decorator in the PyMC version of this chapter. Please let me know if this is unclear!