Defining the hypotheses for Bayes' theorem in this problem


While reading Think Bayes by Allen Downey, I encountered the exercise problem below.

Exercise: M&M's are small candy-coated chocolates that come in a variety of colors.
Mars, Inc., which makes M&M's, changes the mixture of colors from time to time. In 1995, they introduced blue M&M's.

In 1994, the color mix in a bag of plain M&M's was 30% Brown, 20% Yellow, 20% Red, 10% Green, 10% Orange, 10% Tan.

In 1996, it was 24% Blue, 20% Green, 16% Orange, 14% Yellow, 13% Red, 13% Brown.

Suppose a friend of mine has two bags of M&M's, and he tells me that one is from 1994 and one from 1996. He won't tell me which is which, but he gives me one M&M from each bag. One is yellow and one is green. What is the probability that the yellow one came from the 1994 bag?

I thought that there were two hypotheses: one was that the yellow M&M came from the 1994 bag, and the other was that it came from the 1996 bag. I was not really sure about the evidence, but I guessed it was that one M&M is yellow and the other is green.

However, the solution defines the hypotheses as follows.

Hypotheses

A: yellow from 94, green from 96

B: yellow from 96, green from 94

Then the prior probability for each hypothesis is 1/2. I do not understand why P(yellow from 94 and green from 96) is 1/2. Shouldn't it be 0.2 × 0.2 = 0.04?

I think I am missing something significant in my understanding.

My questions:

  1. Why do I need to include the green M&M's information in the hypotheses?
  2. Why is the prior probability for each hypothesis 1/2?
  3. What is the correct evidence to use in this problem? In my understanding, hypotheses must not include any of the evidence, yet the hypotheses in the solution clearly include the information that one M&M is yellow and the other is green.

Best Answer

As whuber pointed out, the problem, as stated, is ill-posed. Nonetheless, I am going to answer it, precisely because it is ill-posed: there is an extra lesson here beyond the one posed by the book.

To answer your question, I went online to find the exact wording of the original problem. It was also less than ideal, because it referred back to another problem in an earlier section for the rules. It appears to be what would more commonly be called an urn model, in which any M&M in a bag is equally likely to be drawn from that bag and the person doing the drawing is blind to the colors.

Before answering your questions, I should answer the implicit one: why does whuber’s statement matter?

In more advanced applications, the selection mechanism would be called the data generating function, and it drives the form of the likelihood function. The question of what the selection rule was is essential in real-world applications. Imagine you are about to gamble, having watched the person you are betting against stumble and struggle, only to have a friend whisper that your opponent is called “Slick Eddy” and is also a stage magician. Bayesian methods care which “small world” you are really living in. Is Eddy a stumbling fool about to lose a lot of money, or is Slick Eddy a world-class con man and magician? If you perform your calculations only from Eddy’s observed behavior, you may lose money to flawed calculations.

Now let us go back to the question at hand. We have two bags; we will denote them $B_1$ and $B_2$. There are two models of the world; we will call them $\theta_1$ and $\theta_2$. In $\theta_1$, we will assume that $B_1$ was made in 1994 and $B_2$ was made in 1996. Model $\theta_2$ is the complementary model.

The prior distribution was handwaved a bit, and that may be interfering with your understanding. I would love to blame the author, but I am notorious for sloppy wording in everyday life. The wording is only a little sloppy for a statistician, but perhaps very sloppy for a learner.

So I want to add some details that were not actually present in the problem but that help illustrate how the prior distribution is developed.

Let us assume the bags are identical; for example, the copyright symbol and the year are nowhere to be found. Furthermore, one bag sits in a box labeled Bag 1 and the other in a box labeled Bag 2. Both bags are sealed, and there is no physical test you could perform to identify which year either bag came from.

Knowing this and only this, what is the prior probability that Bag 1 was made in 1994? Choosing a value of 1/2 is very reasonable. Of course, and this is important, you may have other personal information that is not stated in the problem. You may know, or at least believe, that the author is a Capricorn and that all Capricorns would put the 1994 bag in Box 1. Or you might be only fairly confident, say 90%, that this is true; that uncertainty would also shape your prior distribution. Bayesian methods let you incorporate information like that.

If the M&M’s were drawn from the bags and you were forced to bet on which bag really was from 1994, even odds on either choice are eminently reasonable. In the revised language of the case as I am using it here, our two hypotheses are that Bag 1 is from 1994 and Bag 2 is from 1996, with the complement being the alternative.

A person who cannot see inside the bags draws one M&M from each. The choice of an M&M from the first bag is not affected by the choice made from the second bag, and vice versa. The draws are independent events, so we can multiply their probabilities.

We will define $$\theta_1=\begin{bmatrix}94\\ 96\end{bmatrix}$$ and $$\theta_2=\begin{bmatrix}96\\ 94\end{bmatrix},$$ where the top entry is the year of production for Bag 1 and the bottom entry is the year for Bag 2.

Since $P(\theta_1)=1-P(\theta_2)$, we only need to calculate the posterior for one hypothesis if we care about the probability of the other. (In general, a Bayesian hypothesis set can have more than two cases.)

So we will use Bayes’ rule. For notation purposes, we will put the data from Bag 1 in the first position and Bag 2 in the second position.

We need to solve $$P(\theta_1|YG)=\frac{P(YG|\theta_1)P(\theta_1)}{ P(YG|\theta_1)P(\theta_1)+ P(YG|\theta_2)P(\theta_2)}.$$

The probability of observing YG, if $\theta_1$ is known for certain to be the true model, is $$\frac{1}{5}\times\frac{1}{5}.$$ Nonetheless, we do not know that to be the true state of affairs, so we weight this likelihood by the prior probability $P(\theta_1)=1/2$.

Of course, that is not the only possible world that we could be living in. We may be living in $\theta_2$. The denominator requires us to recalculate the probabilities of what we saw if we are living in that other world instead.

Our calculations become $$P(\theta_1|YG)=\frac{\frac{1}{5}\times\frac{1}{5}\times\frac{1}{2}}{\frac{1}{5}\times\frac{1}{5}\times\frac{1}{2}+\frac{7}{50}\times\frac{1}{10}\times\frac{1}{2}}=\frac{20}{27}.$$
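To make the arithmetic concrete, here is a short Python sketch (my own illustration, not the book's code; the dictionaries and function name are assumptions) that reproduces the calculation under the blind, independent-draw assumption:

```python
from fractions import Fraction

# Color mixes for the two years (only the colors used here).
mix_1994 = {"yellow": Fraction(20, 100), "green": Fraction(10, 100), "tan": Fraction(10, 100)}
mix_1996 = {"yellow": Fraction(14, 100), "green": Fraction(20, 100), "tan": Fraction(0, 100)}

def posterior_theta1(color_bag1, color_bag2, prior=Fraction(1, 2)):
    """Posterior probability of theta_1 (Bag 1 from 1994, Bag 2 from 1996),
    given the colors drawn from Bag 1 and Bag 2, assuming independent draws."""
    like_theta1 = mix_1994[color_bag1] * mix_1996[color_bag2]  # likelihood under theta_1
    like_theta2 = mix_1996[color_bag1] * mix_1994[color_bag2]  # likelihood under theta_2
    evidence = like_theta1 * prior + like_theta2 * (1 - prior)
    return like_theta1 * prior / evidence

print(posterior_theta1("yellow", "green"))  # 20/27
```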

The green M&M matters because if you knew the identity of one bag, you would also know the identity of the other bag.

Recalculate the above with Yellow alone, and then with Yellow and Tan, to see why it really matters: the second M&M can have tremendous importance.
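For instance, here is a quick, self-contained sketch of those two recalculations (again my own illustration, with assumed variable names):

```python
from fractions import Fraction

half = Fraction(1, 2)  # prior for each hypothesis

# Yellow alone (only the draw from Bag 1 is observed):
like_t1 = Fraction(20, 100)                     # P(Yellow | Bag 1 from 1994)
like_t2 = Fraction(14, 100)                     # P(Yellow | Bag 1 from 1996)
print(like_t1 * half / (like_t1 * half + like_t2 * half))  # 10/17, about 0.59

# Yellow from Bag 1 and Tan from Bag 2:
like_t1 = Fraction(20, 100) * Fraction(0, 100)  # the 1996 mix has no Tan
like_t2 = Fraction(14, 100) * Fraction(10, 100)
print(like_t1 * half / (like_t1 * half + like_t2 * half))  # 0: Bag 1 cannot be the 1994 bag
```

Compare 10/17 ≈ 0.59 with the 20/27 ≈ 0.74 above: the green draw sharpened the answer, and a tan draw would settle it entirely, because Tan appears only in the 1994 mix.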
