Solved – How to calculate conditional & marginal probability for both the positive and negative hypotheses

bayesian, conditional-probability

Sorry to keep bothering you guys with this thing, but another stupid question: Given the following data (from my previous question), how would one calculate the conditional probability and marginal probability (for Bayesian inference) of both the positive hypothesis and the negative hypothesis for, say, choice #3?

Suppose I have a questionnaire and I ask respondents how often they eat at McDonalds:

  1. Never
  2. Less than once a month
  3. At least once a month but less than once a week
  4. 1-3 times a week
  5. More than 3 times a week

I then correlate these answers with whether the respondents are wearing brown shoes.

  1. Brown 65 — not brown 38
  2. Brown 32 — not brown 62
  3. Brown 17 — not brown 53
  4. Brown 10 — not brown 48
  5. Brown 9 — not brown 6

Effectively, "brown" is the hypothesis, the "brown" counts are "true positives" and the "not-brown" counts are "false positives".

When I look at this straight-on it seems relatively simple: Brown total = 133, not-brown total = 207, overall total = 340 (I got this wrong the first time). So the conditional probability of brown for #3 is 17/133, and the marginal probability is (17 + 53)/340.

For the negative hypothesis it would seem that you can simply turn the statistics on their head and treat "not brown" as the hypothesis, so the "not brown" counts are "true positives" and the "brown" counts are "false positives". Then the conditional probability of "not brown" (the negative hypothesis) for #3 is 53/207, and the marginal probability of "not brown" is still (17+53)/340.
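As a sanity check on the arithmetic above, here is a minimal Python sketch. The counts are copied from the table in the question; the variable names are just for illustration.

```python
# Counts from the questionnaire, indexed by choice 1..5.
brown     = [65, 32, 17, 10, 9]   # "true positives" for the Brown hypothesis
not_brown = [38, 62, 53, 48, 6]   # "false positives"

brown_total = sum(brown)                   # 133
other_total = sum(not_brown)               # 207
grand_total = brown_total + other_total    # 340

choice = 3            # the choice we're interested in (1-indexed)
i = choice - 1

# Conditional probabilities, in the question's sense: P(choice | hypothesis)
p_3_given_brown = brown[i] / brown_total       # 17/133
p_3_given_other = not_brown[i] / other_total   # 53/207

# Marginal probability of choice #3, shared by both hypotheses
p_3 = (brown[i] + not_brown[i]) / grand_total  # (17 + 53)/340
```

Note that the marginal `p_3` is the same number whether you treat "brown" or "not brown" as the hypothesis, which is exactly the symmetry the question is asking about.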

The thing that confuses me is this: if, due to a previous questionnaire response that is not independent of this one, the marginal probability for #3 must be increased by some amount, one would assume that this would reinforce the true hypothesis and weaken the false one (or vice versa). But if the same marginal probability is used for both cases, then both hypotheses are affected in exactly the same fashion.

Once again, this makes my head hurt.

Am I calculating the probabilities wrong? Am I wrong in believing that both hypotheses shouldn't be affected in the same direction by a change in marginal probability due to interdependence between this question and a prior one?

Thanks, Greg —

After thinking about it I realize that my mistake was in assuming that (aside from the obvious effect on prior probability) an effect on the marginal probability was the ONLY effect that the prior application of an inter-dependent observation had on the "presumed independent" statistics of a subsequent observation. It does affect the marginal probability, but, more importantly, it moves the conceptual threshold between alternative observations. E.g., the dependent "more than 3 times a week" category might come to correspond to a "more than 2 times a week" grouping of the independent responses.

This isn't as clean an answer as I was looking for, but it does help me understand how to most appropriately perform ad-hoc "artistic" compensation for inter-dependent observations.

Best Answer

When I learned about Bayes' rule, the metaphor I was given would have me call shoe-color a "test result" and McD's consumption the "underlying condition".

The joint probabilities we would lay out like this:

           Brown        Other          Marginal
1            a           b             (a+b)/S = P(1)
2            c           d             (c+d)/S = P(2)
3            e           f             etc.
4            g           h             etc.
5            i           j             etc.

          P(Brw)=       P(Othr)=
Marg  (a+c+e+g+i)/S  (b+d+f+h+j)/S      S = grand total count

This way it's kind of easy to see what the joint and conditional probabilities are.

P(Brown & 1) = a/S
P(Brown | 1) = P(B&1)/P(1) = a/S / P(1) = a/(a+b)
P(1 | Brown) = P(B&1)/P(B) = a/S / P(B) = a/(a+c+e+g+i)

NB: S in your case is 340, not 240.
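Plugging the question's actual counts into that layout, here is a short Python sketch of the joint, marginal, and conditional probabilities for row 1 (variable names are mine, chosen to match the formulas above):

```python
# Joint-count table: rows = McD's choices 1..5, columns = [Brown, Other].
table = [
    [65, 38],
    [32, 62],
    [17, 53],
    [10, 48],
    [ 9,  6],
]
S = sum(sum(row) for row in table)               # grand total count, 340

p_brown_and_1 = table[0][0] / S                  # P(Brown & 1) = a/S
p_1           = sum(table[0]) / S                # P(1) = (a+b)/S
p_brown       = sum(row[0] for row in table) / S # P(Brown) = (a+c+e+g+i)/S

p_brown_given_1 = p_brown_and_1 / p_1            # = a/(a+b)       = 65/103
p_1_given_brown = p_brown_and_1 / p_brown        # = a/(a+c+e+g+i) = 65/133
```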

I'm having trouble answering your question though because it's not clear to me where this extra non-independent questionnaire comes in. I don't think it's quite right to think of this other data as 'changing the marginal probability of 3'. Instead, maybe you want to formulate the question in terms like:

First: what is the probability of Brown given 3 on the second questionnaire (marginalizing over all responses to the previous questionnaire)?
Second: what is the probability of Brown given 3 on the second questionnaire and 3 on the first questionnaire?

To get at those you would need to give your table another dimension. eg

      (:,:, Q1=1)        (:,:, Q1=2)     ...    (:,:, Q1=5)
Q2   Brown  Other       Brown  Other           Brown  Other
1      a      b           g      h               m      n
2      c      d           i      j               o      p
...   ...    ...         ...    ...             ...    ...
5      e      f           k      l               q      r
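A sketch of that extra dimension in Python with NumPy. The counts here are made up purely for illustration (only the array shape matters); the two probabilities correspond to the two questions above.

```python
import numpy as np

# Fabricated joint counts, shape (Q2 response, Brown/Other, Q1 response).
rng = np.random.default_rng(0)
counts = rng.integers(1, 20, size=(5, 2, 5))

q2 = 3 - 1   # response 3 on the second questionnaire (0-indexed)
q1 = 3 - 1   # response 3 on the first questionnaire

# First question: P(Brown | Q2=3), marginalizing over all Q1 responses.
slice_q2 = counts[q2].sum(axis=1)               # [Brown, Other] totals for Q2=3
p_brown_given_q2 = slice_q2[0] / slice_q2.sum()

# Second question: P(Brown | Q2=3 and Q1=3), conditioning on both.
cell = counts[q2, :, q1]                        # [Brown, Other] for that Q1/Q2 pair
p_brown_given_both = cell[0] / cell.sum()
```

Comparing `p_brown_given_q2` with `p_brown_given_both` makes the dependence concrete: if the two questionnaires were truly independent, the two numbers would (in expectation) agree.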

I think if you get the marginals right, pick some actual numbers to run your example, and make the comparisons more formal, it may make more sense. Hope this is somewhat helpful.