[Math] Why would I use Bayes’ Theorem if I can directly compute the posterior probability

bayesian, probability

I fully understand the mechanics of Bayes' Theorem. However, I am wondering when I need to use it. If I am able to compute the posterior probability directly from measured data, why would I need to use Bayes' Theorem?

For example, consider NBA basketball games. Let $A = team\ wins$ and let $B = team\ scores\ 100\ points$. I want to compute the posterior probability $P(A|B)$, or $P(team\ wins | team\ scores\ 100\ points)$. If I expand this out using Bayes, I would get:

$$P(team\ wins | team\ scores\ 100) = \frac{P(team\ scores\ 100 | team\ wins) \cdot P(team\ wins)}{P(team\ scores\ 100)}$$

Suppose I have the entire log of my team's results. I can compute the posterior (left-hand-side) directly from the logs by simply building a contingency table and doing the appropriate calculations, just as I would to compute the likelihood probability $P(B|A)$. Why would I need to compute the posterior through Bayes' Theorem? Would I use it when I have sparse data (e.g. A and B seldom co-occur)?
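As a sketch of the direct computation described above (all game counts below are made up purely for illustration, not real NBA data):

```python
# Hypothetical game log: each entry is (team_won, scored_100_plus).
games = ([(True, True)] * 38 + [(True, False)] * 12 +
         [(False, True)] * 7 + [(False, False)] * 25)

n_b = sum(1 for won, b in games if b)           # games where the team scored 100+
n_ab = sum(1 for won, b in games if won and b)  # ...that were also wins

# Conditional relative frequency of A given B, read straight off the table.
p_a_given_b = n_ab / n_b
print(round(p_a_given_b, 3))  # 38/45 ≈ 0.844
```

This is exactly the "contingency table" calculation: count the B column, count the A-and-B cell, and divide.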

Best Answer

You are absolutely right that you don't need to go through Bayes' formula to calculate the relative frequencies - and this is the critical point: if you do as you suggest, you are NOT calculating probabilities, and they are not even "posterior" - they are just "conditional" relative frequencies (not the same thing).

Your question is equivalent to asking "Is descriptive statistics the same thing as inferential statistics?" I know you didn't use the term "statistics" in your question, but you cannot escape it.

Naturally, since you have the data, you can easily compute the relative frequencies - joint, marginal, or conditional - of the events that have occurred. What does that tell you? That for past events, the relative frequencies of the events you are interested in were so and so. These are not probabilities yet. In order for them to be treated as probabilities, you have to make additional assumptions.

Why? Because probabilities are used to describe (and hopefully manage) uncertainty - and uncertainty relates to the unknown (usually the future, but not necessarily - it may refer to events that have already happened but whose outcome you don't know). So in order to move from the known and certain (the empirical frequencies you have calculated - which is what descriptive statistics is all about) to the unknown (the probabilities), it is obvious that you have to make additional assumptions to use relative frequencies in place of probabilities - they are not automatically equivalent.

And here is where "frequentists" and "Bayesians" part ways. Tailored to your question (and oversimplifying of course),

The frequentist would make the following assumption: "I assume that my sample (the data from games played) is representative of what happens 'in general' with this team, so next season will be approximately the same. Given this assumption, I can use the relative frequencies obtained from this sample as approximate estimates of the probabilities of what will happen next season." He would then go on and calculate $P(team\ wins | team\ scores\ 100)$ directly from the contingency tables.

The Bayesian would object as follows: "Your assumption that your sample is 'representative' is unfounded. Either you have other samples available as well, in which case bring them forth and prove that your current sample is representative, or you don't have other samples, in which case you cannot proceed as you said - your inference is unreliable." The Bayesian would then say: "If we don't have other samples, the best we can do is accept our ignorance, start somewhere - the prior (= 'before the data') - and let the data modify our possibly ad hoc starting point, leading us to the posterior (= 'after the data')."
This means that in the Bayes' formula that appears in your question, the quantity $P(team\ wins)$ is NOT calculated from the sample you have; it is assigned a value a priori as the prior. You then go to the sample to calculate $P(team\ scores\ 100 | team\ wins)$ and $P(team\ scores\ 100)$, and now you see why you have to go through Bayes' formula to arrive at something that can legitimately be called the "posterior" probability $P(team\ wins | team\ scores\ 100)$: it is not calculated only from the sample at hand - which matters if you want to use your calculations to say something about games that have not yet been played.
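A minimal sketch of this point-probability update, with hypothetical numbers: the prior is assigned by assumption, while the likelihood and the evidence term are taken from the sample.

```python
# All three inputs to Bayes' formula, as the Bayesian would supply them.
# The numbers are hypothetical, chosen only to make the arithmetic concrete.
prior_win = 0.5                # P(team wins): assigned a priori, NOT from the data
lik_100_given_win = 38 / 50    # P(scores 100 | wins): estimated from the sample
p_100 = 45 / 82                # P(scores 100): estimated from the sample

# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
posterior = lik_100_given_win * prior_win / p_100
print(round(posterior, 3))  # ≈ 0.692
```

Note how the result differs from the raw conditional relative frequency in the sample: the assigned prior has pulled the estimate toward the starting assumption, which is precisely the point of the construction.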

With either approach, you are now in the realm of inferential statistics.

Note: The fact that we do all this through statistical distributions, and not as point probabilities (as has already been mentioned), is because we want a fuller picture of the structure of the uncertainty that surrounds future outcomes, and also because we want to calculate the uncertainty/error in our estimated probabilities.
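To illustrate what "distributions, not point probabilities" means here, a standard sketch is the conjugate Beta-Binomial update: with a Beta prior on the win probability and binomially distributed wins, the posterior is again a Beta distribution, so we get a mean *and* a measure of the error around it. The counts are the same hypothetical ones as before; this is an illustration of the general idea, not the specific model anyone is obliged to use.

```python
# Beta(a, b) prior on P(win | scores 100); Beta-Binomial conjugacy gives
# posterior Beta(a + wins, b + losses). Hypothetical counts.
a_prior, b_prior = 1, 1   # uniform Beta(1, 1) prior: total ignorance
wins, losses = 38, 7      # outcomes among games with 100+ points

a_post = a_prior + wins
b_post = b_prior + losses

# Posterior mean and variance of the Beta(a_post, b_post) distribution.
post_mean = a_post / (a_post + b_post)
post_var = (a_post * b_post) / ((a_post + b_post) ** 2 * (a_post + b_post + 1))
print(round(post_mean, 3), round(post_var, 5))  # ≈ 0.83, with its uncertainty
```

The variance is the extra information a point probability cannot carry: it quantifies how much the estimate itself might be off.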
