Probability that a sample belongs to one of two populations to use with Bayes theorem

bayes-theorem, probability, probability distributions

Suppose we have two equally sized samples from two different groups, e.g. healthy and sick people. We measured some continuous variable on them, and we assume the underlying populations are normally distributed. We can estimate the parameters of the healthy and sick populations by calculating the sample means and standard deviations, which gives us two PDFs.
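To make the setup concrete, here is a minimal sketch of that estimation step. The measurement values are made up for illustration, and the names `healthy`, `sick`, `group1` and `group2` are hypothetical; only the standard-library `statistics.NormalDist` class is used.

```python
from statistics import NormalDist

# Hypothetical measurements, invented for illustration only.
healthy = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]
sick = [6.1, 6.4, 5.9, 6.3, 6.0, 6.6, 6.2, 5.8]

# Fit a normal distribution to each sample using the sample mean
# and the sample standard deviation.
group1 = NormalDist.from_samples(healthy)
group2 = NormalDist.from_samples(sick)

print(f"group 1: mean={group1.mean:.3f}, std={group1.stdev:.3f}")
print(f"group 2: mean={group2.mean:.3f}, std={group2.stdev:.3f}")

# Each fitted object exposes its PDF; the value returned is a density,
# not a probability.
print(f"density of group 1 at its own mean: {group1.pdf(group1.mean):.4f}")
```

With real data the two lists would simply be replaced by the measured samples.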

Then suppose another single person got measured, but we don't know if he/she is healthy or sick, and want to assign them to one of the groups. I plot it like this:

[Figure: Two population PDFs and an unknown sample]

I also know the marginal probabilities of the two groups occurring in the world; let's say people in group 2 form only 1% of the total population.

Let's say:

  • Event A is that the sample belongs to one of the groups.
  • Event B is that I measured the value belonging to the unknown sample.
  • A can be "divided" into disjoint events C1 (sample belongs to group 1) and C2 (sample belongs to group 2).

Then, from what I understand, I formed this using Bayes' theorem:
[Figure: Usage of Bayes rule]

Main question:

Where do I get these last two conditional probabilities numerically?

  • I know I can't read a probability directly from a PDF.
  • I am not interested in "greater than" or "less than" probabilities from CDFs.
  • I really don't want to mess with "ring like surroundings" and "infinitely small distances" (unless I have to 🙂 ).

My ideas for solution:

  • One idea was to take the probability densities from the PDFs at the unknown value and form a "ratio" from them, like odds that the sample falls in one group rather than the other. But I am not sure if I can compare probability densities like that.

[Figure: Probability densities at the unknown sample's value]
[Figure: Odds that the sample falls in one or the other group?]

(f1 and f2 are the probability densities of groups 1 and 2 at the unknown value; p1 and p2 could be the odds that the sample falls in one group or the other? That is the question.)

  • If I substitute the probability densities from the PDFs for the conditional probabilities, the result obviously won't be a probability, but could I still compare the two results with each other and assign the sample to the group with the higher one?
  • Can I understand the use of Bayes' rule in this case as the marginal probabilities scaling down the PDFs by factors of 0.01 and 0.99?

Thanks in advance for clearing this up for me.

Best Answer

The density you are looking for is a mixture of the original normal densities, and the mixing coefficients are the marginal probabilities of each group, 0.01 and 0.99.

Edit: Also, $P(B|C_i)$ are the normal densities of each group that you already found/determined.

In my opinion, the language used in this question is a little bit inconvenient, but can be easily fixed. The question asks for a density, so we are looking for a function $f$ such that $P(X\in B)= \int_B f(x)dx$, where $X$ is the random measurement.

It's clear that $P(X\in B) = P(X\in B|C_1)P(C_1) + P(X\in B|C_2)P(C_2)$, and the problem states (in prose) that $X|C_i$ follows a normal distribution with known parameters. So $P(X\in B|C_i) = \int_B \phi(x;\mu_{C_i},\sigma_{C_i}^2)dx$, where $\phi$ is the density of a normal distribution.

In conclusion, $P(X\in B) = P(C_1)\int_B \phi(x;\mu_{C_1},\sigma_{C_1}^2)dx + P(C_2)\int_B \phi(x;\mu_{C_2},\sigma_{C_2}^2)dx = \int_B \left[ P(C_1) \phi(x;\mu_{C_1},\sigma_{C_1}^2) + P(C_2)\phi(x;\mu_{C_2},\sigma_{C_2}^2) \right] dx$

which proves that the density we are looking for is exactly $P(C_1) \phi(x;\mu_{C_1},\sigma_{C_1}^2) + P(C_2)\phi(x;\mu_{C_2},\sigma_{C_2}^2)$.
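As a numerical illustration, the sketch below evaluates this mixture density at a single measured value and then divides each weighted term by it, which is the density form of Bayes' rule and gives a posterior probability for each group. The group parameters and the value `x` are assumptions chosen for the example (they are not from the question); only the weights 0.99 and 0.01 come from it. The names `mixture_density` and `posterior` are hypothetical helpers.

```python
from statistics import NormalDist

# Assumed group parameters, for illustration only.
group1 = NormalDist(mu=5.0, sigma=0.5)  # distribution of X given C_1
group2 = NormalDist(mu=7.0, sigma=0.8)  # distribution of X given C_2

# Marginal probabilities of the groups, as stated in the question.
p_c1, p_c2 = 0.99, 0.01


def mixture_density(x):
    """Density of X: P(C_1) * phi_1(x) + P(C_2) * phi_2(x)."""
    return p_c1 * group1.pdf(x) + p_c2 * group2.pdf(x)


def posterior(x):
    """Return (P(C_1 | X = x), P(C_2 | X = x)) via Bayes' rule for densities."""
    total = mixture_density(x)
    return p_c1 * group1.pdf(x) / total, p_c2 * group2.pdf(x) / total


x = 6.5  # hypothetical measurement of the unknown person
post1, post2 = posterior(x)
print(f"mixture density at x={x}: {mixture_density(x):.5f}")
print(f"P(C_1 | x) = {post1:.4f}, P(C_2 | x) = {post2:.4f}")
```

The two weighted terms are densities rather than probabilities, but their ratio to the mixture density is a proper probability, so the two posteriors sum to 1 and the sample can be assigned to the group with the larger one.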
