[Math] Bayes Theorem Breast Cancer

bayes-theorem, probability

This is a famous introductory problem on Bayes' theorem, but I'm having trouble with my answer and I need someone to tell me what I am doing wrong. The problem goes:

$10$ out of $1000$ women aged $40$ and over participating in routine screenings have breast cancer. $8$ out of $10$ women with breast cancer will get a positive mammography. $95$ out of $990$ women without breast cancer will get a positive mammography.

  • a) What is the probability of a false positive?
  • b) What is the probability of a false negative?

Usually, when faced with a Bayes' theorem problem, I draw a tree, like this:

  • Breast Cancer $(\frac{10}{1000})$ -> Positive $(\frac{8}{10})$ / Negative $(\frac{2}{10})$

  • Not Breast Cancer $(\frac{990}{1000})$ -> Positive $(\frac{95}{990})$ / Negative $(\frac{895}{990})$

(Sorry ahead of time for the formatting. I wasn't sure how to create a tree diagram here, but I think you get the point. Leave me a comment if anything is unclear)

Starting with a), and setting b) aside for now since it is very similar, the probability we are looking for is $P(CancerFree \mid PositiveTest)$. I believe we could shortcut this by computing $1 - \frac{895}{990}$, but for the sake of my understanding I want to do it the long way. Thus, we apply Bayes' rule:
$$ = \frac{P(PositiveTest \mid CancerFree) * P(CancerFree)}{P(PositiveTest)}$$
$$ = \frac{(\frac{2}{10})(\frac{990}{1000})}{(\frac{990}{1000})(\frac{95}{990})+(\frac{10}{1000})(\frac{8}{10})}$$

However, mathematically this doesn't come out right. Would someone be able to help me understand? Thanks so much in advance for your help, I really appreciate it!

Best Answer

I'm not so much a fan of probability trees. I like tables. In any case, always define your events precisely. Here I will use $$T = \text{A randomly selected person tests positive,} \\ C = \text{A randomly selected person has breast cancer.}$$ Furthermore, $\bar T$ and $\bar C$ represent the complementary events of testing negative and not having cancer, respectively. Then for the following table structure $$\begin{array}{c|c|c|c} & C & \bar C & \\ \hline T & n_{11} = T \cap C & n_{12} = T \cap \bar C & n_{1*} = n_{11} + n_{12} \\ \hline \bar T & n_{21} = \bar T \cap C & n_{22} = \bar T \cap \bar C & n_{2*} = n_{21} + n_{22} \\ \hline & n_{*1} = n_{11} + n_{21} & n_{*2} = n_{12} + n_{22} & n_{**}\end{array}$$
we are given the following frequencies: $$\begin{array}{c|c|c|c} & C & \bar C & \\ \hline T & 8 & 95 & ?\\ \hline \bar T & ? & ? & ? \\ \hline & 10 & 990 & 1000\end{array}$$ Filling in the blanks is trivial: $$\begin{array}{c|c|c|c} & C & \bar C & \\ \hline T & 8 & 95 & 103\\ \hline \bar T & 2 & 895 & 897 \\ \hline & 10 & 990 & 1000\end{array}$$ Now the probability of a false positive is simply $$\Pr[T \mid \bar C] = \frac{\Pr[T \cap \bar C]}{\Pr[\bar C]} = \frac{n_{12}}{n_{*2}} = \frac{95}{990}.$$ Indeed, this could be immediately stated from the question; no computation was needed. The probability of a false negative is $$\Pr[\bar T \mid C] = \frac{\Pr[\bar T \cap C]}{\Pr[C]} = \frac{n_{21}}{n_{*1}} = \frac{2}{10},$$ only slightly less trivial than the previous question. So we can see there's no real need to rigorously do any Bayesian calculations--the questions ask for information that is readily found from the given conditions.
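The table arithmetic above can be checked with a short script. This is only a minimal sketch: the variable names are my own choices rather than anything from the problem, and it uses nothing beyond Python's standard `fractions` module so the results stay exact.

```python
from fractions import Fraction

# Counts given in the problem statement.
total = 1000            # n_**: all women screened
cancer = 10             # n_*1: women with breast cancer
pos_cancer = 8          # n_11: positive test, has cancer
pos_no_cancer = 95      # n_12: positive test, no cancer

# Fill in the blanks of the 2x2 table by subtraction and addition.
no_cancer = total - cancer                   # n_*2 = 990
neg_cancer = cancer - pos_cancer             # n_21 = 2
neg_no_cancer = no_cancer - pos_no_cancer    # n_22 = 895
pos_total = pos_cancer + pos_no_cancer       # n_1* = 103
neg_total = neg_cancer + neg_no_cancer       # n_2* = 897

# False positive: positive test given no cancer, Pr[T | not C].
false_positive_rate = Fraction(pos_no_cancer, no_cancer)
# False negative: negative test given cancer, Pr[not T | C].
false_negative_rate = Fraction(neg_cancer, cancer)

print(false_positive_rate)  # 19/198, i.e. 95/990 in lowest terms
print(false_negative_rate)  # 1/5, i.e. 2/10 in lowest terms
```

Both rates condition on the true state (a column of the table), which is why they can be read directly from the problem statement.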

Be advised that the definitions of false positive and false negative are as follows:

false positive: A test result is positive when the true condition is negative.

false negative: A test result is negative when the true condition is positive.

Do not confuse this with false discovery and false omission: false positive and false negative have to do with the chance of the test giving a result that is contradictory to the true condition, rather than the condition being actually present (absent) when the test is positive (negative).
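For contrast, the false discovery and false omission rates can be read off the rows of the same table. This is a sketch that assumes the usual definitions, false discovery rate as $\Pr[\bar C \mid T]$ and false omission rate as $\Pr[C \mid \bar T]$: $$\Pr[\bar C \mid T] = \frac{n_{12}}{n_{1*}} = \frac{95}{103}, \qquad \Pr[C \mid \bar T] = \frac{n_{21}}{n_{2*}} = \frac{2}{897}.$$ The quantity $P(CancerFree \mid PositiveTest)$ set up in the question is the first of these. Written out with Bayes' rule, it is $$\Pr[\bar C \mid T] = \frac{\Pr[T \mid \bar C]\,\Pr[\bar C]}{\Pr[T]} = \frac{\left(\frac{95}{990}\right)\left(\frac{990}{1000}\right)}{\left(\frac{990}{1000}\right)\left(\frac{95}{990}\right) + \left(\frac{10}{1000}\right)\left(\frac{8}{10}\right)} = \frac{95}{103},$$ so the first factor in the numerator has to be $\Pr[T \mid \bar C] = \frac{95}{990}$, not $\frac{2}{10}$, which is $\Pr[\bar T \mid C]$.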

If the question had asked, "what is the positive predictive value (PPV) of the test; i.e., what is the chance that someone who tests positive actually has breast cancer," then this would be $$\Pr[C \mid T] = \frac{\Pr[T \cap C]}{\Pr[T]} = \frac{8}{103}.$$ This is a horrible rate; it means that less than $8\%$ of women with a positive mammogram actually have breast cancer; thus mammography should not be used as a diagnostic tool.

What is the negative predictive value (NPV) of the test; i.e., if a woman tests negative, what is the probability she in fact does not have breast cancer? This is $$\Pr[\bar C \mid \bar T] = \frac{\Pr[\bar T \cap \bar C]}{\Pr[\bar T]} = \frac{895}{897},$$ which is very high: it means a woman testing negative can be fairly assured that she does not have breast cancer. So, mammography's utility, according to the problem, seems to be in providing reassurance for those who test negative.
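The two predictive values can also be recovered from the rates alone using Bayes' rule, without the table. The following is a minimal sketch under that assumption. The names `prevalence`, `sensitivity`, and `false_positive_rate` are labels I have chosen for the three quantities given in the problem.

```python
from fractions import Fraction

# The three quantities given in the problem, as exact fractions.
prevalence = Fraction(10, 1000)           # Pr[C]
sensitivity = Fraction(8, 10)             # Pr[T | C]
false_positive_rate = Fraction(95, 990)   # Pr[T | not C]

# Law of total probability: overall chance of a positive test, Pr[T].
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)

# Bayes' rule for the positive predictive value, Pr[C | T].
ppv = sensitivity * prevalence / p_positive

# Bayes' rule for the negative predictive value, Pr[not C | not T].
specificity = 1 - false_positive_rate     # Pr[not T | not C]
npv = specificity * (1 - prevalence) / (1 - p_positive)

print(p_positive)  # 103/1000
print(ppv)         # 8/103
print(npv)         # 895/897
```

The results agree with the row-wise ratios from the table, which is the point of the table method: it performs the same Bayes computation using counts instead of probabilities.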
