[Math] Conditional probability and testing twice

bayes-theorem bayesian statistics

I am struggling with an interview quiz question which starts with a standard conditional probability part:

To detect genetic defects, you are in charge of performing a test. You know that:

  • 1% of people have a genetic defect

  • 99.9% of tests for the gene detect the defect (true positives)

  • 5% of the tests give a positive result even though there is no defect (false positives)

Given that the condition of a patient is known, the results of multiple tests are independent.

a)
If a person gets a positive test result, what is the probability that he/she actually has the genetic defect?

For a), I would argue that this is standard conditional probability and can be solved with Bayes' rule. Let "$+$" be the event that a person tests positive, "$d$" the event that a person has the disease (the genetic defect) and "$\bar{d}$" the event that a person does not have it.

We are looking for $P(d\mid+)$, i.e. the conditional probability that someone actually has the disease given a positive test. By Bayes' rule, that is

\begin{align}
P(d\mid+) & = \frac{P(+\mid d)}{P(+)}\cdot P(d) = \frac{0.999}{P(+\mid d)\cdot P(d) + P(+\mid\bar{d})\cdot P(\bar{d})}\cdot 0.01 \\[10pt]
& = \frac{0.999}{0.999\cdot 0.01 + 0.05\cdot 0.99}\cdot 0.01 \approx 0.168.
\end{align}

So the probability is roughly 17%.
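As a quick numeric check of the calculation above, here is a minimal Python sketch. The variable and function names (`prevalence`, `sensitivity`, `false_positive`, `posterior_given_positive`) are just illustrative labels for the three numbers in the problem statement.

```python
# Numbers taken from the problem statement.
prevalence = 0.01       # P(d)
sensitivity = 0.999     # P(+ | d)
false_positive = 0.05   # P(+ | not d)


def posterior_given_positive(prior, sens, fpr):
    """P(d | +) via Bayes' rule and the law of total probability."""
    p_positive = sens * prior + fpr * (1 - prior)  # P(+)
    return sens * prior / p_positive


posterior = posterior_given_positive(prevalence, sensitivity, false_positive)
print(f"P(d | +) = {posterior:.3f}")  # prints P(d | +) = 0.168
```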

What is more complicated for me is b):

b) If a person gets a positive result in his/her first test, what is the probability of having a positive result in his/her second test?

I would argue that we are looking for $P(++|+)$, i.e. the probability that someone tests positive the second time under the condition that he tested positive the first time.

So we can apply Bayes again and get $\frac{P(+|++)}{P(+)}\cdot P(++)$. I'd argue that $P(+|++)$ is always 1 and that $P(++) = P(+)\cdot P(+)$. We can cancel out one $P(+)$ and this leaves us with $P(++|+) = P(+)$.

On the one hand this looks reasonable given the independence proclaimed, on the other hand it feels counter-intuitive that the probability would only be 6% (on the third hand, we are talking about statistics here and that never went well together with common sense for me :-)).

Thoughts?

Best Answer

Denote events $D$, $\bar D$ as the patient having the defect and not having the defect, respectively. Let $P_i$ denote the event that test $i$ is positive, and $\bar P_i$ the event that test $i$ is negative.

Then the desired probability is $$\Pr[P_2 \mid P_1] = \frac{\Pr[P_2 \cap P_1]}{\Pr[P_1]} = \frac{\Pr[P_2 \cap P_1 \mid D]\Pr[D] + \Pr[P_2 \cap P_1 \mid \bar D]\Pr[\bar D]}{\Pr[P_1 \mid D]\Pr[D] + \Pr[P_1 \mid \bar D]\Pr[\bar D]}.$$ Since the $P_i$ are conditionally independent given the defect status, we have $$\Pr[P_2 \cap P_1 \mid D] = (\Pr[P_i \mid D])^2, \quad \Pr[P_2 \cap P_1 \mid \bar D] = (\Pr[P_i \mid \bar D])^2.$$ Note that the tests are independent only once the defect status is fixed; unconditionally they are not, so $\Pr[P_2 \cap P_1] \ne (\Pr[P_1])^2$ and the cancellation attempted in the question does not hold.

Then, given $$\Pr[D] = 0.01, \quad \Pr[P_i \mid D] = 0.999, \quad \Pr[P_i \mid \bar D] = 0.05,$$ we obtain $$\Pr[P_2 \mid P_1] = \frac{(0.999)^2(0.01) + (0.05)^2(1-0.01)}{(0.999)(0.01) + (0.05)(1-0.01)} \approx 0.209363.$$

This number is small because defects are rare and the false positive rate is much higher than the prevalence. A first positive result is therefore more likely to be a false positive than a true one, and a second test is not terribly likely to come back positive.
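The computation can be reproduced with a short Python sketch. It evaluates $\Pr[P_2 \mid P_1]$ in two ways: directly as the ratio $\Pr[P_2 \cap P_1] / \Pr[P_1]$, and by first updating the probability of a defect after one positive test and then predicting the second test. The helper name `p_all_positive` and the variable names are illustrative choices, not part of the original problem.

```python
# Numbers taken from the problem statement.
prevalence = 0.01       # Pr[D]
sensitivity = 0.999     # Pr[P_i | D]
false_positive = 0.05   # Pr[P_i | not D]


def p_all_positive(k, prior, sens, fpr):
    """Pr[first k tests are all positive], mixing over defect status.

    Uses conditional independence of the tests given the defect status.
    """
    return sens**k * prior + fpr**k * (1 - prior)


p1 = p_all_positive(1, prevalence, sensitivity, false_positive)  # Pr[P_1]
p2 = p_all_positive(2, prevalence, sensitivity, false_positive)  # Pr[P_1 and P_2]

# Route 1: definition of conditional probability.
second_given_first = p2 / p1

# Route 2: posterior for D after one positive, then predict the next test.
posterior = sensitivity * prevalence / p1
via_posterior = sensitivity * posterior + false_positive * (1 - posterior)

print(f"Pr[P_2 | P_1]       = {second_given_first:.6f}")
print(f"via posterior       = {via_posterior:.6f}")
print(f"Pr[P_1 and P_2]     = {p2:.6f}")
print(f"Pr[P_1]^2 (naive)   = {p1**2:.6f}")
```

Both routes give the same value of about 0.2094, while $\Pr[P_1 \cap P_2]$ comes out clearly larger than $(\Pr[P_1])^2$, which shows numerically that the two tests are not independent unless the defect status is known.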
