Suppose a medical test is administered to check whether a person has a particular disease:
- If they have the disease, there is a 10% probability that the test says they don't have it. This is called a false negative.
- If they don't have the disease, there is a 30% probability that the test says they have it. This is called a false positive.
Suppose that a random patient is given this test. If the test result is positive, what is the probability they have the disease?
Logically, it goes like this:
Pr(Positive,Positive) = 100% – False Positive = 100% – 30% = 70%.
Suppose that now it is known that the disease only occurs in 10% of the population. Using posterior probability, the probability is
Pr(Positive,Positive) = 25%
Why does knowing that the disease only occurs in 10% of the population change the probability that the patient has the disease? I'm confused; can someone please help me clear up my confusion?
Best Answer
The way to frame and interpret medical tests in general is to understand them as updating one's level of certainty that the patient has the disease:
Given a positive test result, the (updated) probability $P(D|+)$ that the patient is indeed diseased can be derived from the following probability tree:
p: disease prevalence and other (prior) risk factors
v: test sensitivity, $P(+|D)$
f: test specificity, $P(-|H)$
D: Diseased
H: Healthy
$$\begin{aligned}P(D|+)&=\frac{P(D+)}{P(D+)+P(H+)}\\&=\frac{pv}{pv+(1-p)(1-f)}.\end{aligned}$$
This formula makes clear that $P(D|+)$ is a function of disease prevalence $p,$ test sensitivity $v,$ and test specificity $f.$
It makes sense that information about the test's technical characteristics ($v$ and $f$), as well as the disease prevalence and the patient's prior health ($p$), should refine our knowledge of the probability that the patient has the disease.
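As a quick check, the formula can be evaluated with the numbers from the question. This is only a minimal sketch: the function name `posterior_given_positive` is made up for illustration, and it assumes the 10% false-negative rate means sensitivity $v = 0.9$ and the 30% false-positive rate means specificity $f = 0.7$.

```python
def posterior_given_positive(p, v, f):
    """Return P(D|+) for prevalence p, sensitivity v and specificity f."""
    true_positive = p * v                # P(D+): diseased and tests positive
    false_positive = (1 - p) * (1 - f)   # P(H+): healthy and tests positive
    return true_positive / (true_positive + false_positive)


# Assumed mapping from the question: v = 1 - 0.10, f = 1 - 0.30, p = 0.10.
print(round(posterior_given_positive(0.10, 0.90, 0.70), 4))  # prints 0.25
```

With $p = 0.1$, only $0.09$ of all patients are diseased and test positive, while $0.27$ are healthy and test positive, so a positive result points to disease only a quarter of the time. The 70% figure in the question is the specificity $P(-|H)$, not $P(D|+)$.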
Addendum
OP: Does fewer people having the disease increase the probability that the test result is just a false positive?
Yes. From the above probability tree, the probability that the test result is a false positive is $$P(H+)=(1-p)(1-f).$$ So, the lower the disease prevalence $p,$ the greater this probability; in fact, unless the test has 100% specificity, the expected number of false-positive results is directly proportional to $(1-p).$
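To see the effect numerically, here is a small sketch that evaluates $P(H+)=(1-p)(1-f)$ at a few prevalences. The helper name `false_positive_probability` and the chosen prevalence values are illustrative assumptions; the specificity $f = 0.7$ is taken from the 30% false-positive rate in the question.

```python
def false_positive_probability(p, f):
    """Return P(H+): the patient is healthy and the test is positive."""
    return (1 - p) * (1 - f)


# Illustrative prevalences; f = 0.7 corresponds to a 30% false-positive rate.
for p in (0.50, 0.10, 0.01):
    print(f"{p:.2f} -> {false_positive_probability(p, 0.70):.3f}")
# 0.50 -> 0.150
# 0.10 -> 0.270
# 0.01 -> 0.297
```

As $p$ shrinks, the value approaches $1-f = 0.3$, the false-positive rate among healthy people.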