Why do many textbooks on Bayes' Theorem include the frequency of the disease in examples on the reliability of medical tests?

bayes-theorem, bayesian, conditional probability, probability, statistics

A "standard" example of Bayes Theorem goes something like the following:

In any given year, 1% of the population will get disease X. A particular test will detect the disease in 90% of individuals who have the disease but has a 5% false positive rate. If you have a family history of X, your chances of getting the disease are 10% higher than they would have been otherwise.

Virtually all explanations I've seen of Bayes' Theorem will include all of those facts in their formulation of the probability. It makes perfect sense to me to account for patient-specific factors like family history, and it also makes perfect sense to me to include information on the overall reliability of the test. I'm struggling to understand the relevance of the fact that 1% of the population will get disease X, though. In particular, that fact is presumably true for all patients who receive the test; that being the case, wouldn't Bayes' Theorem imply that the actual probability of a false positive is much higher than 5% (and that one of the numbers is therefore wrong)?

Alternatively, why doesn't the 5% figure already account for that fact? Given that the 5% figure was presumably calculated directly from the data, wouldn't Bayes' Theorem effectively be contradicting the data in this case?

Best Answer

I believe it's commonly included because the result is counterintuitive. You would expect a highly accurate test to be right most of the time, but when the disease is rare that isn't actually the case: the 5% figure is the probability of a positive result given that you don't have the disease, whereas the question a patient cares about is the probability of having the disease given a positive result, and the prevalence is the prior that Bayes' Theorem needs to convert one into the other. To address this I think of it as the "error of one sample" fallacy, which is to say you can't run an experiment one time and draw strong conclusions, even if the experiment is well-designed.
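To make the role of the 1% prevalence concrete, here is a minimal numeric sketch using only the figures from the example in the question (the family-history adjustment is ignored for simplicity):

```python
# Numbers taken from the example in the question (family history ignored)
prevalence = 0.01        # P(disease): 1% of the population gets disease X
sensitivity = 0.90       # P(positive | disease): test detects 90% of true cases
false_positive = 0.05    # P(positive | no disease): 5% false positive rate

# Total probability of a positive test, summing over sick and healthy groups
p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)

# Bayes' Theorem: probability of actually having the disease given a positive test
p_disease_given_positive = sensitivity * prevalence / p_positive

print(f"P(positive) = {p_positive:.4f}")                          # 0.0585
print(f"P(disease | positive) = {p_disease_given_positive:.3f}")  # ~0.154
```

So even with 90% sensitivity and only a 5% false positive rate, a positive result corresponds to roughly a 15% chance of having the disease, because the healthy 99% of the population generates far more false positives than the sick 1% generates true positives. The 5% figure isn't contradicted; it simply answers a different conditional question than the one the patient is asking.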
