[Math] Bayes, two tests in a row

bayesian, probability

I came up with a standard Bayesian example to illustrate my confusion.

There is an epidemic. A person has probability $\frac{1}{100}$ of having the disease. The authorities decide to test the population, but the test is not completely reliable: overall, the test gives $\frac{1}{110}$ of people a positive result, while given that you have the disease the probability of a positive result is $\frac{80}{100}$.

I am interested in what happens after a person takes another test, specifically how much more information we would gain.

Probability after one test

Let $D$ denote the event of having the disease, and let $T$ denote the event of a positive outcome of a test. If we are interested in finding $P(D|T)$, we can apply Bayes' rule directly:

$$ P(D|T) = \frac{P(T|D)P(D)}{P(T)} = \frac{0.8 \times 0.01}{1/110} = 0.88 $$

This feels about right.
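As a sanity check (not part of the original post), the one-test posterior can be verified with exact rational arithmetic via Python's `fractions` module; the variable names are mine:

```python
from fractions import Fraction

# Quantities given in the problem statement.
p_d = Fraction(1, 100)         # P(D): prior probability of disease
p_t = Fraction(1, 110)         # P(T): overall probability of a positive test
p_t_given_d = Fraction(8, 10)  # P(T|D): sensitivity of the test

# Bayes' rule: P(D|T) = P(T|D) P(D) / P(T)
p_d_given_t = p_t_given_d * p_d / p_t
print(p_d_given_t)  # 22/25, i.e. exactly 0.88
```

Using `Fraction` rather than floats makes the "exactly $0.88$" claim below easy to confirm.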

Probability after two tests

This is where I think I misunderstand Bayes' rule somewhat. Let $TT$ denote the event of two positive tests. We are now interested in calculating:

$$ P(D|TT) = \frac{P(TT|D)P(D)}{P(TT)} $$

The prior $P(D)$ is still $\frac{1}{100}$, and $P(TT|D)$ would now be $0.8 \times 0.8$, because the two tests can be assumed to be independent given the disease status.

But I do not seem to know how to deal with $P(TT)$: it cannot be $\frac{1}{110} \times \frac{1}{110}$, because then

$$ \frac{P(TT|D)P(D)}{P(TT)} = \frac{0.64 \times 0.01}{(1/110)^2} > 1 $$

What is the right approach to the two-test Bayesian case?

Best Answer

As an aside, note that the value of $P(D|T)$ is exactly $0.88 = \frac{8}{10}\cdot\frac{1}{100}\cdot\frac{110}{1}$.

We have $P(T)$, the probability of the test showing a positive regardless of disease state, as $\frac{1}{110}$. By the law of total probability, this must equal the probability of a positive and diseased plus the probability of a positive and disease-free. In other words: $$ \begin{align} P(T) &= P(T\cap D) + P(T\cap \neg D)\\ &= P(T|D)P(D) + P(T|\neg D)P(\neg D)\\ \frac{1}{110} &=\frac{8}{10}\frac{1}{100} + P(T|\neg D)\frac{99}{100}\\ P(T|\neg D) &=\frac{2}{1815} \end{align} $$
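The false-positive rate $P(T|\neg D)$ can be checked by rearranging the same equation (a quick sketch of my own, using exact fractions):

```python
from fractions import Fraction

p_d = Fraction(1, 100)         # P(D)
p_t = Fraction(1, 110)         # P(T)
p_t_given_d = Fraction(8, 10)  # P(T|D)

# Law of total probability: P(T) = P(T|D) P(D) + P(T|~D) P(~D).
# Solving for P(T|~D):
p_t_given_not_d = (p_t - p_t_given_d * p_d) / (1 - p_d)
print(p_t_given_not_d)  # 2/1815
```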

Next: $$ \begin{align} P(TT) &= P(TT|D)P(D) + P(TT|\neg D)P(\neg D)\\ &= \frac{64}{100}\frac{1}{100} + \frac{4}{3294225}\frac{99}{100}\\ &=\frac{21087}{3294225} = \frac{213}{33275} \approx 0.006401202 \end{align} $$ Now $$ \begin{align} P(D|TT) &= \frac{P(TT|D)P(D)}{P(TT)}\\ &= \frac{64}{100}\frac{1}{100}\frac{33275}{213}\\ &= \frac{5324}{5325} \approx 0.999812207 \end{align} $$
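The two-test computation above can be reproduced exactly in a few lines; the key point, encoded in the first two assignments, is that the tests are independent only *conditionally* on disease status (variable names are mine):

```python
from fractions import Fraction

p_d = Fraction(1, 100)              # P(D)
p_t_given_d = Fraction(8, 10)       # P(T|D)
p_t_given_not_d = Fraction(2, 1815) # P(T|~D), derived above

# Conditional independence given disease status:
p_tt_given_d = p_t_given_d ** 2         # P(TT|D)
p_tt_given_not_d = p_t_given_not_d ** 2 # P(TT|~D)

# Law of total probability for the joint evidence:
p_tt = p_tt_given_d * p_d + p_tt_given_not_d * (1 - p_d)
p_d_given_tt = p_tt_given_d * p_d / p_tt
print(p_tt)          # 213/33275
print(p_d_given_tt)  # 5324/5325
```

This also shows why the asker's attempt fails: $P(TT) \ne P(T)^2$, because marginally the two positive results are highly correlated through $D$.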

So, after two tests, we are really sure this person is diseased.

Update

In general, though, with Bayesian estimation one can use the previous posterior as the current prior -- see slides 3 and 4. That carries through here as well. Let $P(D^*)$ be the new prior (after one test). We are now back in a one-test world, since one test after one test is the same as two tests after no tests. So $P(D^*) = 0.88$ from above, $P(T|D^*)$ remains $0.8$, and $P(T|\neg D^*)$ remains $\frac{2}{1815}$. So, all we need is: $$ \begin{align} P(TT) &= P(T|D^*)P(D^*) + P(T|\neg D^*)P(\neg D^*)\\ &= 0.8\cdot0.88 + \frac{2}{1815}\cdot0.12\\ &= \frac{426}{605} \approx 0.704132231 \end{align} $$

Note that $P(TT)$ in the $D^*$ world is much greater than $P(TT)$ in the $D$ world. This stands to reason, since $TT$ in the $D^*$ world is really $T$ (one more test) after already having seen a positive, whereas $TT$ in the $D$ world is a priori two tests knowing nothing. Now, as before: $$ \begin{align} P(D|TT) &= \frac{P(T|D^*)P(D^*)}{P(TT)}\\ &=\frac{8}{10}\frac{88}{100}\frac{605}{426}\\ &=\frac{5324}{5325} \approx 0.999812207 \end{align} $$
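The sequential view above amounts to applying the same one-test update twice, feeding each posterior back in as the next prior. A minimal sketch (the `update` helper is mine, not from the answer):

```python
from fractions import Fraction

p_t_given_d = Fraction(8, 10)       # P(T|D), unchanged across tests
p_t_given_not_d = Fraction(2, 1815) # P(T|~D), unchanged across tests

def update(prior):
    """One Bayesian update on a single positive test result."""
    evidence = p_t_given_d * prior + p_t_given_not_d * (1 - prior)
    return p_t_given_d * prior / evidence

posterior = update(Fraction(1, 100))  # after the first positive: 22/25 = 0.88
posterior = update(posterior)         # after the second positive
print(posterior)  # 5324/5325, matching the joint two-test computation
```

That the sequential route lands on the same $\frac{5324}{5325}$ as the joint $P(D|TT)$ computation is exactly the consistency the answer demonstrates.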
