Can anyone help with this question?
In a population, HIV prevalence is estimated to be $\lambda$.
For a new test for HIV:
- $\theta$ is the probability that an HIV-positive person tests positive
- $\eta$ is the probability that an HIV-negative person tests positive.
A person takes the test to check whether he has HIV, and he tests positive.
What is the predictive probability that he tests negative on a second test?
Assumption: Repeat tests on the same person are conditionally independent.
From my notes predictive probability is given as:
$P(\tilde{Y} = \tilde{y} | Y = y) = \int p(\tilde{y}|\tau) p(\tau|y)$
here $\tilde{Y}$ is the unknown observable, $y$ is the observed data and $\eta$ the unknown.
I am interested in the probability that the second test is negative, given that the first test is positive, without knowing whether the man really has HIV.
To facilitate this I define:
- $y_1$ as the event of the first test being positive and
- $\tilde{y_{2}}$ as the second test being negative
Would this adaptation of the formula given above be the correct/best approach to this problem?
$p(\tilde{y_{2}}, y_{1}, \tau) = p(\tilde{y_{2}}|\tau) p(y_{1}|\tau)p(\tau) $
and this is really
$\propto p(\tilde{y_{2}}|\tau) p(\tau|y_{1})$
For $p(\tau|y_{1})$ I got the following from Bayes' theorem:
$$p(\tau|y_{1}) = \frac{p(\tau)p(y_1|\tau)}{p(y_1)} \\
= \frac{\lambda \theta}{ \lambda \theta + \eta (1 - \lambda) }$$
How could I then find $p(\tilde{y_{2}}|\tau)$? Is this the correct approach?
Any hints are welcome.
Best Answer
I find it hard to follow your calculations, partly because you didn’t introduce $\tau$ and your integrals don’t indicate their integration variables. Here’s one way to do this:
\begin{eqnarray} P(\text{2nd test $-$}\mid \text{1st test +}) &=& \sum_{\sigma\in\{+,-\}}P(\text{2nd test $-$}\mid\text{1st test +},\text{HIV}\sigma)P(\text{HIV}\sigma\mid\text{1st test +}) \\ &=& \sum_{\sigma\in\{+,-\}}P(\text{2nd test $-$}\mid\text{HIV}\sigma)P(\text{HIV}\sigma\mid\text{1st test +}) \\ &=& \sum_{\sigma\in\{+,-\}}P(\text{2nd test $-$}\mid\text{HIV}\sigma)\frac{P(\text{1st test +}\mid\text{HIV}\sigma)P(\text{HIV}\sigma)}{\sum_{\rho\in\{+,-\}}P(\text{1st test +}\mid\text{HIV}\rho)P(\text{HIV}\rho)} \\ &=& \frac{\sum_{\sigma\in\{+,-\}}P(\text{2nd test $-$}\mid\text{HIV}\sigma)P(\text{1st test +}\mid\text{HIV}\sigma)P(\text{HIV}\sigma)}{\sum_{\rho\in\{+,-\}}P(\text{1st test +}\mid\text{HIV}\rho)P(\text{HIV}\rho)}\;, \end{eqnarray}
where the first equality applies the law of total probability, the second equality applies your assumption of conditional independence of multiple tests, the third equality applies Bayes’ theorem to express $P(\text{HIV}\sigma\mid\text{1st test +})$ in terms of known quantities, and the fourth equality is just a rearrangement of the sum.
Another way to get the same result is by applying the law of total probability to both the numerator and the denominator in
$$ P(\text{2nd test $-$}\mid \text{1st test +})=\frac{P(\text{2nd test $-$}\cap \text{1st test +})}{P(\text{1st test +})}\;. $$
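Spelled out, and still assuming that the two tests are conditionally independent given HIV status, the numerator and denominator of that ratio expand roughly as follows (a sketch in the same notation as above):

$$ \begin{aligned} P(\text{2nd test $-$}\cap \text{1st test +}) &= \sum_{\sigma\in\{+,-\}}P(\text{2nd test $-$}\mid\text{HIV}\sigma)P(\text{1st test +}\mid\text{HIV}\sigma)P(\text{HIV}\sigma)\;, \\ P(\text{1st test +}) &= \sum_{\sigma\in\{+,-\}}P(\text{1st test +}\mid\text{HIV}\sigma)P(\text{HIV}\sigma)\;. \end{aligned} $$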
Plugging in your variables yields
\begin{eqnarray} P(\text{2nd test $-$}\mid \text{1st test +}) &=& \frac{(1-\theta)\theta\lambda+(1-\eta)\eta(1-\lambda)}{\theta\lambda+\eta(1-\lambda)}\;. \end{eqnarray}
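As a quick numerical sanity check, here is a minimal Python sketch that evaluates this closed form and compares it with the route through the posterior $P(\text{HIV}\sigma\mid\text{1st test +})$. The function names and the values chosen for $\lambda$, $\theta$ and $\eta$ are purely illustrative assumptions and do not come from the question.

```python
def predictive_closed_form(lam, theta, eta):
    """P(2nd test - | 1st test +) from the closed-form expression."""
    # Joint probability of (1st test +, 2nd test -), summed over HIV status
    numerator = (1 - theta) * theta * lam + (1 - eta) * eta * (1 - lam)
    # Marginal probability that the first test is positive
    denominator = theta * lam + eta * (1 - lam)
    return numerator / denominator


def predictive_via_posterior(lam, theta, eta):
    """Same quantity, computed by first updating the HIV status with Bayes' theorem."""
    p_first_pos = theta * lam + eta * (1 - lam)
    post_hiv_pos = theta * lam / p_first_pos        # P(HIV+ | 1st test +)
    post_hiv_neg = eta * (1 - lam) / p_first_pos    # P(HIV- | 1st test +)
    # Conditional independence: the 2nd test depends only on the HIV status
    return (1 - theta) * post_hiv_pos + (1 - eta) * post_hiv_neg


# Hypothetical values, chosen only for illustration
lam, theta, eta = 0.01, 0.95, 0.02
print(predictive_closed_form(lam, theta, eta))
print(predictive_via_posterior(lam, theta, eta))
```

With these assumed values both functions give about 0.68: because the prevalence is low, a single positive result is still most likely a false positive, so a negative second test is more likely than not.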