Confidence intervals calculated from other confidence intervals (binomial problem)

binomial distribution, confidence interval, probability

In a binomial experiment, I have an estimate for the probability of 3 independent events A, B & C, each with a 95% confidence interval.

(Trivial example values)

P(A) = .12 (.05, .29)
P(B) = .16 (.08, .25)
P(C) = .06 (.02, .14)

I need to calculate P (no event) = P (no A) * P (no B) * P (no C)

which is (1 - P(A)) * (1 - P(B)) * (1 - P(C)), or (1 - .12) * (1 - .16) * (1 - .06).

Now, my question arises when I do the same calculation using the lower and upper bounds of the confidence intervals to calculate a CI around P (no event). It seems logical to do it, but I know that in some circumstances, you can't just add or subtract lower or upper bounds of C.I.'s without affecting the width, or rather the confidence level, of your newly calculated interval. (Adding two 95% C.I.'s would lead to a close to 98% C.I., I've read somewhere recently.)

I'm just not sure if this is one of those circumstances, and if it is, how do I find / calculate the proper confidence level (85%? 90%?) to use in the first step in order to end up with a truly 95% C.I. at the end?

EDIT:
This is an epidemiological study. Sample proportions for A, B, and C were obtained from the same sampled individuals; however, the three events are assumed independent (finding A does not impact chance of finding B in the same individual).

Best Answer

  • You have estimates $\hat{q}_a$, $\hat{q}_b$ and $\hat{q}_c$, which are (presumably) approximately independent estimates of the probabilities for the independent events 'no A', 'no B' and 'no C'.

  • You have related standard errors for these estimates (which can be derived from confidence intervals).

  • You want to compute an estimate for the product $q = q_aq_bq_c$, the probability that none of A, B and C occurs, assuming a model where they are independent.

You can estimate this by $$\hat{q} = \hat{q}_a\hat{q}_b\hat{q}_c$$

For the standard deviation, and the associated confidence interval, you can approximate the propagation of errors with the formula for the variance of a product of independent variables.

$$\sigma_{XYZ}^2 = \mu_{X}^2 \mu_{Y}^2 \sigma_{Z}^2 + \mu_{X}^2 \sigma_{Y}^2 \mu_{Z}^2 + \sigma_{X}^2 \mu_{Y}^2 \mu_{Z}^2 + \mu_{X}^2 \sigma_{Y}^2 \sigma_{Z}^2 + \sigma_{X}^2 \mu_{Y}^2 \sigma_{Z}^2 + \sigma_{X}^2 \sigma_{Y}^2 \mu_{Z}^2 + \sigma_{X}^2 \sigma_{Y}^2 \sigma_{Z}^2$$
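As a rough sketch of how this could be applied to the example values in the question: the code below uses the compact form $\prod(\mu^2 + \sigma^2) - \prod\mu^2$, which is algebraically the same as the expanded sum for independent variables. The standard errors are backed out of the reported intervals as (upper − lower) / (2 · 1.96). That is an assumption on my part, since the intervals in the question are not symmetric and were probably not computed with a plain normal approximation. The final interval is likewise a normal-approximation interval, and the helper names are only illustrative.

```python
import math

Z95 = 1.959964  # two-sided 95% quantile of the standard normal


def se_from_ci(lower, upper, z=Z95):
    """Approximate standard error from a CI, ASSUMING a symmetric normal-based interval."""
    return (upper - lower) / (2 * z)


def product_variance(means, sds):
    """Variance of a product of independent variables: prod(mu^2 + sigma^2) - prod(mu^2)."""
    second_moments = math.prod(m**2 + s**2 for m, s in zip(means, sds))
    squared_means = math.prod(m**2 for m in means)
    return second_moments - squared_means


# Example values from the question: (estimate, lower, upper) for P(A), P(B), P(C).
events = [(0.12, 0.05, 0.29), (0.16, 0.08, 0.25), (0.06, 0.02, 0.14)]

# Complement estimates q = 1 - p; the width of a CI is unchanged by taking 1 - p.
means = [1 - p for p, _, _ in events]
ses = [se_from_ci(lo, hi) for _, lo, hi in events]

q_hat = math.prod(means)
var_q = product_variance(means, ses)
sd_q = math.sqrt(var_q)

# Normal-approximation 95% interval for the product (an approximation, not exact).
ci_low, ci_high = q_hat - Z95 * sd_q, q_hat + Z95 * sd_q

print(f"q_hat = {q_hat:.4f}, sd = {sd_q:.4f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

If the original counts are available, it would be better to compute the standard errors from them directly rather than from the reported intervals.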


Simulation

I did a simulation when $n=100$ and $p_a=p_b=p_c=0.5$, and interestingly, computing $\hat{q}$ indirectly via $\hat{q}_a\hat{q}_b\hat{q}_c$ leads to a smaller variance of the estimate, in comparison to using the raw data directly (counting the cases no a, no b and no c). It is because we are using effectively more data, 300 datapoints instead of 100.

[Figure: simulation for difference of methods]

So the indirect estimate using the product $\hat{q} = \hat{q}_a\hat{q}_b\hat{q}_c$ has less variance than using counts of the events directly. But it might be biased when the events a, b, c are not truly independent.
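A simulation along these lines can be sketched as follows. This is not the code I originally ran, only a minimal reconstruction under the stated settings ($n=100$, $p_a=p_b=p_c=0.5$, truly independent events). The number of replications, the seed, and the function name are arbitrary choices for illustration.

```python
import random
import statistics


def simulate(n=100, p=0.5, reps=2000, seed=1):
    """Compare the product-of-marginals estimate with the direct count estimate.

    Each replication draws n individuals; for each individual the events
    'no A', 'no B', 'no C' are drawn independently with probability 1 - p.
    """
    rng = random.Random(seed)
    q = 1 - p
    indirect, direct = [], []
    for _ in range(reps):
        no_a = [rng.random() < q for _ in range(n)]
        no_b = [rng.random() < q for _ in range(n)]
        no_c = [rng.random() < q for _ in range(n)]
        # Indirect: product of the three marginal sample proportions.
        indirect.append((sum(no_a) / n) * (sum(no_b) / n) * (sum(no_c) / n))
        # Direct: fraction of individuals with none of the three events.
        direct.append(sum(a and b and c for a, b, c in zip(no_a, no_b, no_c)) / n)
    return indirect, direct


indirect, direct = simulate()
var_indirect = statistics.variance(indirect)
var_direct = statistics.variance(direct)

print(f"mean indirect = {statistics.mean(indirect):.4f}, variance = {var_indirect:.6f}")
print(f"mean direct   = {statistics.mean(direct):.4f}, variance = {var_direct:.6f}")
```

Both estimates should centre near $0.5^3 = 0.125$, with the indirect one showing the smaller spread. To see the bias mentioned above, the draws for `no_b` and `no_c` could be made dependent on `no_a`; the direct estimate would then still target the true joint probability while the product would not.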
