Solved – Can the union (or) probability of many non-mutually exclusive events be calculated recursively

combinatoricsprobability

I need to use the principle of inclusion/exclusion to calculate
the "OR" probability of a large number of events

$$ P( A_1 \cup A_2 \cup \dots \cup A_n ) $$

For two events the formula to use is (from Wikipedia http://en.wikipedia.org/wiki/Inclusion%E2%80%93exclusion_principle#In_probability) :

$$P(A_1\cup A_2)=P(A_1)+P(A_2)-P(A_1\cap A_2) $$.

For three events :

$$ P(A_1\cup A_2\cup A_3)=P(A_1)+P(A_2)+P(A_3) -P(A_1\cap A_2)-P(A_1\cap A_3)-P(A_2\cap A_3)+P(A_1\cap A_2\cap A_3) $$

For n events :

$$
P\biggl(\bigcup_{i=1}^n A_i\biggr) {} =\sum_{i=1}^n P(A_i)
-\sum_{i<j}P(A_i\cap A_j)
\qquad+\sum_{i<j<k}P(A_i\cap A_j\cap A_k)- \cdots\ +(-1)^{n-1}\, P\biggl(\bigcap_{i=1}^n A_i\biggr)
$$

The latter contains a very large number of terms, making it hard to compute. So I thought it might be possible to use a trick. If this works, I am surely not the first one to come up with this.

The idea, illustrated on 4-event example, is
$$ P( A_1 \cup A_2 \cup A_3 \cup A_4 ) $$
is equivalent to
$$ P( (A_1 \cup A_2) \cup (A_3 \cup A_4) )$$
is equivalent to
$$ P( A_1 \cup A_2) + P( A_3 \cup A_4) – P( A_1 \cup A_2) * P( A_3 \cup A_4) $$
.

The 2-event unions can be computed by the formula above.

This can be applied to $n$ events also. The algorithm always unifies 2 events to a new event, which is then combined with another unified event. So each step reduces the number of events to $n/2$ or $n/2+1$ if $n$ is odd. The procedure is repeated until a single union probability remains.

This makes it possible to reduce the required computational steps to $ O(log n) $ (or something like that).

I have tested this by numerically comparing the results of the procedure for 3 events and 4 events. It seems to work.

So my questions are: Is this wrong? And is there any literature reference on this approach?

Best Answer

As @HaoYe's comment points out, your recursion via divide-and-conquer is not quite right: it is not the case that $$P(A_1\cup A_2\cup A_3\cup A_4) = P( A_1 \cup A_2) + P( A_3 \cup A_4) - P( A_1 \cup A_2) * P( A_3 \cup A_4)$$ but rather that $$P(A_1\cup A_2\cup A_3\cup A_4) = P( A_1 \cup A_2) + P( A_3 \cup A_4) - P\left(( A_1 \cup A_2) \cap ( A_3 \cup A_4)\right).$$ In any case, the principle of inclusion/exclusion gives a very pretty formula that rarely can be used in practice because the probabilities of all those various intersections are not easy to determine. One case where the probabilities can be calculated is when the $n$ events are mutually independent, but in this special case, the general formula should not be used at all!

For $n$ mutually independent events $A_1, A_2, \ldots, A_n$, use DeMorgan's theorem to write $$P\left(\bigcup_{i=1}^n A_i\right) = 1 - P\left(\bigcap_{i=1}^n A_i^c\right) = 1 - \prod_{i=1}^n P(A_i^c)= 1 - \prod_{i=1}^n \left[1 - P(A_i)\right] \tag{1}$$ and calculate $P(A_1\cup A_2\cup\cdots\cup A_n)$ using $n-1$ multiplications and $n+1$ subtractions. In other words, for Heaven's sake, resist the temptation to multiply out those terms in square brackets on the right because you will end up with the inclusion/exclusion formula which you should try to avoid at all costs.

Related Question