As @HaoYe's comment points out, your recursion via divide-and-conquer is not
quite right: it is not the case that
$$P(A_1\cup A_2\cup A_3\cup A_4) = P( A_1 \cup A_2) + P( A_3 \cup A_4) - P( A_1 \cup A_2) * P( A_3 \cup A_4)$$
but rather that
$$P(A_1\cup A_2\cup A_3\cup A_4) = P( A_1 \cup A_2) + P( A_3 \cup A_4) - P\left(( A_1 \cup A_2) \cap ( A_3 \cup A_4)\right).$$
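A quick numeric counterexample makes the difference concrete. The die and the four overlapping events below are chosen purely for illustration; the "product" recursion fails exactly because the two half-unions are not independent:

```python
# Counterexample on a fair six-sided die: the product formula disagrees
# with the correct inclusion/exclusion step when the half-unions overlap.
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
A1, A2, A3, A4 = {1, 2}, {2, 3}, {3, 4}, {4, 5}

def P(event):
    """Probability of an event under the uniform distribution on omega."""
    return Fraction(len(event), len(omega))

left, right = A1 | A2, A3 | A4            # each has probability 1/2

wrong   = P(left) + P(right) - P(left) * P(right)   # product formula
correct = P(left) + P(right) - P(left & right)      # true intersection
exact   = P(A1 | A2 | A3 | A4)

print(wrong, correct, exact)   # 3/4 5/6 5/6
```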
In any case, the principle of inclusion/exclusion gives a very pretty formula
that can rarely be used in practice because the probabilities of all those
various intersections are not easy to determine. One case where the probabilities
can be calculated is when the $n$ events are mutually independent, but
in this special case, the general formula should not be used at all!
For $n$ mutually independent events $A_1, A_2, \ldots, A_n$, use De Morgan's law to write
$$P\left(\bigcup_{i=1}^n A_i\right) = 1 - P\left(\bigcap_{i=1}^n A_i^c\right)
= 1 - \prod_{i=1}^n P(A_i^c)= 1 - \prod_{i=1}^n \left[1 - P(A_i)\right]
\tag{1}$$
and calculate $P(A_1\cup A_2\cup\cdots\cup A_n)$ using $n-1$ multiplications
and $n+1$ subtractions. In other words, for Heaven's sake, resist the
temptation to multiply out those terms in square brackets on the right
because you will end up with the inclusion/exclusion formula which you
should try to avoid at all costs.
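The arithmetic above can be sketched directly. The four probabilities below are made-up illustrative values, and the inclusion/exclusion computation is included only as a cross-check, not as a recommendation:

```python
# Equation (1) versus inclusion/exclusion for mutually independent events.
from itertools import combinations
from math import prod

p = [0.1, 0.2, 0.3, 0.4]   # P(A_i): illustrative values, events assumed independent

# Equation (1): n-1 multiplications and n+1 subtractions.
union_fast = 1 - prod(1 - pi for pi in p)

# Inclusion/exclusion: 2^n - 1 terms; independence lets each intersection
# probability factor into a product of the corresponding P(A_i).
union_ie = sum(
    (-1) ** (k + 1) * sum(prod(sub) for sub in combinations(p, k))
    for k in range(1, len(p) + 1)
)

print(union_fast, union_ie)   # both 0.6976 (up to floating-point rounding)
```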
The issue isn't a flaw in your reasoning, but rather the parameter values chosen for the event probabilities.
It's simplest to start with your 1-event model, with a Bernoulli trial at each time point and thus a geometric distribution of waiting times until the event. With the value of 0.025 per time unit, the incidence rate would be highest in the first time unit and drop off with increasing time, as you point out. When multiple events are required, your model then gives non-monotonic incidence-age curves, which as you note in a comment can be observed for some cancers.
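Those two cases can be sketched as follows; $p = 0.025$ is the value from the question, while requiring $k = 4$ events is an arbitrary illustrative choice:

```python
# Incidence at time t = P(the last of k independent geometric waiting
# times, each with per-time-unit probability p, equals t).
def incidence(t, p=0.025, k=1):
    F = lambda s: 1 - (1 - p) ** s      # geometric CDF, F(0) = 0
    return F(t) ** k - F(t - 1) ** k    # pmf of the maximum of k waits

# k = 1: incidence is highest in the first time unit and declines.
one_event = [incidence(t) for t in range(1, 6)]

# k = 4: incidence rises, peaks, then falls -- non-monotonic in age.
curve = [incidence(t, k=4) for t in range(1, 301)]
peak_t = max(range(300), key=lambda i: curve[i]) + 1
print(peak_t)   # peak age, near ln(4)/0.025, i.e. around 55 time units
```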
So what's the time unit?
The best general estimates of mutation rates in humans are about $10^{-7}$ per gene per cell division. This number itself represents a combination of a mutation, a failure of the cell to correct the mutation, and the survival of the cell despite the mutation. Furthermore, not all mutations, even in cancer-associated genes, promote cancer. There is some suggestion that the normal tissue stem cells in which mutations might be most likely to lead to cancer have even lower mutation rates. (Then again, a mutation in certain genes is likely to increase the probability of future mutations in the same cell.)
Frank and Nowak examine how these and other factors combine to lead to accumulation of cancer-related mutations in the context of human tissue biology, including cell-division times, tissue architecture, and changes in effective mutation rates during tumor development.
So the time unit required to obtain your assumed 0.025 probability of mutation per time unit would have to be very long. When working in time units of years, most investigators assume the limiting case of a Poisson process, with a low constant occurrence rate (for a particular cell type in a particular environment with a particular history) so that the probability of occurrence is proportional to elapsed time. That's what Armitage and Doll did, although they didn't use that terminology.
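That limiting argument can be sketched numerically: slicing each year into many short Bernoulli trials drives the occurrence probability toward $1 - e^{-\mu t}$, which for small $\mu t$ is approximately $\mu t$. The rate $\mu$ below is an arbitrary illustrative value, not an estimate for any real mutation process:

```python
# Poisson-process limit of many short Bernoulli trials.
from math import exp

mu, t = 1e-3, 10.0    # assumed occurrence rate per year; elapsed years

def p_by_time(t, mu, steps_per_year):
    """P(at least one occurrence by time t) from Bernoulli trials with
    per-trial probability mu / steps_per_year."""
    n = round(t * steps_per_year)
    return 1 - (1 - mu / steps_per_year) ** n

fine = p_by_time(t, mu, 10_000)         # near-continuous slicing
print(fine, 1 - exp(-mu * t), mu * t)   # Poisson limit; roughly proportional to t
```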
Also, be very careful in reading and thinking about mutation rates. Some mutation rates are specified as per base-pair of DNA, some as per genome. That's a factor of several billion difference. Time scales can be per cell division, per year, per generation, per lifetime. Some so-called mutation "rates" in cancer genomic studies are simply the numbers of accumulated mutations per megabase of DNA in a tumor at the time of analysis. Don't jump to conclusions until you know what type of "mutation rate" is being described.
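A back-of-the-envelope check of that factor, using rough order-of-magnitude numbers that are assumptions for illustration only:

```python
# Illustrative unit conversion: "per base pair" vs "per genome" rates
# differ by a factor of the genome size, i.e. several billion.
rate_per_bp_per_division = 1e-9   # assumed order of magnitude, not a measurement
genome_size_bp = 3.2e9            # approximate haploid human genome size

rate_per_genome_per_division = rate_per_bp_per_division * genome_size_bp
print(rate_per_genome_per_division)   # ~3 mutations per genome per division
```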
The moral here is that it's quite possible to get interesting results from a model of carcinogenesis, but you have to make sure that your choices of parameter values are realistic. That's often the major effort in modeling. That's how, for example, faced with a non-monotonic incidence-versus-age curve for some type of cancer, you might distinguish your model (a high probability per unit time) from the model described in the "Cancer Anomaly" article you cite (some humans susceptible to cancer, others not), or from models involving competing risks (like heart attacks) or poor diagnosis in the elderly leading to underestimated cancer risks at older ages.
Best Answer
No, but you can conclude that the probability of any shared events is zero.
Disjoint means that $A_i \cap A_j=\emptyset$ for any $i\ne j$. You cannot conclude that, but you can conclude that $P(A_i \cap A_j)=0$ for all $i\ne j$. Any shared elements must have probability zero. Same goes for all higher-order intersections as well.
In other words, you can say that, with probability 1, none of the sets occur together. I have seen such sets called almost disjoint or almost surely disjoint, but I don't think that terminology is standard.
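A concrete example, using the continuous uniform distribution on $[0,1]$: take $A = [0, \tfrac12]$ and $B = [\tfrac12, 1]$. The two events share the point $\tfrac12$, so they are not disjoint, yet
$$P(A \cap B) = P\left(\left\{\tfrac12\right\}\right) = 0,$$
so with probability 1 the two events do not occur together.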