The issue isn't a flaw in reasoning, but the parameter values chosen for the event probabilities.
It's simplest to start with your 1-event model, with a Bernoulli trial at each time point and thus a geometric distribution of waiting times until the event. With the value of 0.025 per time unit, the incidence rate would be highest in the first time unit and drop off with increasing time, as you point out. When multiple events are required, your model then gives non-monotonic incidence-age curves, which as you note in a comment can be observed for some cancers.
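That shape is easy to see numerically. Below is a minimal Python sketch (using the 0.025 value from your model, and an assumed requirement of three events for illustration): with one required event the waiting time is geometric and its probability mass peaks at the very first time unit, while with three required events it is negative binomial and peaks much later, giving the non-monotonic incidence-age curve.

```python
from math import comb

p = 0.025  # per-time-unit event probability, taken from the question

def waiting_time_pmf(t, k, p):
    """P(the k-th event occurs exactly at time t): negative binomial pmf."""
    if t < k:
        return 0.0
    return comb(t - 1, k - 1) * p**k * (1 - p)**(t - k)

# One required event: geometric waiting time, highest at t = 1, then decreasing.
one = [waiting_time_pmf(t, 1, p) for t in range(1, 301)]

# Three required events: the curve first rises, then falls.
three = [waiting_time_pmf(t, 3, p) for t in range(1, 301)]
peak = 1 + max(range(len(three)), key=three.__getitem__)  # peak near t = 2/p = 80
```

The peak of the three-event curve lands around $t \approx 80$ time units, far from $t=1$, which is the non-monotonic behavior you describe.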
So what's the time unit?
The best general estimates of mutation rates in humans are about $10^{-7}$ per gene per cell division. This number itself represents a combination of a mutation, a failure of the cell to correct the mutation, and the survival of the cell despite the mutation. Furthermore, not all mutations, even in cancer-associated genes, promote cancer. There is some suggestion that the normal tissue stem cells in which mutations might be most likely to lead to cancer have even lower mutation rates. (Then again, a mutation in certain genes is likely to increase the probability of future mutations in the same cell.)
Frank and Nowak examine how these and other factors combine to lead to accumulation of cancer-related mutations in the context of human tissue biology, including cell-division times, tissue architecture, and changes in effective mutation rates during tumor development.
So the time unit required to obtain your assumed mutation probability of 0.025 per time unit would have to be very long. When working in time units of years, most investigators assume the limiting case of a Poisson process, with a low constant occurrence rate (for a particular cell type in a particular environment with a particular history), so that the probability of occurrence is proportional to elapsed time. That's what Armitage and Doll did, although they didn't use that terminology.
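That limiting behavior is easy to check numerically. A sketch, using an assumed per-step rate of the order quoted above: when the per-step rate is tiny, the exact geometric probability of at least one occurrence is essentially proportional to the number of elapsed steps.

```python
rate = 1e-7        # assumed low per-step occurrence rate (order of magnitude only)
steps = 10_000     # elapsed time, in steps

exact = 1 - (1 - rate) ** steps   # geometric: P(at least one occurrence)
approx = rate * steps             # Poisson-limit approximation: rate * elapsed time

relative_error = abs(exact - approx) / approx  # well under 0.1% here
```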
Also, be very careful in reading and thinking about mutation rates. Some mutation rates are specified as per base-pair of DNA, some as per genome. That's a factor of several billion difference. Time scales can be per cell division, per year, per generation, per lifetime. Some so-called mutation "rates" in cancer genomic studies are simply the numbers of accumulated mutations per megabase of DNA in a tumor at the time of analysis. Don't jump to conclusions until you know what type of "mutation rate" is being described.
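To make the per-base-pair versus per-genome distinction concrete, here is some illustrative arithmetic (round assumed numbers, not measured values):

```python
per_bp = 1e-9          # assumed per-base-pair, per-cell-division mutation rate
genome_bp = 3.2e9      # approximate human genome size in base pairs

per_genome = per_bp * genome_bp
# The same underlying biology, quoted per genome instead of per base pair,
# becomes a number ~3 billion times larger (here, ~3 mutations per division).
```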
The moral here is that it's quite possible to get interesting results from a model of carcinogenesis, but you have to make sure that your choices of parameter values are realistic. That's often the major effort in modeling. That's how, for example, when faced with a non-monotonic incidence-versus-age curve for some type of cancer, you might distinguish your model (a high event probability per unit time) from the model described in the "Cancer Anomaly" article you cite (some humans susceptible to cancer, others not), or from models involving competing risks (like heart attacks) or poor diagnosis in the elderly leading to underestimated cancer risks at older ages.
I couldn't follow your derivation, but I believe the correct way to solve the problem is as follows.
First, some notation: take $x_i$ to be the expected number of flips to reach the end state ($n$ consecutive heads or $n$ consecutive tails), given that we're on a "run" of exactly $i$ consecutive flips of heads. Similarly, take $y_i$ to be the expected number of flips to reach the end state, given that we're on a run of exactly $i$ consecutive flips of tails. Clearly, $x_n = y_n = 0$, since if we've already had $n$ consecutive heads or tails, we're done.
Then, we're interested in finding $x_0$ ($= y_0$). The relevant equations are (for $i < n$):
\begin{equation}
\begin{split}
x_i & = (1-p)(1 + y_1) + p(1 + x_{i+1}) \\
y_i & = p(1 + x_1) + (1-p)(1 + y_{i+1})
\end{split}
\end{equation}
Let's derive the first equation (the second one is analogous). Suppose we're on a run of $i$ heads. We want to find the expected number of flips to end the game, which is $x_i$. Consider the next flip. There's a probability $p$ that it comes up heads, which means we now have a run of $i+1$ heads. That's the second term on the right-hand side of the equation: there's a probability $p$ of moving from the state with $i$ consecutive heads to the state with $i+1$ consecutive heads, and it costs 1 flip to do so. There's also a probability $1-p$ that it comes up tails, in which case we have a run of only 1 tail. That's the first term on the right-hand side of the equation: there's a probability $1-p$ of moving from the state with $i$ consecutive heads to the state with a single tail, and again it costs 1 flip to do so.
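These recurrences are just a linear system in the unknowns $x_0,\dots,x_{n-1},y_0,\dots,y_{n-1}$ (with $x_n = y_n = 0$), so they can be solved numerically for any $n$ and $p$. A minimal sketch in Python:

```python
import numpy as np

def expected_flips(n, p):
    """Expected flips until n consecutive heads or n consecutive tails.

    Unknown vector: [x_0, ..., x_{n-1}, y_0, ..., y_{n-1}]; x_n = y_n = 0.
    """
    size = 2 * n
    M = np.zeros((size, size))
    b = np.ones(size)
    for i in range(n):
        # x_i - p*x_{i+1} - (1-p)*y_1 = 1
        M[i, i] += 1.0
        if i + 1 < n:
            M[i, i + 1] -= p
        if n > 1:
            M[i, n + 1] -= (1.0 - p)   # y_1 sits at index n+1
        # y_i - (1-p)*y_{i+1} - p*x_1 = 1
        M[n + i, n + i] += 1.0
        if i + 1 < n:
            M[n + i, n + i + 1] -= (1.0 - p)
        if n > 1:
            M[n + i, 1] -= p           # x_1 sits at index 1
    return np.linalg.solve(M, b)[0]    # x_0 (= y_0)
```

For instance, `expected_flips(2, 0.5)` gives 3, the familiar fair-coin answer for two-in-a-row of either side, and `expected_flips(n, 0)` gives `n`.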
With these equations you can solve the problem for any $n$. I tried implementing this in Mathematica, and if $p = 0$ or $p = 1$, I indeed get that $x_0 = y_0 = n$, as expected. I'm not sure if there's any closed form solution to the equations for arbitrary $n$; that's probably a question for the Math StackExchange. (Edit: see derivation below for the solution for arbitrary $n$).
What I've sketched out here is a general approach for solving many types of coin flipping problems (and other problems). It essentially constructs a Markov chain that has states (e.g., 3 consecutive heads), and transition probabilities for moving between states (e.g., a probability of $1-p$ of moving from 3 consecutive heads to 1 tail). The absorbing states are the end states of the game. See, for instance, the answer to this question.
EDIT: For the particular case of $n=5$, I get (using Mathematica):
\begin{equation}
x_0 = \frac{(5 - 10p + 10p^2 - 5p^3 + p^4)(1 + p + p^2 + p^3 + p^4)}{1 - 4p + 6p^2 - 4p^3 + p^4 + 4p^5 - 6p^6 + 4p^7 - p^8}
\end{equation}
which seems to have the right limits (although it's possible I have a typo somewhere).
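A quick sanity check of this expression (a sketch; both degenerate limits give 5, and the value 31 at $p = 1/2$ is the known fair-coin answer for five-in-a-row of either side):

```python
def x0_n5(p):
    """The n = 5 closed form above, evaluated at a given p."""
    num = (5 - 10*p + 10*p**2 - 5*p**3 + p**4) * (1 + p + p**2 + p**3 + p**4)
    den = (1 - 4*p + 6*p**2 - 4*p**3 + p**4
           + 4*p**5 - 6*p**6 + 4*p**7 - p**8)
    return num / den
```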
Edit #2: Following @whuber's advice, I solved the recurrence relation for general $n$. The solution is:
\begin{equation}
\begin{bmatrix}
x_0 \\ y_0
\end{bmatrix}
=
(A + B)[(I-A^{n-1})^{-1}(I-A)-B]^{-1}c + c
\end{equation}
where
\begin{equation}
A =
\begin{bmatrix}
p & 0 \\
0 & 1-p
\end{bmatrix}
\end{equation}
and
\begin{equation}
B =
\begin{bmatrix}
0 & 1-p \\
p & 0
\end{bmatrix}
\end{equation}
and
\begin{equation}
c =
\begin{bmatrix}
1 \\ 1
\end{bmatrix}
\end{equation}
and $I$ is the identity matrix.
For $n=5$, this gives the same result as above.
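The matrix expression is also straightforward to evaluate numerically (a sketch; note that $I - A^{n-1}$ is singular at $p = 0$ or $p = 1$, so those limits need the recurrence instead):

```python
import numpy as np

def x0_closed(n, p):
    """Evaluate (A+B)[(I - A^{n-1})^{-1}(I - A) - B]^{-1} c + c; return x_0."""
    A = np.array([[p, 0.0], [0.0, 1.0 - p]])
    B = np.array([[0.0, 1.0 - p], [p, 0.0]])
    I = np.eye(2)
    c = np.ones(2)
    # M = (I - A^{n-1})^{-1} (I - A) - B
    M = np.linalg.solve(I - np.linalg.matrix_power(A, n - 1), I - A) - B
    v0 = (A + B) @ np.linalg.solve(M, c) + c
    return v0[0]   # x_0 (= y_0)
```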
Best Answer
Generalizing the problem, let $X$ be the random variable for the number of trials it takes to achieve $r$ successes in a row, with $p=$ the probability of success.
The probability of having no successes in a row at time $t$, for $t\ge 1$, is the probability of a failure at time $t$ preceded by a sequence up to time $t-1$ that has not yet had $r$ successes, which is: $$(1-p)(1-P(X\le t-1))$$
The probability of having $r$ successes at time $t+r$ following no successes at time $t$ is: \begin{align} P(X=t+r) &= p^r(1-p)(1-P(X\le t-1)) \\ P(X=t+r+1) &= p^r(1-p)(1-P(X\le t)) \end{align} Then taking sums from $t=0$ to $\infty$: \begin{align} \sum_{t=0}^\infty P(X=t+r+1) &= \sum_{t=0}^\infty p^r(1-p)(1-P(X\le t))\\ \sum_{t=0}^\infty P(X=t) - \sum_{t=0}^r P(X=t) &= p^r(1-p)\sum_{t=0}^\infty (1-P(X\le t))\\ \end{align}
Finally, using the facts that $\sum_{t=0}^\infty P(X=t) = 1$, that $\sum_{t=0}^{r} P(X=t) = p^r$ (the only way to finish by time $r$ is to succeed on all of the first $r$ trials), and that $\sum_{t=0}^\infty \left(1-P(X\le t)\right) = E(X)$, we have
$$E(X) = \frac1{p^r(1-p)}(1-p^r)$$
For $r=1$, as expected this simplifies to the same formula as for the geometric distribution: $E(X) = 1/p$.
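The closed form can also be checked against a direct numerical solution of the underlying run-length chain (a sketch; the states are the current success-run length $0,\dots,r-1$, and a failure resets the run to zero):

```python
import numpy as np

def expected_trials_numeric(r, p):
    """Solve e_i = 1 + p*e_{i+1} + (1-p)*e_0, with e_r = 0; return e_0."""
    M = np.zeros((r, r))
    b = np.ones(r)
    for i in range(r):
        M[i, i] += 1.0
        M[i, 0] -= (1.0 - p)       # a failure resets the run to length 0
        if i + 1 < r:
            M[i, i + 1] -= p       # a success extends the run (e_r = 0)
    return np.linalg.solve(M, b)[0]

def expected_trials_closed(r, p):
    return (1 - p**r) / (p**r * (1 - p))
```

For $r=2$, $p=1/2$ both give 6, the familiar expected number of fair-coin flips to see two heads in a row.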
Plugging in your particular values of $r$ and $p$ confirms your simulations.