Well, you should care about rates and Poisson processes! If the bird songs are independent of each other and of time, then you can assume a Poisson process with a stable rate of, say, $\lambda$ songs per hour. The Poisson distribution tells you that the probability of no song during one hour is
$$
P(X=0)={\lambda^0 e^{-\lambda}\over 0!}=e^{-\lambda}.
$$
If this probability is equal to .25 then you can deduce the rate $\lambda$ as follows:
$$
e^{-\lambda}=1/4 \rightarrow \lambda=-\ln(1/4)=\ln4\approx 1.38629
\text{ song per hour.}
$$
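As a quick sanity check of this rate, here is a short Python sketch (the variable name `lam` is mine):

```python
import math

# If P(no song in one hour) = e^{-lam} = 0.25, then lam = ln 4.
lam = -math.log(0.25)   # songs per hour, ~1.3863
check = math.exp(-lam)  # back-check: should recover P(X = 0) = 0.25
print(lam, check)
```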
If you are interested in what happens over $t$ minutes, then the expected number of songs will be $t\ln4/60$ and the probability of detection (i.e., of at least one song over $t$ minutes) will be
$$
P(X\geq1)=1-P(X=0)=1-{(t\ln4/60)^0 e^{-t(\ln4)/60}\over 0!}=1-e^{-t(\ln4)/60}.
$$
For a 1-minute interval, this is
$$
1-e^{-(\ln4)/60}\approx 0.02284,
$$
and for a 90-minute interval
$$
1-e^{-90(\ln4)/60}\approx 0.875.
$$
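These two detection probabilities are easy to verify numerically (a plain-Python sketch; the function name is mine):

```python
import math

lam = math.log(4)  # songs per hour, from e^{-lam} = 1/4

def p_detect(t_minutes):
    """P(at least one song in t minutes) for a Poisson process with rate lam/hour."""
    return 1 - math.exp(-t_minutes * lam / 60)

print(p_detect(1))   # ~0.0228
print(p_detect(90))  # exactly 0.875, since e^{-1.5 ln 4} = 4^{-3/2} = 1/8
```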
You are correct. But so is the paper.
The paper explains it like this:
The probability of a success run of length at least $\mathscr{l}$ immediately following a particular failure is $p^{\mathscr{l}}$.
Let $N_{\mathscr{l}}$ be the number of success runs of length $\mathscr{l}$ or longer in $n$ Bernoulli trials.
If we condition on the number of failures $n_F$ in the sequence (treat it as given), we obtain
\begin{eqnarray}
P(L_n=\mathscr{l}) & = & P(N_{\mathscr{l}} > 0, N_{\mathscr{l}+1}=0 | n_F) \\
& = & P(N_{\mathscr{l}+1}=0 | n_F) - P(N_{\mathscr{l}}=0 | n_F) \\
& = & (1-p^{\mathscr{l}+1})^{n_F} - (1-p^{\mathscr{l}})^{n_F}
\approx e^{-n_F p^{\mathscr{l}+1}} - e^{-n_F p^{\mathscr{l}}}.
\end{eqnarray}
Let's consider an example.
Suppose we consider random sequences of a million digits, and we are curious about runs of the digit 3.
So we consider the digit 3 as a "success", and the other digits as "failures", so $p=0.1$, $q=0.9$, and $n=10^6$.
We expect that there will be roughly $10^5$ successes and $9 \cdot 10^5$ failures.
Since $n$ is large, as the paper says, "we can't go very wrong" by assuming that there are exactly $10^5$ successes and $9 \cdot 10^5$ failures, and then looking at what is the longest run of 3s that we should expect in that specific case.
For example, what is the probability that the longest run of 3s has exactly six 3s in it (i.e., $\mathscr{l}=6$)?
Well, after each of the $9 \cdot 10^5$ non-3s, we will have 0 or more 3s before the next non-3.
The probability of having seven or more in a row is just the probability that the next seven digits are 3, which is $p^7$, so the probability of having 6 or fewer is $1-p^7$.
From this, the probability that all $9\cdot10^5$ runs have length 6 or less is $(1-p^7)^{9\cdot10^5} \approx 0.914$.
Similarly, the probability that all $9\cdot10^5$ runs have length 5 or less is $(1-p^6)^{9\cdot10^5} \approx 0.407$.
Subtracting these gives us $0.507$, the probability that the maximum run length is exactly 6.
This also illustrates one of the main points of the paper, that the distribution for the maximal run length is (counterintuitively) highly concentrated.
In this case the chance is over 50% that the maximal run of 3s will be of length exactly 6.
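You can see the concentration directly by tabulating the (approximate, conditional-on-$n_F$) distribution of the maximal run length. A short Python sketch, with function and variable names of my choosing:

```python
# Distribution of the longest run of 3s among 10^6 random digits,
# treating n_F as fixed at exactly 9 * 10^5 failures.
p = 0.1
n_F = 9 * 10**5

def p_max_run_eq(l):
    """P(longest success run == l) = (1-p^(l+1))^n_F - (1-p^l)^n_F."""
    return (1 - p**(l + 1))**n_F - (1 - p**l)**n_F

for l in range(4, 10):
    print(l, round(p_max_run_eq(l), 4))
# Nearly all of the mass sits on l = 5, 6, 7, with l = 6 alone
# carrying a bit more than half of it (~0.507).
```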
Notice that this derivation is sloppy in the following respect:
We conditioned on $n_F$ so that we could easily express $P(\mbox{all runs are length 6 or less}) = (1-p^7)^{n_F}$ as an exponentiation with exponent $n_F$.
But we conveniently forgot about this condition when deriving the base of this exponentiation: $1-p^7$ is only exact when we are not conditioning on $n_F$.
If we are conditioning on $n_F$, we get a more complicated expression, because we are drawing samples without replacement.
Furthermore, the exponentiation ceases to be correct, because the lengths of the runs of 3s are not independent.
You were right to notice that these formulas are not exactly correct.
But they are approximately correct when $n$ is large.
In the third line of equations, the paper should have started the line with "$\approx$" instead of "$=$".
To answer your final question, the approximation doesn't care if the sequence starts with $F$.
If it doesn't start with $F$, then we are ignoring the first run of successes.
We are computing a term like $0.9999999^{900000}$, so it makes almost no difference to ignore a run (or even a few, if $n_F$ was not exactly $q \cdot n$), since this only changes the exponent by one (or a small amount), and so the result of the exponentiation is roughly unchanged.
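To see how little one run matters, compare the term with its exponent reduced by one (a minimal Python check; the base here is $1-p^7$ for $p=0.1$, as in the example above):

```python
base = 1 - 1e-7              # 1 - p^7 for p = 0.1
full = base ** 900000        # all 9 * 10^5 runs counted, ~0.914
one_less = base ** 899999    # ignore one run entirely
print(full, one_less, full - one_less)  # the difference is on the order of 1e-7
```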
(This is a case where using concrete numbers can help us understand what the author was thinking.)
The main point is that these formulas in the paper are quick approximations.
The paper plays fast and loose with these formulas because it is oriented towards applications, not theorems.
For the types of situations it is considering, with $n \gg 0$, the approximations made in the paper are perfectly reasonable.
I take it that the statement, "the probability of successful operation for a one-hour time interval is $0.999$," is intended to mean any one-hour interval. Realistically, this may not be true, but this is just a simple example, the first example after the concept is introduced.
Your statement that independent events should have different probabilities is a little puzzling. If the events are successive tosses of a coin, for example, they are all independent, and the probability that heads comes up is the same on every toss. Independent, identically-distributed events are a staple of probability theory.
In this case, the answer is $f(t)=.999^t$, assuming independence. We need independence in order to say, for example, that the probability that it runs for two hours is the probability that it doesn't fail in the first hour times the probability that it doesn't fail in the second hour.
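A minimal Python sketch of this, with the function name my own:

```python
def p_runs(t_hours, p_hour=0.999):
    """P(successful operation for t consecutive hours), assuming the hours
    are independent and each succeeds with probability p_hour."""
    return p_hour ** t_hours

print(p_runs(1))    # 0.999
print(p_runs(2))    # 0.998001 = 0.999 * 0.999
print(p_runs(24))   # ~0.9763 for a full day
```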