Because heads and tails play symmetric roles in the stopping criterion (and presumably have equal chances in each Bernoulli trial), it suffices to find the probability of getting two consecutive heads before the process stops.
If the process stops with four consecutive heads, then in particular two consecutive heads already appeared before it stopped, so that case is automatic. We can therefore focus on the probability $p$ that we get two consecutive heads before the process stops with four consecutive tails.
One way to compute this is by defining a Markov chain with two absorbing states, a) two consecutive heads and b) four consecutive tails. Finding $p$ amounts to finding the probability of reaching the first of these absorbing states (two consecutive heads).
The Wikipedia write-up of
absorbing Markov chains
may be overly concise, so here are some details. Ordering the
transient states before the absorbing states gives a
probability transition matrix having block structure:
$$ P = \begin{bmatrix} Q & R \\ 0 & I \end{bmatrix}$$
so that $Q$ gives the transition probabilities between the
transient states and $R$ the transition probabilities from
transient to absorbing states.
We define the fundamental matrix
$N = \sum_{k=0}^\infty Q^k = (I-Q)^{-1}$, and the product
$NR$ then gives the probabilities of eventually reaching an
absorbing state starting from a transient state.
In the case at hand it is convenient to use four transient
states (the current run is one Head, or one, two, or three
Tails, respectively) and two absorbing states (two consecutive
Heads or four consecutive Tails). Taking the states in
just this order gives:
$$ P = \begin{bmatrix} 0 & \frac{1}{2} & 0 & 0 & \frac{1}{2} & 0
\\ \frac{1}{2} & 0 & \frac{1}{2} & 0 & 0 & 0
\\ \frac{1}{2} & 0 & 0 & \frac{1}{2} & 0 & 0
\\ \frac{1}{2} & 0 & 0 & 0 & 0 & \frac{1}{2}
\\ 0 & 0 & 0 & 0 & 1 & 0
\\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix} $$
$$ N = (I-Q)^{-1} = \begin{bmatrix}
\frac{16}{9} & \frac{8}{9} & \frac{4}{9} & \frac{2}{9}
\\ \frac{14}{9} & \frac{16}{9} & \frac{8}{9} & \frac{4}{9}
\\ \frac{4}{3} & \frac{2}{3} & \frac{4}{3} & \frac{2}{3}
\\ \frac{8}{9} & \frac{4}{9} & \frac{2}{9} & \frac{10}{9}
\end{bmatrix} $$
$$ NR = \begin{bmatrix} \frac{8}{9} & \frac{1}{9}
\\ \frac{7}{9} & \frac{2}{9}
\\ \frac{2}{3} & \frac{1}{3}
\\ \frac{4}{9} & \frac{5}{9} \end{bmatrix} $$
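As a sanity check (not from the original post), here is a short Python sketch that rebuilds $N$ and $NR$ with exact rational arithmetic; the helper name `mat_inv` is my own, a small Gauss-Jordan inverse:

```python
from fractions import Fraction as F

# Transition blocks for the chain above, transient states ordered
# (1 Head, 1 Tail, 2 Tails, 3 Tails), absorbing states (HH, TTTT).
h = F(1, 2)
Q = [[0, h, 0, 0],
     [h, 0, h, 0],
     [h, 0, 0, h],
     [h, 0, 0, 0]]
R = [[h, 0],
     [0, 0],
     [0, 0],
     [0, h]]

def mat_inv(A):
    """Invert a square matrix of Fractions via Gauss-Jordan elimination."""
    n = len(A)
    # Augment A with the identity matrix.
    M = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Pivot, normalize the pivot row, eliminate the other rows.
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

# Fundamental matrix N = (I - Q)^{-1}
IminusQ = [[F(int(i == j)) - Q[i][j] for j in range(4)] for i in range(4)]
N = mat_inv(IminusQ)

# NR gives the absorption probabilities; column 0 is absorption at HH.
NR = [[sum(N[i][k] * R[k][j] for k in range(4)) for j in range(2)]
      for i in range(4)]

# Start in "1 Head" or "1 Tail" with probability 1/2 each.
answer = F(1, 2) * NR[0][0] + F(1, 2) * NR[1][0]
print(answer)   # 5/6
```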
For economy I omitted an "empty" state, so we imagine our
Markov process to initialize in state one Head with probability
$1/2$ and likewise in state one Tail with probability $1/2$.
From the above computation it follows that our chance of
reaching two consecutive Heads before four consecutive Tails is:
$$ (1/2)\frac{8}{9} + (1/2)\frac{7}{9} = \frac{5}{6} $$
This is the same as our chance of reaching two consecutive
Tails before four consecutive Heads.
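If you'd like to confirm the $5/6$ empirically, a quick Monte Carlo sketch (the function name `first_run` is my own) simulates the race directly:

```python
import random

def first_run(h_len, t_len, trials=200_000, seed=1):
    """Monte Carlo estimate of P(a run of h_len Heads occurs before
    a run of t_len Tails) with a fair coin."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        heads = tails = 0
        while True:
            if rng.random() < 0.5:          # Heads: extend head run
                heads, tails = heads + 1, 0
                if heads == h_len:
                    wins += 1
                    break
            else:                           # Tails: extend tail run
                tails, heads = tails + 1, 0
                if tails == t_len:
                    break
    return wins / trials

print(first_run(2, 4))   # close to 5/6 ≈ 0.8333
```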
A different approach: consider two independent geometric variables $X_1$, $X_2$, each counting the number of trials up to and including the first success in an experiment with success probability $p_1$ (resp. $p_2$); write $q_i = 1 - p_i$.
Then
$$P(X_1 \le X_2)=p_1 + p_1 q_1 q_2 +p_1 (q_1 q_2)^2+\cdots=\frac{p_1}{1- q_1 q_2} $$
$$P(X_1 < X_2)=p_1 q_2 + p_1 q_2 q_1 q_2 +p_1 q_2 (q_1 q_2)^2+\cdots=\frac{p_1 q_2}{1- q_1 q_2} $$
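These two closed forms can be spot-checked numerically by truncating the series; the values $p_1 = 0.3$, $p_2 = 0.7$ below are arbitrary illustrative choices:

```python
# Arbitrary illustrative success probabilities (any values in (0,1) work).
p1, p2 = 0.3, 0.7
q1, q2 = 1 - p1, 1 - p2

# P(X1 <= X2) = sum over k of P(X1 = k) P(X2 >= k),
# P(X1 <  X2) = sum over k of P(X1 = k) P(X2 >  k); truncate at k = 200.
le = sum(q1 ** (k - 1) * p1 * q2 ** (k - 1) for k in range(1, 200))
lt = sum(q1 ** (k - 1) * p1 * q2 ** k for k in range(1, 200))

assert abs(le - p1 / (1 - q1 * q2)) < 1e-12
assert abs(lt - p1 * q2 / (1 - q1 * q2)) < 1e-12
```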
We can regard each run of tails (resp. heads) as one such experiment: once a run has started, it reaches length $5$ (resp. $2$) with probability $p_1=1/2^{5-1}=2^{-4}$ (resp. $p_2=1/2^{2-1}=1/2$).
Let $E$ be the desired event (a run of 5 tails happens before a run of 2 heads), and let $T$ be the event that the first coin is a tail. If the first coin is a tail, the tail-runs precede the head-runs, so $E$ occurs iff $X_1 \le X_2$; otherwise the head-runs come first and $E$ occurs iff $X_1 < X_2$. Then
$$\begin{align*} P(E)&=P(E\mid T)P(T)+P(E\mid T^c)P(T^c)\\
&=\frac{p_1}{1-q_1 q_2}\cdot\frac{1}{2}+\frac{p_1 q_2}{1-q_1 q_2}\cdot\frac{1}{2}\\
&=\frac{1}{2}(1+q_2)\,\frac{p_1}{1-q_1 q_2}=\frac{3}{34} \end{align*}$$
In general, for a target run of $t$ tails racing a target run of $h$ heads,
$$P(E)=\frac{2^h-1}{2^t+2^h-2}$$
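A small Python check (assuming the race decomposition above; `p_tails_first` is a hypothetical helper name) confirms the closed form over a range of run lengths:

```python
def p_tails_first(t, h):
    """P(a run of t Tails occurs before a run of h Heads), fair coin,
    computed from the geometric-race decomposition above."""
    p1, p2 = 2.0 ** -(t - 1), 2.0 ** -(h - 1)
    q1, q2 = 1 - p1, 1 - p2
    return 0.5 * (1 + q2) * p1 / (1 - q1 * q2)

# Compare against the closed form for a range of run lengths.
for t in range(2, 8):
    for h in range(2, 8):
        closed = (2 ** h - 1) / (2 ** t + 2 ** h - 2)
        assert abs(p_tails_first(t, h) - closed) < 1e-12

print(p_tails_first(5, 2))   # 3/34 ≈ 0.0882
```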
Seeing that the final formula is so simple, I wonder if there is a simpler derivation.
Best Answer
It might be better to say:
The situation is symmetric in the sense that it doesn't matter what sides of the coin you're talking about --- heads or tails. But it does matter that the runs of $a$ come before the runs of $b$.
Intuitively, $a$ and $b$ are not symmetric because if you increase $a$ (the number of runs you have to get first), you expect the likelihood of success to go down. If you increase $b$ (the number of runs you have to avoid getting before succeeding at $a$), you expect the likelihood of success to go up:
It is rather likely to get a run of 2 before your first run of 1000. It is rather unlikely to get a run of 1000 before your first run of 2. This is true regardless of whether you're talking about heads/tails or tails/heads.
Let's generalize to weighted coins. Suppose you want to know the probability of getting $a$ runs of one side of the coin before getting $b$ runs of the other. Suppose the one side of the coin comes up with probability $\alpha$, and the other side of the coin comes up with probability $\beta = 1-\alpha$.
Then, following that post:
$$\begin{align*} P(E) &= P(E|T)P(T) + P(E|T^C)P(T^C)\\ & = \frac{p_1}{1-q_1q_2}\alpha + \frac{p_1q_2}{1-q_1q_2}(1-\alpha)\\ & = [q_2 + (1-q_2)\alpha] \frac{p_1}{1-q_1q_2} \end{align*}$$
where $p_1$ and $q_1$ are the probabilities of successfully/unsuccessfully completing a run of $a$, and similarly for $p_2, q_2$.
We have $p_1 = \alpha^{a-1}$, $p_2 = (1-\alpha)^{b-1}$, $q_1 = 1-p_1$, $q_2 = 1-p_2$.
$$\begin{align*} P(E) & = [q_2 + (1-q_2)\alpha] \frac{p_1}{1-q_1q_2} \\ &= [1-(1-\alpha)^{b-1} + (1-\alpha)^{b-1}\alpha] \frac{p_1}{1-q_1q_2}\\ &= [1 + (\alpha-1)(1-\alpha)^{b-1}] \frac{p_1}{1-q_1q_2}\\ &= [1 - (1-\alpha)^{b}]\frac{\alpha^{a-1}}{1-(1-\alpha^{a-1})(1-(1-\alpha)^{b-1})} \end{align*}$$
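As a cross-check of this weighted-coin formula (the names `run_race` and `closed_form` are my own), here is a simulation sketch:

```python
import random

def run_race(alpha, a, b, trials=200_000, seed=0):
    """Simulate the biased coin: side A comes up with probability alpha.
    Estimate P(a run of a A's occurs before a run of b B's)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        run_a = run_b = 0
        while True:
            if rng.random() < alpha:        # side A: extend A run
                run_a, run_b = run_a + 1, 0
                if run_a == a:
                    wins += 1
                    break
            else:                           # side B: extend B run
                run_b, run_a = run_b + 1, 0
                if run_b == b:
                    break
    return wins / trials

def closed_form(alpha, a, b):
    """The formula derived above."""
    q1 = 1 - alpha ** (a - 1)
    q2 = 1 - (1 - alpha) ** (b - 1)
    return (1 - (1 - alpha) ** b) * alpha ** (a - 1) / (1 - q1 * q2)

# The two estimates should agree to within Monte Carlo noise.
print(run_race(0.6, 3, 4), closed_form(0.6, 3, 4))
```

At $\alpha = 1/2$, $a = 2$, $b = 4$ the closed form recovers the $5/6$ from the fair-coin case, as it should.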