Probability – Calculate Probability of Tossing a Fair Coin with at Least k Consecutive Heads

probability

Tossing a fair coin for $N$ times and we get a result series as $HTHTHHTT\dots~$, Here '$H$' denotes 'head' and '$T$' denotes 'tail' for a specific tossing each time.

What is the probability that the length of the longest streak of consecutive heads is greater than or equal to $k$? (that is we have a $HHHH\dots~$, which is the substring of our tossing result, and whose length is greater than or equal to $k$)

I came up with a recursive solution (though not quite sure), but cannot find a closed form solution.

Here is my solution.

Denote $P(N,k)$ as the probability for tossing the coin $N$ times, and the longest continuous heads is greater or equal than $k$.
Then (For $N>k$)

$$
P(N,k)=P(N-1,k)+\Big(1-P(N-k-1,k)\Big)\left(\frac{1}{2}\right)^{k+1}
$$

Best Answer

We can derive an explicit formula of the probability $P(N,k)$ based upon the Goulden-Jackson Cluster Method.

We consider the set of words $\mathcal{V}^{\star}$ of length $N\geq 0$ built from an alphabet $$\mathcal{V}=\{H,T\}$$ and the bad word $\underbrace{HH\ldots H}_{k \text{ elements }}=:H^k$ which is not allowed to be part of the words we are looking for. We derive a function $f_k(s)$ with the coefficient of $s^N$ being the number of wanted words of length $N$.

The wanted probability $P(N,k)$ can then be written as \begin{align*} P(N,k)=1-\frac{1}{2^N}[s^N]f_k(s) \end{align*}

According to the paper (p.7) from Goulden and Jackson the generating function $f_k(s)$ is \begin{align*} f_k(s)=\frac{1}{1-ds-\text{weight}(\mathcal{C})}\tag{1} \end{align*} with $d=|\mathcal{V}|=2$, the size of the alphabet and with $\mathcal{C}$ the weight-numerator with \begin{align*} \text{weight}(\mathcal{C})=\text{weight}(\mathcal{C}[H^k]) \end{align*} We calculate according to the paper \begin{align*} \text{weight}(\mathcal{C}[H^k])&=-\frac{s^k}{1+s+\cdots+s^{k-1}}=-\frac{s^k(1-s)}{1-s^k} \end{align*}

We obtain the generating function $f(s)$ for the words built from $\{H,T\}$ which don't contain the substring $H^k$ \begin{align*} f_k(s)&=\frac{1}{1-2s+\frac{s^k(1-s)}{1-s^k}}\\ &=\frac{1-s^k}{1-2s+s^{k+1}}\tag{2}\\ \end{align*}

$$ $$

Note: For $k=2$ we obtain \begin{align*} f_2(s)&=\frac{1-s^2}{1-2s+s^{3}}\\ &=1+2s+3s^2+5s^3+8s^4+13s^5+21s^6+34s^7+\mathcal{O}(s^8) \end{align*}

The coefficients of $f_2(s)$ are a shifted variant of the Fibonacci numbers stored as A000045 in OEIS.

Note: For $k=3$ we obtain \begin{align*} f_3(s)&=\frac{1-s^3}{1-2s+s^{4}}\\ &=1+2s+4s^2+7s^3+13s^4+24s^5+44s^6+81s^7+\mathcal{O}(s^8) \end{align*}

The coefficients of $f_3(s)$ are a shifted variant of the so-called Tribonacci numbers stored as A000073 in OEIS.

$$ $$

We use the series representation of $f_k(s)$ in (2) to derive an explicit formula of the coefficients.

\begin{align*} [s^N]f(s)&=[s^N](1-s^k)\sum_{m=0}^{\infty}(2s-s^{k+1})^m\\ &=[s^N](1-s^k)\sum_{m=0}^{\infty}s^m(2-s^k)^m\\ &=[s^N](1-s^k)\sum_{m=0}^{\infty}s^m\sum_{j=0}^m\binom{m}{j}(-1)^js^{kj}2^{m-j}\\ &=([s^N]-[s^{N-k}])\sum_{m=0}^{\infty}s^m\sum_{j=0}^m\binom{m}{j}(-1)^js^{kj}2^{m-j}\tag{3}\\ &=\sum_{m=0}^{N}([s^{N-m}]-[s^{N-k-m}])\sum_{j=0}^m\binom{m}{j}(-1)^js^{kj}2^{m-j}\tag{4}\\ &=\sum_{{m=0}\atop{m\equiv N(\bmod k)}}^{N}\binom{m}{\frac{N-m}{k}}(-1)^{\frac{N-m}{k}}2^{m-\frac{N-m}{k}}\\ &\qquad-\sum_{{m=0}\atop{m\equiv N(\bmod k)}}^{N-k}\binom{m}{\frac{N-m}{k}-1}(-1)^{\frac{N-m}{k}-1}2^{m-\frac{N-m}{k}+1}\\ &=\sum_{{m=0}\atop{m\equiv N(\bmod k)}}^{k-1}\binom{m}{\frac{N-m}{k}}(-1)^{\frac{N-m}{k}}2^{m-\frac{N-m}{k}}\tag{5}\\ &\qquad+\sum_{{m=k}\atop{m\equiv N(\bmod k)}}^{N-k} \left(\binom{m}{\frac{N-m}{k}}-\frac{1}{2^k}\binom{m-k}{\frac{N-m}{k}}\right)(-1)^{\frac{N-m}{k}}2^{m-\frac{N-m}{k}}\\ \end{align*}

Comment:

  • In (3) we use the linearity of the coefficient of operator and $[s^N]s^mf(s)=[s^{N-m}]f(s)$

  • In (4) we change the limit of the left hand sum from $\infty$ to $N$ according to the maximum coefficient $[s^N]$. According to the factors $s^{kj}$ we consider in the following only summands with $m\equiv N(\bmod k)$

  • In (5) we reorganise the sums from the line above by extracting from the left hand sum the first summand and shifting in the right hand side the index by one and putting both sums together.

We conclude: An explicit representation of the probability $P(N,k)$ $(n\geq 0)$ is according to (5) \begin{align*} P(N,k)&=1-\sum_{{m=0}\atop{m\equiv N(\bmod k)}}^{k-1}\binom{m}{\frac{N-m}{k}}(-1)^{\frac{N-m}{k}}2^{-\frac{(k+1)(N-m)}{k}}\\ &\qquad-\sum_{{m=k}\atop{m\equiv N(\bmod k)}}^{N-k} \left(\binom{m}{\frac{N-m}{k}}-\frac{1}{2^k}\binom{m-k}{\frac{N-m}{k}}\right)(-1)^{\frac{N-m}{k}}2^{-\frac{(k+1)(N-m)}{k}} \end{align*}

Related Question