Probability of Long Run of 1s in Random Binary String

probabilityrecurrence-relations

For some fixed $n$, let $p_n$ be the probability that a random infinite binary string contains a run of consecutive $1$s, containing $n$ more $1$s than the total number appearing before the run.

For example, if $n=2$, the binary string $010010111100…$ is of the sort we're looking for, but the string $01001011100…$ is not (yet). Call a run of $1$s that is sufficiently long, like the four $1$s in the first string, a satisfying run.

Is there any reasonably nice way to compute $p_n$?

Here's what I know:

  1. We can compute the expected number of satisfying runs in a string. Suppose a satisfying run begins after $k$ preliminary bits. If $k=0$, this occurs with probability $2^{-n}$; otherwise, it occurs with probability $\sum_{i=0}^{k-1} {k-1 \choose i} 2^{-n-k-i}=2^{-n-k}\left(\frac{3}{2}\right)^k$. Summing over all $k$, we get that the expected number of satisfying runs is $3(2^{-n})$, and so $p_n<3(2^{-n})$. You could extend this argument to compute $p_n$ by inclusion-exclusion, but it looks to me like it'd be incredibly ugly. In fact, $3(2^{-n})$ is a pretty good estimate for $p_n$, since the probability of a given string having multiple satisfying runs is so low.

  2. By conditioning on the location of the first $0$ in the string, we can get a recurrence relation: $p_n=2^{-n+1}+\sum_{i=1}^{n-1} 2^{-i+1}p_{n+i}$. This is not sufficient to compute $p_n$ on its own, even given some initial values — the largest index that appears in it is $p_{2n-1}$, so if you try to use it recursively you'll never learn anything about $p_n$ for $n$ even. You might be able to get something out of it by repeatedly substituting it into itself (in the form I've given it in, with the smallest coefficient singled out), and using the bound from 1. to make some kind of limit argument, but this also looks nasty to me.

Thoughts?

(This originally comes from this Magic: the Gathering scenario, but I hope I've managed to successfully de-Magic it.)

Best Answer

Slight trivial change in notation: Let $P_n$ be the probability that every run of consecutive 1s in our string does not exceed $n$ plus the previous acumulated runs.

Then, the following linear recursion (equivalent to that of your second approach) holds:

$$P_n = \sum_{k=1}^n P_{n+k} \; 2^{-k} \hspace{12mm} n=1, 2 \ldots \tag{1} $$

(this is obtained by summing over all the possible lengths of the first run; because, after the first run of length $k$, we have the equivant problem with $n$ replaced by $k+n$)

This recursion is rather tricky to solve, because our "initial value" would be $P_{\infty}=1$, and the recursion is not time-reversible. Here's my approach (perhaps there are more elegant procedures, I'm not really familiar with this).

I try a solution of the form:

$$P_n = 1 + a_1 2^{-n} + a_2 2^{-2n} + a_3 2^{-3n} + \cdots$$

Actually, we'll only keep the terms with exponents of the form $-(2^p-1)n$. That is, we set $a_i=0$ unless $i = 2^p-1$ for some integer $p$. Calling $Q(r, n) = 2^{-r \, n}$, we have:

$$P_n = Q(0,n) + a_1 \, Q(1 , n) + a_3 \, Q(3,n) + a_7 \, Q(7,n) + a_{15} \, Q(15,n) +\cdots$$

Now, it's straighfroward to check (geometric sum) that:

$$\sum_{k=1}^n Q(r,n+k) \; 2^{-k} = \frac{1}{2^{r+1}-1} \bigl( Q(r,n) - Q(2r+1,n) \bigr) $$

Hence, the recursion (1) induces the transformations $Q_0 \to Q_0 -Q_1$, $Q_1 \to \frac{1}{3}(Q_1 -Q_3)$, $Q_3 \to \frac{1}{7}(Q_3 -Q_7)$, etc, and thus the coefficients can be obtained:

$$\begin{align} a_1 &= \frac{a_1}{3} - 1 \Rightarrow a_1 = -\frac{3}{2} \\ a_3 &= \frac{a_3}{7} - \frac{a_1}{3} \Rightarrow a_3 = \frac{1}{3} \frac{7}{6} \frac{3}{2}=\frac{7}{6 \times 2}\\ a_7 &= \frac{a_7}{15} - \frac{a_3}{7} \Rightarrow a_7 = - \frac{1}{7} \frac{15}{14} \frac{1}{3} \frac{7}{6} \frac{3}{2} =-\frac{15}{14 \times 6 \times 2} \\ a_{15} &= \cdots = \frac{31}{30 \times 14 \times 6 \times 2}\\ a_{2^p-1} &= \cdots = (-1)^p \frac{2^{p-1}-1}{(2^{p-1}-2) \times (2^{p-2}-2) \cdots \times 14 \times 6 \times 2}\\ \end{align} $$

Or, more compact (perhaps not more illuminating) :

$$ a_{2^p-1} = \frac{2-2^{-p}}{(2;2)_p}$$

where $(a;q)_n$ is the q-Pochhammer_symbol.

Some numerical values:

$ a_1=-1.50$, $ a_3\approx0.58333$, $a_7\approx-0.08929 $, $a_{15}\approx 0.00615$, $a_{31}\approx-0.0002$

So, finally

$$ P_n = 1 - \frac{3}{2} 2^{-n} + \frac{7}{12} 2^{-3 n } + \cdots =\\ = 1 +\sum_{p=1}^\infty \frac{2-2^{-p}}{(2;2)_p}2^{-(2^p-1)n}$$

This converges quite quickly, specially for big $n$.

To get, according to the original question, the probability that some run is greater or equal than $m$, we'd evaluate

$$1 - P_{m-1} = 3 \, 2^{-m} - \frac{14}{3} 2^{-3 m} + \cdots $$

which in the first approximation coincides with the OP's estimate (using the expectation); it is quite a good approximation for, say, $n>4$.

        n        1         2         3         4         5        6          7     
 p(n) exact   0.32222   0.63411   0.81364   0.90639   0.95314   0.97656   0.98828  
 p(n) approx  0.25000   0.62500   0.81250   0.90625   0.95312   0.97656   0.98828  

Added Rigurous bounds for $P(n)$ can be obtained by noting that, with my definition, it's equivalent to the probability that a sequence of iid geometric variables $x_i = 1, 2 \cdots$ attaining values in $x_1 \le n$,$x_2 \le n + x_1 $,$x_3 \le n + x_1 + x_2$ ...

So

$$P(n) = \sum_{k_1}^{n} 2^{-k_1} \sum_{k_2}^{n+k_1} 2^{-k_2} \sum_{k_3}^{n+k_1+k_2} 2^{-k_3} \cdots$$

In particular, a lower bound for the $m$-th terms in this infinite nested product is obtained by setting the outer values in $k_1=1$, $k_2=1$..., so

$$P(n) \le \sum_{k_1}^{n} 2^{-k_1} \sum_{k_2}^{n+1} 2^{-k_2} \sum_{k_3}^{n+2} 2^{-k_3} \cdots = (1 - 2^{-n})(1 - 2^{-n-1})(1 - 2^{-n-2}) = \\ = \prod_{i=0}^\infty (1 - 2^{-n} 2^{-i})= S(2^{-n},2^{-1}) $$ where $S(a,q)=\prod_{i=0}^\infty (1 - a q^{i})$ is the q-Pochamer symbol or (q-series) QPochhammer[a,q]

We can check that this tends to 1 when $a\to 0$ by using the bound $0<-\log(1-x)<2x$ for $0<x \le 1/2$, so $\log( S(2^{-n},2^{-1})$ is bounded between $0$ and $-2^{-n+1}$. Hence $\lim_{n\to\infty} P(n)=1$.

Added 2: Regarding uniqueness: We must prove that the recursion (1), with the boundary condition $P_{\infty}=1$ does not admit more than one solution (with $0 \le P_n \le 1)$. This is equivalent to prove that the recursion does not have a solution with $P_{\infty}=0$ except for the trivial one: $P_n=0$. 

Now, suppose $P_{n_1}>0$. Then, the equation (1) implies that there exists at least one $n_2$ in $n_1 +1 \cdots 2 n_1$ with $P_{n_2} > P_{n_1}$. Applying the same to $n_2$, this implies that we can find an increasing subsequence of $P(n)$; hence its limit cannot be zero.

Related Question