Expected length of a run in a finite number of coin flips

combinatoricsexpected valueprobability

Consider a sequence of fair coin flips. A run of length $k$ is $k$ consecutive flips that are all the same. (e.g. HTTHH has a run of length 1 and two runs of length 2). Define the random variable $R_n$ as the result of the following process: generate a sequence of $n$ fair coin flips, list the lengths of its runs, and then sample from this list uniformly at random. I am trying to find $\mathbb{E}[R_n]$.

From trying small cases, the answer seems to be $2-\frac{1}{2^{n-1}}$:

  • If $n=1$, there is always one run of length 1
  • If $n=2$, there are two runs of length 1 half of the time, and one run of length two half of the time, for an average of $\frac{1}{2}\cdot 1+\frac{1}{2}\cdot 2=\frac{3}{2}$
  • If $n=3$, you get AAA format $\frac{1}{4}$ of the time, AAB or BAA format $\frac{1}{2}$ of the time, and ABA format $\frac{1}{4}$ of the time, giving $\frac{1}{4}\cdot 3 + \frac{1}{2}\cdot \frac{3}{2} + \frac{1}{4}\cdot 1 = \frac{7}{4}$.
  • If $n=4$, you get a run of length 4 with probability $\frac{1}{8}$, runs with lengths 3 and 1 with probability $\frac{1}{4}$, runs with length 2 and 2 with probability $\frac{1}{8}$, runs with lengths 1,1,2 with probability $\frac{3}{8}$, and runs with lengths 1,1,1,1 with probability $\frac{1}{8}$. This gives $\frac{1}{8}\cdot 4+\frac{1}{4}\cdot 2+\frac{1}{8}\cdot 2+\frac{3}{8}\cdot \frac{4}{3} + \frac{1}{8}\cdot 1 = \frac{15}{8}$.

This also makes sense because intuitively, the distribution of $R_n$ should be approximated by a geometric RV of probability $\frac{1}{2}$, which has EV 2. The difference is that runs are necessarily finite, while a geometric RV can be infinite. This means $\mathbb{E}[R_n]$ must be less than $2$ and approaching $2$ for large $n$.

In fact, if we define $G_n$ to be a 'truncated' geometric RV which has distribution $\mathbb{P}[G_n=i] = 2^{-i}$ for $1\leq i\leq n-1$ and $\mathbb{P}[G_n=n]=2^{-n+1}$, then $\mathbb{E}[G_n] = 2-\frac{1}{2^{n-1}}$, which looks promising (and possibly there is even a direct combinatorial argument that $\mathbb{E}[G_n]=\mathbb{E}[R_n]$). However I can't find a way to use this to find $\mathbb{E}[R_n]$.

Best Answer

Approach 1

In the same random experiment, let $\mathbf X_n$ be the number of runs. Each coinflip after the first independently starts a new run with probability $\frac12$, so we have $\Pr[\mathbf X_n = k] = \binom{n-1}{k-1} (\frac12)^{n-1}$. (In other words, $\mathbf X_n - 1$ is $\text{Binomial}(n-1, \frac12)$.)

We have $\mathbb E[\mathbf R_n \mid \mathbf X_n = k] = \frac nk$. If there are $k$ runs, then (no matter how they are split up) the expected length of a randomly picked run is going to be equal to the average length of a run.

Therefore $\mathbb E[\mathbf R_n] = \mathbb E[\frac{n}{\mathbf X_n}]$, which we can find by definition of expected value: $$ \mathbb E\left[\frac{n}{\mathbf X_n}\right] = \sum_{k=1}^n \frac nk \binom{n-1}{k-1} \left(\frac12\right)^{n-1} = \sum_{k=1}^n \binom nk \left(\frac12\right)^{n-1} = \frac{2^n - 1}{2^{n-1}} = 2 - \frac1{2^{n-1}}. $$

Approach 2

We can prove that actually, the distribution of a randomly chosen run is the same as the distribution of the first run, which is precisely the truncated geometric $\mathbf G_n$. In particular, $\mathbb E[\mathbf R_n] = \mathbb E[\mathbf G_n] = 2 - \frac1{2^{n-1}}$.

Say that two outcomes are equivalent if their run patterns are the same up to cyclic permutation. For example, HHTHHHTT has run pattern $(2,1,3,2)$ and TTTHHTTH has run pattern $(3,2,2,1)$, so they are equivalent.

Within an equivalence class of outcomes, the distribution of a random run is the same as the distribution of the first run, because choosing a random run is the same as choosing the first run in a random cyclic permutation of the run pattern. So $\mathbf R_n$ and $\mathbf G_n$ have the same distribution on each equivalence class, which means they have the same distribution overall.