[Math] Expected Value of the Maximum Number of Heads in n Flips

combinatoricsexpectationprobability

How would one go about finding the expected value of the maximum number of consecutive heads when flipping a coin $n$ times? For small $n$, it seems easy to brute-force it (i.e. when $n = 3$, the sample space is $\{HHH, HHT, HTH, HTT, TTT, TTH,THT,THH\}$ and so the maximum number of consecutive heads is $\{3,2,1,1,0,1,1,2\}$ so the expected value of the number of maximum consecutive heads should be $11/8$). However, for $n>5$, it becomes pretty hard to brute force this.

For a research project, I am really wondering how the solution to this problem behaves for $ 50 \leq n \leq 100 $. In other words, if I flip $50$ coins, what is the maximum run length of heads that should be expected?

Any advice on how to solve this problem for either the $n$ in the above bound, or for $n$ in general?

Best Answer

Suppose we compute the generating function of binary strings having at most $q$ consecutive heads. There are four cases, according to whether the string starts with heads or tails and ends with heads or tails.

We get $$G_{HH}(z) = z\frac{1-z^{q}}{1-z} \sum_{k=0}^\infty \left(\frac{z}{1-z}z\frac{1-z^{q}}{1-z}\right)^k.$$

Continuing we get $$G_{HT}(z) = G_{HH}(z) \frac{z}{1-z}.$$

Furthermore $$G_{TT}(z) = \frac{z}{1-z} \sum_{k=0}^\infty \left(z\frac{1-z^{q}}{1-z}\frac{z}{1-z}\right)^k.$$

Finally we have $$G_{TH}(z) = G_{TT}(z) z\frac{1-z^{q}}{1-z}.$$

The sum term is $$\frac{1}{1-z^2(1-z^q)/(1-z)^2} = \frac{1-2z+z^2}{1-2z+z^2-z^2(1-z^q)} = \frac{1-2z+z^2}{1-2z+z^{q+2}}.$$

The factor on this is $$z\frac{1-z^{q}}{1-z} \left(1+\frac{z}{1-z}\right) + \frac{z}{1-z} \left(1+z\frac{1-z^{q}}{1-z}\right)$$ which is $$z\frac{1-z^{q}}{(1-z)^2} + \frac{z}{(1-z)^2} (1-z^{q+1}) = \frac{2z-z^{q+1}-z^{q+2}}{(1-z)^2}.$$

Multiplying we obtain the generating function $$G_q(z) = \frac{2z-z^{q+1}-z^{q+2}}{1-2z+z^{q+2}}.$$

It follows that the expectation times $2^n$ is given by

$$ [z^n] \left(0\times G_0(z) + \sum_{q=1}^n q (G_q(z)-G_{q-1}(z)) \right).$$

The sum simplifies to $$\sum_{q=1}^n q G_q(z) - \sum_{q=0}^{n-1} (q+1) G_q(z) = \sum_{q=0}^n q G_q(z) - \sum_{q=0}^{n-1} (q+1) G_q(z) \\ = n G_n(z) - \sum_{q=0}^{n-1} G_q(z).$$

and hence the expectation is $$\frac{1}{2^n} [z^n] \left( n G_n(z) - \sum_{q=0}^{n-1} G_q(z) \right).$$

This gives the sequence $$1/2,1,{\frac {11}{8}},{\frac {27}{16}},{\frac {31}{16}},{ \frac {69}{32}},{\frac {75}{32}},{\frac {643}{256}},{\frac { 1363}{512}},{\frac {1433}{512}},\ldots$$

Multiplying by $2^n$ we obtain $$1, 4, 11, 27, 62, 138, 300, 643, 1363, 2866, \ldots$$ which is OEIS A119706 where the above computation is confirmed.

The following Maple code can be used to explore these generating functions. The procedure v computes the generating function of the maximal run length of a string of $n$ bits by total enumeration. The procedure w computes it from the generating function $G_q(z).$

v :=
proc(n)
    option remember;
    local gf, k, d, mxrun, len;

    gf := 0;

    for k from 2^n to 2^(n+1)-1 do
        d := convert(k, base, 2);

        mxrun := 0;
        for pos to n do
            if d[pos] = 1 then
                len := 1;

                pos := pos+1;
                while pos <= n do
                    if d[pos] = 1 then
                        len := len+1;
                        pos := pos+1;
                    else
                        break;
                    fi;
                od;

                if len>mxrun then
                    mxrun := len;
                fi;
            fi;
        od;

        gf := gf  + z^mxrun;
    od;

    gf;
end;

G := q -> (2*z-z^(q+1)-z^(q+2))/(1-2*z+z^(q+2));

w :=
proc(n)
    option remember;
    local gf, mxrun;

    gf := 1;

    for mxrun to n do
        gf := gf +
        coeftayl(G(mxrun)-G(mxrun-1), z=0, n)*z^mxrun;
    od;

    gf;
end;

X := n -> coeftayl(n*G(n)-add(G(q), q=0..n-1), z=0, n)/2^n;

Here are two examples.

> v(4);
                        4      3      2
                       z  + 2 z  + 5 z  + 7 z + 1

> w(4);
                        4      3      2
                       z  + 2 z  + 5 z  + 7 z + 1

Addendum. Responding to the question of the OP, the maximum run length distribution for $n=50$ is

> w(50);
 50      49      48       47       46       45        44        43        42
z   + 2 z   + 5 z   + 12 z   + 28 z   + 64 z   + 144 z   + 320 z   + 704 z

             41         40         39          38          37          36
     + 1536 z   + 3328 z   + 7168 z   + 15360 z   + 32768 z   + 69632 z

               35           34           33            32            31
     + 147456 z   + 311296 z   + 655360 z   + 1376256 z   + 2883584 z

                30             29             28             27              26
     + 6029312 z   + 12582912 z   + 26214400 z   + 54525952 z   + 113246208 z

                  25              24               23               22
     + 234881024 z   + 486539259 z   + 1006632909 z   + 2080374408 z

                   21               20                19                18
     + 4294964912 z   + 8858356224 z   + 18253535488 z   + 37580568576 z

                    17                 16                 15                 14
     + 77307408384 z   + 158903894017 z   + 326369607799 z   + 669786836360 z

                      13                  12                  11
     + 1373319005440 z   + 2812533538048 z   + 5749650288420 z

                       10                   9                   8
     + 11716183298140 z   + 23723022576779 z  + 47402584528885 z

                       7                    6                    5
     + 92066138963408 z  + 168050756947888 z  + 267156803852044 z

                        4                    3                   2
     + 310228979841119 z  + 174887581402185 z  + 19394019617001 z

     + 32951280098 z + 1
Related Question