[Math] the number of binary strings of length N with exactly R runs of ones, with C total ones

binarybit-stringscombinatoricspolya-urn-modelprobability

I'm concerned with the total number of ones, and the total number of runs, but not with the size of any of the runs.

For example, $N=8$, $R=3$, $C=5$ includes 11101010, 01101011 among the 24 total possible strings.

I can compute these for small $N$ easily enough, but I am specifically interested in the distribution for $N=65536$. As this will result in very large integers, the log probability distribution is equally useful.

I found [1] and [2], which includes this:

Let $N_{n;g_k,s_k}$ denote the number of binary strings which contain for given $g_k$ and $s_k$, $g_k=0,1,…,⌊\frac{s_k}{k}⌋$, $s_k=0,k,k+1,…,n$, exactly $g_k$ runs of 1’s of length at least $k$ with total number of 1’s (with sum of lengths of runs of 1’s) exactly equal to $s_k$ in all possible binary strings of length $n$.

An expression for this is given in eq. (24):

$N_{n;g_k,s_k} = \sum_{y=0}^{n-s_k}
{y+1 \choose g_k }
{s_k-(k-1)g_k-1 \choose g_k-1}
\sum_{j=0}^{⌊\frac{n-y-s_k}{k}⌋}
(-1)^j
{y+1-g_k \choose j}
{n-s_k-kj-g_k \choose n-s_k-kj-y}
$

for $g_k \in \{1, …, ⌊\frac{s_k}{k}⌋\}$, $s_k \in \{k, k+1, …, n\}$.

I think this is exactly what I'm looking for, with $k = 1$, $s_k = C$ and $g_k = R$. However, when I implemented this I did not get the expected results (Python shown below, edge cases omitted), based on comparing to counting all strings for N=8. I am working backwards to try to understand where I might have gone wrong, but not having much luck yet. I wonder if I am misunderstanding the result.

def F(x, y, n):
    # x = C or s_k (cardinality)
    # y = R or g_k (runCount)
    # n = N (total bits)

    a1 = 0
    for z in range(n-x+1):
        b1 = choose(z+1, y) * choose(x-1, y-1)
        a2 = 0
        for j in range(n-z-x+1):
            a2 += (-1) ** j * choose(z+1-y, j) * choose(n-x-j-y, n-x-j-z)
        a1 += b1 * a2

    return a1

Note that the choose function uses factorial, which I realize won't work for larger $N$ – but should be fine for $N=8$.

Edit: corrected a sign error typo in eq. (24) and the equivalent error in the python code.

[1] Counting Runs of Ones and Ones in Runs of Ones in
Binary Strings, Frosso S. Makri, Zaharias M. Psillakis, Nikolaos Kollas https://file.scirp.org/pdf/OJAppS_2013011110241057.pdf

[2] On success runs of a fixed length in Bernoulli sequences: Exact and asymptotic results, Frosso S.Makria, Zaharias M.Psillakis http://www.sciencedirect.com/science/article/pii/S0898122110009284

Best Answer

runs_ones_1

Consider a binary string and let's put an additional (dummy) fixed $0$ at the start of the it. We individuate as a run a $0$ followed by contiguous $1$'s.

So, given a string of length $N$, total of ones $C$, and number of runs $R$, we will have $N-C$ zeros, of which $N-C-R+1$ are "free", that is not tight to mark the runs.

The number of ways to constitute the runs is the number of (standard) compositions of $C$ into $R$ parts, that is $$ {{C-1} \choose {R-1}}$$.
The number of ways to place the "free" zeros will be equal to the number of weak compositions of their number into $R+1$ parts (in front of the first and then after each run), i.e.: $$ {{N-C-R+1+R+1-1} \choose {R}}={{N-C+1} \choose {R}}$$.

Which confirms N.Shales's answer, so that the merit should go to him.

Addendum

Concerning formula (24), for what I can see, it looks like that in the 3rd binomial ${{y+1+g_k} \choose {j}}$ there is a sign typo, and that it should be $\cdots -g_k$.
Then putting for instance $k=1,\; s_k=n-1 \; g_k=2$ it will correctly give $n-2$,
and with $k=1,\; s_k=C \; g_k=R$ it will give the formula above.
But not having the full text, I cannot check that further.