[Math] How many binary strings (with a given number of occurrences of 0 and 1) are there that do not contain a given substring

bit-stringscombinationscombinatoricspermutations

I know my binary string is composed of exactly $n$ $1$s and $m$ $0$s. How many such strings are possible, if we add the constraint that they must not contain a specific given substring $S$ (whose length is $\leq n+m$)?

I am specifically interested in the answer in the case that $S=010$.

Note: I know how to determine the answer programatically / via dynamic programming. I'm looking for a more closed form / combinatoric solution.

For example, if $n=3$, $m=2$, and $S=010$, then the following would be all $7$ relevant ways:
$$00111$$
$$01101$$
$$01110$$
$$10011$$
$$10110$$
$$11001$$
$$11100$$

Best Answer

This answer is based upon the Goulden-Jackson Cluster Method.

We consider the set words of length $n\geq 0$ built from an alphabet $$\mathcal{V}=\{0,1\}$$ and the set $B=\{010\}$ of bad words, which are not allowed to be part of the words we are looking for. We derive a generating function $f(s)$ with the coefficient of $s^n$ being the number of searched words of length $n$.

According to the paper (p.7) the generating function $f(s)$ is \begin{align*} f(s)=\frac{1}{1-ds-\text{weight}(\mathcal{C})}\tag{1} \end{align*} with $d=|\mathcal{V}|=2$, the size of the alphabet and $\mathcal{C}$ the weight-numerator of bad words with \begin{align*} \text{weight}(\mathcal{C})=\text{weight}(\mathcal{C}[010]) \end{align*}

We calculate according to the paper \begin{align*} \text{weight}(\mathcal{C}[010])&=-s^3-s^2\text{weight}(\mathcal{C}[010])\tag{1}\\ \end{align*}

and get \begin{align*} \text{weight}(\mathcal{C})=-\frac{s^3}{1+s^2} \end{align*}

It follows

\begin{align*} f(s)&=\frac{1}{1-ds-\text{weight}(\mathcal{C})}\\ &=\frac{1}{1-2s+\frac{s^3}{1+s^2}}\tag{2}\\ &=\frac{1+s^2}{1-2s+s^2-s^3}\\ &=1+2s+4s^2+7s^3+12s^4+\color{blue}{21}s^5\\ &\qquad+37s^6+65s^7+114s^8+200s^9+351s^{10}+\cdots \end{align*}

The last line was calculated with the help of Wolfram Alpha. The coefficient of $s^5$ shows there are $\color{blue}{21}$ valid words of length $5$ from the alphabet $\{0,1\}$ which do not contain the word $010$.

But we want to also keep track of the number of $0$'s and $1$'s. We get a refinement of $f(s)$ by marking the $0$ with $x$ and the $1$'s with $y$. We obtain from (1) \begin{align*} \text{weight}(\mathcal{C}[010])&=-x^2ys^3-xys^2\text{weight}(\mathcal{C}[010]) \end{align*} and get \begin{align*} \text{weight}(\mathcal{C})=-\frac{x^2ys^3}{1+xys^2} \end{align*}

Using this generalized weight we obtain from (2) a generating function $g(s;x,y)$ \begin{align*} g(s;x,y)&=\frac{1}{1-(x+y)s+\frac{x^2ys^3}{1+xys^2}}\\ &=\frac{1+xys^2}{1-(x+y)s+xys^2-xy^2s^3}\\ &=1+(x+y)s+(x^2+2xy+y^2)s^2+(x^3+2x^2y+3xy^2+y^3)\\ &\qquad+(x^4+2x^3y+4x^2y^2+4xy^3+y^4)s^4\\ &\qquad+(x^5+2x^4y+5x^3y^2+\color{red}{7}x^2y^3+5xy^4+y^5)s^5+\cdots \end{align*}

So, e.g. out of $2^5=32$ binary words of length $5$ there are $\color{blue}{21}$ valid words which do not contain $010$ and $\color{red}{7}$ of them contain $n=2$ zeros and $m=3$ ones:

\begin{array}{cccc} \qquad\color{blue}{00000}\qquad&\qquad01000\qquad&\qquad\color{blue}{10000}\qquad&\qquad\color{blue}{11000}\qquad\\ \color{blue}{00001}&01001&\color{blue}{10001}&\color{red}{11001}\\ 00010&01010&10010&11010\\ \color{blue}{00011}&01011&\color{red}{10011}&\color{blue}{11011}\\ 00100&\color{blue}{01100}&10100&\color{red}{11100}\\ 00101&\color{red}{01101}&10101&\color{blue}{11101}\\ \color{blue}{00110}&\color{red}{01110}&\color{red}{10110}&\color{blue}{11110}\\ \color{red}{00111}&\color{blue}{01111}&\color{blue}{10111}&\color{blue}{11111}\\ \end{array}