For the particular example $M=2$ and $R=(10 | 1)^\ast 1^\ast$, I suppose if we first convert it to a CFG then there appears to be a general formula that involves convolutions. For example:
- $A \rightarrow \epsilon | 0A | 1A$
- $C \rightarrow \epsilon | 10C | 0C$
- $D \rightarrow \epsilon | 1D$
- $E \rightarrow ACDA$.
Then to get a string of length $n$, concatenate a string from $A$ of length $k_1$, a string from $C$ of length $k_2$, a string from $D$ of length $k_3$ and a string of length $A$ of length $k_4$, then you get
$N_E(n) = \sum_{k_1+k_2+k_3+k_4 = n} N_A(k_1)N_C(k_2)N_D(k_3) N_A(k_4)$
where $0 \leq k_i \leq n$. Ok, so there was no need to use a CFG, we could arrive here directly by identifying the parts!
Note $N_A(k_1) = 2^{k_1}$, $N_D(k_3) = 1$, and $N_E(k_4) = 2^{k_4}$.
$N_C(k_2) = N_C(k_2-1) + N_C(k_2-2)$, Fibonacci recurrence with $N_C(1) = 1$ and $N_C(2) = 2$.
$N_{E,1}(n) = \sum_{k_1+k_2+k_3+k_4=n} 2^{k_1+k_4} F(k_3) = \sum_{a+b \leq n} 2^{a} F(b) = \sum_{b\leq n} (2^{n-b+1}-1) F(b)$.
Looks like this can be attacked with generating functions.
This approach seems to generalize to other regular expressions. Oh and you said you know how to do it for exact matching. So for substring, just add an $A$ before and an $A$ after :) ?
Oh drats... but then you might be double-counting. You'd have to use inclusion-exclusion to subtract the number of strings where the pattern appears at least twice, add back where it appears at least three times, etc. But maybe it is okay.
For $ACDACDA$ (string appears at least twice):
$N_{E,2} = \sum_{k_1+k_2+k_3+k_4+k_5+k_6+k_7=n} = N_A(k_1)N_C(k_2)N_A(k_4)N_C(k_5)N_A(k_7) = \sum_{a+b+c<n}(2^a-1) F(b)F(c)$.
Look for a general pattern?
$N_E(n) = N_{E,1}(n) - N_{E,2}(n) + N_{E,3}(n) + \ldots + N_{E,n}(n)$.
I didn't realize you had a generating function for strings matching $R$. Let us assume this now. Given the generating function $G_R(x)$ that contains number of strings of length $n$ matching $R$, let us figure out some way to obtain the generating function encoding the number of strings of length $n$ containing a substring that matches $R$.
So first, we do the same trick above and pad the sides with arbitrary strings. So the number of strings of length $n$ containing at least 1 substring matching $R$ is the sum $\sum_{a+b \leq n} M^a N_R(b) = \sum_{b \leq n} \frac{M^{n-b+1}-1}{M-1} N_R(b)$.
In general, the number of strings of length $n$ containing at least k substrings matching $R$ is the sum $\sum_{b_1+\ldots+b_k} \frac{M^{n-(b_1+\ldots+b_k)+1}-1}{M-1} N_R(b_1)\cdots N_R(b_k)$.
As a generating function, let $H_k(x)$ correspond to containing at least $k$ substrings.
$H_k(x) = \sum_{j} N_{R,k}(j) x^j = \sum_{j} \sum_{a+b_1+\ldots+b_k = j} C(j-a) x^{j-a} N_R(b_1) x^{b_1} \cdots N_R(b_k) x^{b_k}$
This is a $k$-fold convolution between the generating function of all $M$-strings, and $k-1$ copies of $G_R$.
Now we encode inclusion exclusion:
$H(x) = \sum_{j} \sum_{i=1}^j (-1)^{i-1} N_{R,\ i}(j) x^j = \sum_i (-1)^{i-1} \sum_{j \geq i} N_{R,\ i}(j) x^j $, and this is just the alternating sum of the corresponding partial generating functions. So... I'm going to stop here for the moment.
As Farnight and Andre noted, the number of sequences which don't contain consecutive zeroes will turn out to be Fibonacci numbers. Here is an alternate approach which yields a nice combinatorial identity for Fibonacci numbers.
Suppose we are looking at length $n$ binary sequences of $0$'s and $1$'s. Instead of thinking of it as $0$'s and $1$'s however, imagine it as $[\color{blue}{01}]$'s and $\color{red}{1}$'s with a possible $0$ at the far right. In other words, imagine we glued a 1 to the right of every 0 in order to forcibly avoid having two 0's in a row.
Either the furthest right entry in the sequence will be a 1 or a 0.
- First case: Furthest right entry is a 1 (either a part of a $\color{blue}{01}$ or a $\color{red}{1}$), E.g. $\underline{\color{blue}{01}}~\underline{\color{red}{1}}~\underline{\color{red}{1}}~\underline{\color{blue}{01}}~\underline{\color{blue}{01}}$
There can be anywhere from zero to $\lfloor\frac{n}{2}\rfloor$ number of $[\color{blue}{01}]$'s. Let $k$ be the number of $[\color{blue}{01}]$'s. Then there are $n-2k$ copies of $\color{red}{1}$'s to place. Thus, for a specific $k$, we have a total of $k+n-2k=n-k$ spaces in which we either place a $\color{blue}{01}$ or a $\color{red}{1}$ for a total of $\binom{n-k}{k}$. Ranging over all possible $k$, we find that there are $\sum\limits_{k=0}^{\lfloor\frac{n}{2}\rfloor}\binom{n-k}{k}$ possible sequences in the first case.
- Second case: Furthest right entry is a $0$. E.g. $\underline{\color{red}{1}}~\underline{\color{blue}{01}}~\underline{\color{red}{1}}~\underline{\color{red}{1}}~\underline{\color{red}{1}}~\underline{\color{red}{1}}~\underline{0}$
There can be anywhere from zero to $\lfloor\frac{n-1}{2}\rfloor$ number of $[\color{blue}{01}]$'s. Let $k$ again be the total number of $[\color{blue}{01}]$'s. Then there are $n-1-2k$ copies of $\color{red}{1}$'s to place. Thus for a specific $k$, we have a total of $k+n-1-2k=n-1-k$ spaces in which we either place a $\color{blue}{01}$ or a $\color{red}{1}$ for a total of $\binom{n-1-k}{k}$ possibilities. Ranging over all $k$, we find there are $\sum\limits_{k=0}^{\lfloor\frac{n-1}{2}\rfloor}\binom{n-1-k}{k}$ sequences in the second case.
Thus combining the cases, we have a total of $\sum\limits_{k=0}^{\lfloor\frac{n}{2}\rfloor}\binom{n-k}{k}+\sum\limits_{k=0}^{\lfloor\frac{n-1}{2}\rfloor}\binom{n-1-k}{k}$.
Noting that this answers the same question in a different form than the other answers, it gives us the identity for $n\geq 0$:
$F_{n+2} = \sum\limits_{k=0}^{\lfloor\frac{n}{2}\rfloor}\binom{n-k}{k}+\sum\limits_{k=0}^{\lfloor\frac{n-1}{2}\rfloor}\binom{n-1-k}{k}$
For your specific example of length 8 binary sequences, my form gives:
$$\sum\limits_{k=0}^{\lfloor\frac{8}{2}\rfloor}\binom{8-k}{k}+\sum\limits_{k=0}^{\lfloor\frac{8-1}{2}\rfloor}\binom{8-1-k}{k}\\
=\sum\limits_{k=0}^{4}\binom{8-k}{k}+\sum\limits_{k=0}^{3}\binom{7-k}{k}\\
=\binom{8}{0}+\binom{7}{1}+\binom{6}{2}+\binom{5}{3}+\binom{4}{4}+\binom{7}{0}+\binom{6}{1}+\binom{5}{2}+\binom{4}{3}\\
=1+7+15+10+1+1+6+10+4= 55$$
and $55$ is the $10^{th}$ Fibonacci number.
(note also, I count the Fibonacci sequence with $F_0=0, F_1=1, F_2=1,F_3=2,\dots$ whereas it seems Farnight above me counts $F_0=1,F_1=1,F_2=2,\dots$.)
A much cleaner approach which yields the same answer:
In a binary sequence of length $n$ without, there will be between $0$ and $\lfloor\frac{n}{2}\rfloor$ zeroes. Supposing that there are $k$ zeroes, then there are $n-k$ ones. Set up $n-k$ ones on a line. There are then $n-k+1$ spaces between or to the far left or right of the ones in which a zero may be placed. Pick $k$ of them to be occupied with a zero.
Ranging over all values of $k$, you get $\sum\limits_{k=0}^{\lfloor \frac{n}{2}\rfloor} \binom{n-k+1}{2}$, which again equals the same as the other answers. For $n=8$ you have $\sum\limits_{k=0}^4\binom{9-k}{k}=\binom{9}{0}+\binom{8}{1}+\binom{7}{2}+\binom{6}{3}+\binom{5}{4} = 1+8+21+20+5 = 55$
Of course, to complete the problem, note the numbers and formulae we came up with were counting how many sequences did not have consecutive zeroes. To count how many sequences do have consecutive zeroes, apply inclusion-exclusion. There are $2^n$ different length-$n$ binary sequences, so there are $2^n-F_{n+2}$ sequences which do have consecutive zeroes (or depending on how you define the starting values for the Fibonacci sequence, $2^n-F_{n+1}$). In our current example, there are then $2^8-55=256-55=201$.
To find the probability, take that number and divide by the size of the sample space, in our case $2^n$. Hence the probability for our example is $\frac{201}{256}$.
Best Answer
$\textbf{Hint:}$The first question can be answered by simply counting. For the second question note that every bitstring without $00$ ends in either $1$ or in $10$. How many possibilities are there in either case? The third question can be answered by combining the results of the first two questions.