For the particular example $M=2$ and $R=(10 | 1)^\ast 1^\ast$, I suppose if we first convert it to a CFG then there appears to be a general formula that involves convolutions. For example:
- $A \rightarrow \epsilon | 0A | 1A$
- $C \rightarrow \epsilon | 10C | 0C$
- $D \rightarrow \epsilon | 1D$
- $E \rightarrow ACDA$.
Then to get a string of length $n$, concatenate a string from $A$ of length $k_1$, a string from $C$ of length $k_2$, a string from $D$ of length $k_3$ and a string of length $A$ of length $k_4$, then you get
$N_E(n) = \sum_{k_1+k_2+k_3+k_4 = n} N_A(k_1)N_C(k_2)N_D(k_3) N_A(k_4)$
where $0 \leq k_i \leq n$. Ok, so there was no need to use a CFG, we could arrive here directly by identifying the parts!
Note $N_A(k_1) = 2^{k_1}$, $N_D(k_3) = 1$, and $N_E(k_4) = 2^{k_4}$.
$N_C(k_2) = N_C(k_2-1) + N_C(k_2-2)$, Fibonacci recurrence with $N_C(1) = 1$ and $N_C(2) = 2$.
$N_{E,1}(n) = \sum_{k_1+k_2+k_3+k_4=n} 2^{k_1+k_4} F(k_3) = \sum_{a+b \leq n} 2^{a} F(b) = \sum_{b\leq n} (2^{n-b+1}-1) F(b)$.
Looks like this can be attacked with generating functions.
This approach seems to generalize to other regular expressions. Oh and you said you know how to do it for exact matching. So for substring, just add an $A$ before and an $A$ after :) ?
Oh drats... but then you might be double-counting. You'd have to use inclusion-exclusion to subtract the number of strings where the pattern appears at least twice, add back where it appears at least three times, etc. But maybe it is okay.
For $ACDACDA$ (string appears at least twice):
$N_{E,2} = \sum_{k_1+k_2+k_3+k_4+k_5+k_6+k_7=n} = N_A(k_1)N_C(k_2)N_A(k_4)N_C(k_5)N_A(k_7) = \sum_{a+b+c<n}(2^a-1) F(b)F(c)$.
Look for a general pattern?
$N_E(n) = N_{E,1}(n) - N_{E,2}(n) + N_{E,3}(n) + \ldots + N_{E,n}(n)$.
I didn't realize you had a generating function for strings matching $R$. Let us assume this now. Given the generating function $G_R(x)$ that contains number of strings of length $n$ matching $R$, let us figure out some way to obtain the generating function encoding the number of strings of length $n$ containing a substring that matches $R$.
So first, we do the same trick above and pad the sides with arbitrary strings. So the number of strings of length $n$ containing at least 1 substring matching $R$ is the sum $\sum_{a+b \leq n} M^a N_R(b) = \sum_{b \leq n} \frac{M^{n-b+1}-1}{M-1} N_R(b)$.
In general, the number of strings of length $n$ containing at least k substrings matching $R$ is the sum $\sum_{b_1+\ldots+b_k} \frac{M^{n-(b_1+\ldots+b_k)+1}-1}{M-1} N_R(b_1)\cdots N_R(b_k)$.
As a generating function, let $H_k(x)$ correspond to containing at least $k$ substrings.
$H_k(x) = \sum_{j} N_{R,k}(j) x^j = \sum_{j} \sum_{a+b_1+\ldots+b_k = j} C(j-a) x^{j-a} N_R(b_1) x^{b_1} \cdots N_R(b_k) x^{b_k}$
This is a $k$-fold convolution between the generating function of all $M$-strings, and $k-1$ copies of $G_R$.
Now we encode inclusion exclusion:
$H(x) = \sum_{j} \sum_{i=1}^j (-1)^{i-1} N_{R,\ i}(j) x^j = \sum_i (-1)^{i-1} \sum_{j \geq i} N_{R,\ i}(j) x^j $, and this is just the alternating sum of the corresponding partial generating functions. So... I'm going to stop here for the moment.
Best Answer
We need $7$ non-terminals. $S$ is the initial state, $T,\ U,$ and $V$ all indicate that an odd number of characters have been read, and that the last character read was $a$, $b$, or $c$, respectively. Similarly, $X,\ Y$, and $Z$ indicate that an even number of characters have been read, and that the last character read was $a$, $b$, or $c$, respectively. I use $\lambda$ for the empty string. $S,\ X,\ Y$, and $Z$ are accepting states, so we have the productions $$S\to\lambda\\ X\to\lambda\\ Y\to\lambda\\ Z\to\lambda\\ $$ Also we, have $$S\to aT\\S\to bU\\S\to cV\\T\to aX\\T\to cZ\\U\to bY\\U\to cZ\\V\to aX\\V\to bY\\V\to cZ\\$$
I trust you see how I arrived at these. If we get a $b$ in state $T$ or an $a$ in state $U$ then we've seen a forbidden substring, so there are no productions corresponding to these possibilities.
It remains to determine the productions with $X,\ Y$ or $z$ on the left-hand side. I leave it to you to supply those.
EDIT Reading this over, I see that I've slipped into the language of automata at times. I hope it's obvious what I mean. If not, ping me.
EDIT
When I wrote this, I though you were mainly concerned with whether or not the language is regular, but it's apparent from your comments that you're mostly concerned with getting a regex. Remember that the complement of a regular language is regular. So we want the complement of the union of
So, if you can make regexes for these three languages, you can combine them to construct the regex you seek.
EDIT
The previous edit is wrong. The discussion prior to theorem $4.5$ in Hopcroft, Motwani and Ullman says that you can't do it that way. You have to convert the regex to a DFA, construct a DFA that recognizes the complement, and then convert the new DFA to a regex. I think any regex for this language will be exceedingly long and complicated.