The number of subsequences that are not of a particular type

combinationscombinatorics

Consider that we're given a string of characters. We have to find the total number of subsequences of this string that do not follow the following properties:

  • Has $k$ consonants before a vowel and $k$ consonants after it. So, it has a length of $2k+1$.
  • $k \geq 1$

A subsequence here has to follow order, but may not necessarily contain all the characters of the string. For example:
$abcd$ has $abc$,$abd$,$acd$,$ab$,$ac$,$ad$,.. et cetera as it's subsequences.

So, a string of length $n$ can have a total of $2^n-1$ subsequences. Now, to find all the subsequences, we can find the number of consonants to the left of a vowel (say $a$) and to the right of it (say $b$). So, when:
$k=1$, $^aC_1 \times ^bC_1$, to select the single consonant
$k=2$, $^aC_2 \times ^bC_2$, to select the two consonants
$k=3$, $^aC_3 \times ^bC_3$, to select the three consonants
.
.
.
$k=minimum(b,a)$, $^aC_k \times ^bC_k$, to select the $k$ consonants, from both the sides.

We can sum these to get the answer for this vowel, and then add such answers for all the vowels, and subtract that from $2^n-1$, for the main answer.

My question is, does the summation of this sequence for $k=1,2,…$ has a different form that can be calculated in $O(1)$ time?

Best Answer

The sum of $^aC_k \times ^bC_k$, or in more modern notation of $\binom ak \binom bk$, for $k = 1, \dots, \min\{a,b\}$ is actually just $$ \sum_{k \ge 1} \binom ak \binom bk = \binom{a+b}{a}-1. $$ It's easier to take the sum starting from $k=0$, in which case the answer is $\binom{a+b}{a}$.

To prove this, let $S$ be a set of $a$ consonants chosen out of the $a+b$ consonants total surrounding the given vowel. (There are $\binom{a+b}{a}$ such sets.) Suppose $S$ includes $j$ consonants left of the vowel, and $a-j$ consonants right of the vowel. Then picking the $a-j$ consonants left of the vowel which aren't in $S$, and the $a-j$ consonants right of the vowel which are in $S$, gives us a subsequence with $k = a-j$ consonants before the vowel and $k$ after it.

Conversely, any subsequence of the kind you're counting can be obtained in this way from some $S$.

You're going to have some trouble with repeated characters, though. In the string xxxyzzzz (assuming y is a vowel), there are $\binom73-1$ ways to choose a bad subsequence around y, by this formula. However, only $3$ of them are unique: xyz, xxyzz, and xxxyzzz. For that matter, there are no longer $2^n-1$ subsequences in this case, either.