General formula for counting ordered sequences with duplicates

combinatoricspermutations

Suppose I have a list of distinct allowed elements $\{x_1, x_2, …, x_n\}$. Each element $x_j$ can be repeated $i_j$ times.

How many ordered sequences of length $m<\sum_{j=1}^n i_j$ are possible?

For example, suppose the alphabet is ${A,B,C}$; A has 3 copies, B has 1, and C has 3. Then, choosing from $${A,A,A,B,C,C,C},$$ how many ordered sequences of length $m =3$ are possible?

(Hence, e.g., $A,A,B$ is distinct from $A,B,A$)

If $m=\sum_{j=1}^n i_j$, this is straightforward: $$|\text{sequences}| = \frac{(\sum_{j=1}^n i_j)!}{\prod_{j=1}^n i_j!}$$

But what if $m<\sum_{j=1}^n i_j$?

Best Answer

You can use a recursive definition.

$$ f(m,i_1,i_2,\dots,i_n)=\begin{cases} 1 &\text{if }m = 0 \\ \displaystyle\sum_{j=1}^n\left(\begin{cases} 0 &\text{if }i_j=0 \\ f(m-1,i_1,i_2,\dots,i_j-1,\dots,i_n) &\text{if }i_j>0 \end{cases}\right) &\text{if }m > 0 \end{cases} $$

In words: There is only one sequence of length zero, namely the empty sequence. For a non-empty sequence, you can pick any symbol $j$ as starting point if its occurrence count is non-zero. And then you follow that with a sequence that is one symbol shorter and where the occurrence count of the symbol you picked is also reduced by one.

No, this is not beautiful. Things tend to be simpler if occurrence counts are at most one (permutation with no duplicates), or sum of occurrences matches overall length (permutation with duplicates but all elements taken) or occurrence counts are infinite (just pick $m$ symbols independently from given alphabet). But this kind of mixed case you have isn't very amenable to simplifications.

The above definition would allow for an evaluation of the formula using some dynamic programming, so you might be able to re-use some previously computed values and thus avoid a computation time proportional to the final count.

Related Question