Are these permutations, combinations, or something in-between? How many are they? How can they be enumerated

combinatorics

Taking this post as starting point:

Number of permutations of AABBBCC, taking 7 letters at a time, when repititions are allowed

My case is slightly different, in the sense that, if my letters are, say, AABBC, the non-redundant permutations are of course $30 = \frac {5!} {2! 2! 1!}$:

AABBC
AABCB
AACBB

CBBAA

however, for my purposes, these 2 permutations are equivalent:

AABBC
BBAAC

and so are these:

CAABB
CBBAA

etc.

i.e. I can swap any groups of letters that appear the same number of times.

This is because I am interested in finding all the possible ways I can form ordered sets of groups of objects, where the identity of the objects themselves however does not matter: it only matters that they are the same.

So, to be clear, if I had AAB, all $3 = \frac {3!} {2! 1!}$ non-redundant permutations would be considered distinct:

AAB
ABA
BAA

Instead, if I had AABB, the $6 = \frac {4!} {2! 2!}$ non-redundant permutations:

AABB
ABAB
ABBA
BAAB
BABA
BBAA

are for my purposes 2 by 2 'equivalent', because e.g. AABB and BBAA both indicate putting 2 identical objects first and second, and the other 2 identical objects third and fourth; ABAB and BABA both indicate putting 2 identical objects first and third, and the other 2 identical objects second and fourth. And so on.
I hope I am making sense.

What I do not know, and would please like to have some advice about, is:

  1. is there a 'formal' mathematical definition for this kind of arrangement of letters or objects, with he specific type of equivalence I described?
  2. how do I calculate how many I should expect in total, given the distinct letters and the number of times they must appear?
  3. how do I generate all of them, without generating first all non-redundant permutations and eliminating a posteriori the equivalent ones?

For point 2, my current guess is that I need to start from the frequencies of each letter, and make a summary of how many letters there are, by the number of times they appear (their frequency).
E.g. if A, B, C appear $2, 2, 1$ times respectively, as in my initial example, I have:

$freq = 2: 2 \ letters$
$freq = 1: 1 \ letter$

Then I divide by the counts of letters (not by the frequencies).

So for AABBC, $30 = \frac {5!} {2! 2! 1!} \cdot \frac 1 {2 \cdot 1} = 15$.

This seems about right. If I use package arrangements in R:

permutations(c("A","B","C"), freq = c(2,2,1))
      [,1] [,2] [,3] [,4] [,5]
 [1,] "A"  "A"  "B"  "B"  "C" 
 [2,] "A"  "A"  "B"  "C"  "B" 
 [3,] "A"  "A"  "C"  "B"  "B" 
 [4,] "A"  "B"  "A"  "B"  "C" 
 [5,] "A"  "B"  "A"  "C"  "B" 
 [6,] "A"  "B"  "B"  "A"  "C" 
 [7,] "A"  "B"  "B"  "C"  "A" 
 [8,] "A"  "B"  "C"  "A"  "B" 
 [9,] "A"  "B"  "C"  "B"  "A" 
[10,] "A"  "C"  "A"  "B"  "B" 
[11,] "A"  "C"  "B"  "A"  "B" 
[12,] "A"  "C"  "B"  "B"  "A" 
[13,] "B"  "A"  "A"  "B"  "C" 
[14,] "B"  "A"  "A"  "C"  "B" 
[15,] "B"  "A"  "B"  "A"  "C" 
[16,] "B"  "A"  "B"  "C"  "A" 
[17,] "B"  "A"  "C"  "A"  "B" 
[18,] "B"  "A"  "C"  "B"  "A" 
[19,] "B"  "B"  "A"  "A"  "C" 
[20,] "B"  "B"  "A"  "C"  "A" 
[21,] "B"  "B"  "C"  "A"  "A" 
[22,] "B"  "C"  "A"  "A"  "B" 
[23,] "B"  "C"  "A"  "B"  "A" 
[24,] "B"  "C"  "B"  "A"  "A" 
[25,] "C"  "A"  "A"  "B"  "B" 
[26,] "C"  "A"  "B"  "A"  "B" 
[27,] "C"  "A"  "B"  "B"  "A" 
[28,] "C"  "B"  "A"  "A"  "B" 
[29,] "C"  "B"  "A"  "B"  "A" 
[30,] "C"  "B"  "B"  "A"  "A" 

1 is equivalent to 19; 2 to 20; etc.

You see already from this example, I could not stop at the first lexicographically enumerated 15 cases, because e.g. 6 is equivalent to 13.

Any ideas?
Am I perhaps missing something fundamental, could I use letters differently, transform…?

Thanks!

Best Answer

Thinking about it, what you're counting is actually set partitions with specified block sizes. You have the additional requirement that the blocks have specific labels (the letters). You can generate them with the general strategy given in this answer from stackoverflow. You'll need to tweak the conditions for when a prefix is viable. The most direct specialization of the algorithm would generate the following list for type $1^12^1$:

AAB,
ABA,
ABB,

corresponding to the partitions $\{\{1,2\},\{3\}\}$, $\{\{1,3\},\{2\}\}$, $\{\{1\},\{2,3\}\}$. This may be acceptable to you depending on the application, but it doens't keep the number of each letter constant. But you can also tweak the algorithm to generate

AAB,
ABA,
BAA,

which matches your question. If you want it like this, then say your string contains for example three equal sizes groups "AABBCC". One of the requirements for a viable increment should be that the first letters of group should appear in increasing order. So e.g. $\mathtt{\underline A\underline BB\underline CAC}$ is allowed, but $\mathtt{\underline AA\underline C\underline BBC}$ is not.


As for counting, you are on the right track. You need to count the number of groups of each size. For example, let's say that "ABCCDDDEEE" has type $1^2 2^1 3^2$, meaning $2$ groups of $1$ letter, $1$ group of $2$ letters, and $2$ groups of $3$ letters. Then a string of length $n$ with type $k_1^{e_1}\ldots k_m^{e_m}$ will have $$ \frac{n!}{(k_1!)^{e_1}\cdots (k_m!)^{e_m}\cdot(e_1!)\cdots (e_m!)} $$ non-equivalent permutations in your sense. The factors $(e_i)!$ are the only thing new, and they take care of the ways the groups can be permuted among each other.

So for example, take "AABB". It's got type $2^2$, so we get $$ \frac{4!}{(2!)^2 2!} = 3. $$ These are represented by "AABB", "ABAB" and "ABBA".


Your numbers can be found at http://oeis.org/A178867.

Inspired by Bulbasaur, I'll add that the exponential multivariate generating function for your numbers is given by $$ \exp\left(\sum_{n=1}^\infty \frac{u_nz^n}{n!}\right) \\ = 1 + \frac{u_1}{1!}z + \frac{u_1^2 + u_2}{2!}z^2 + \frac{u_1^3 + 3u_1u_2+u_3}{3!}z^3 + \frac{u_1^4 + 6u_1^2+4u_1u_3 + 3u_2^2+u_4}{4!}z^4 + \ldots, $$ if you're interested. Here $u_1, u_2, \ldots$ are variables marking how many times we use each letter, and $z$ marks the total number of letters (we don't really need $z$ here). So for the number of non-equivalent permutations of "AABBC", you look up the coefficient of $u_2u_2u_1z^5$, meaning two A's, two B's, one C, five total. Then multiply it by $5!$ to get $15$ as the answer.

In the language of the symbolic method, we constructed the class $$ \mathrm{SET}\left(\sum_{n\ge 1} \mathcal U_n \mathrm{SET}_n(\mathcal Z)\right). $$ $\mathrm{SET}_n(\mathcal Z)$ is a set (or urn) of $n$ single elements, i.e. $n$ of the same letter. $\mathcal U_n$ is there to remember the number $n$. The outer $\mathrm{SET}$ assigns where to place each letter in the string in such a way that we don't care about swapping letters when possible.

Related Question