How many distinct n-letter “words” can be formed from a set of k letters where some of the letters are repeated?

combinatorics probability

How many distinct n-letter "words" can be formed from a set of k letters where some of the letters are repeated?

Examples:

How many 6-letter words can be formed from the letters: ABBCCC?

This is elementary. There are 6! arrangements if we treat all six letters as distinct. We then divide by 2!3! because each word is counted 2!3! times, once for every rearrangement of the identical Bs and Cs among themselves. This gives $\frac{6!}{2!3!} = 60$.
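As a quick check of that count, here is a minimal Python sketch of the "factorial divided by the repeats" formula. The function name `arrangements` is an illustrative choice, not something from a library.

```python
from collections import Counter
from math import factorial

def arrangements(letters):
    """Number of distinct arrangements that use every letter exactly once."""
    total = factorial(len(letters))
    # Divide out the rearrangements of each group of identical letters.
    for count in Counter(letters).values():
        total //= factorial(count)
    return total

print(arrangements("ABBCCC"))  # 60
```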

How many 5-letter words can be formed from the letters: ABBCCC?

Excluding A we have $\frac{5!}{2!3!} = 10$.

Excluding one B: $\frac{5!}{3!} = 20$.

Excluding one C: $\frac{5!}{2!2!} = 30$.

Then take the sum: $10 + 20 + 30 = 60$.
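The same casework can be sketched in Python and compared against a brute-force enumeration. Both helper names (`words_dropping_one`, `brute_force`) are hypothetical, and the brute force is only practical for short letter sets.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def arrangements(counts):
    """Distinct arrangements of a multiset given as letter -> count."""
    total = factorial(sum(counts.values()))
    for c in counts.values():
        total //= factorial(c)
    return total

def words_dropping_one(letters):
    """Sum the cases: drop one copy of each distinct letter in turn."""
    counts = Counter(letters)
    total = 0
    for letter in counts:
        remaining = counts.copy()
        remaining[letter] -= 1
        total += arrangements(remaining)
    return total

def brute_force(letters, n):
    """Enumerate every n-letter arrangement and keep the distinct ones."""
    return len(set(permutations(letters, n)))

print(words_dropping_one("ABBCCC"), brute_force("ABBCCC", 5))  # 60 60
```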

How many 4-letter words can be formed from the letters: ABBCCC?

At this point I find it difficult to proceed without far too many cases.

Is there a general approach?

Best Answer

There is a fairly systematic way to work this type of question. You were specifically wondering about:

How many 4-letter words can be formed from the letters: ABBCCC?

First, you write the 'source partition' for your word:

[C,B,A]
[3,2,1]  <-- this is the `source partition`

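Note that the source partition provides for 1 triple, 1 double, and 1 single. Corresponding to that, you write every 'target partition' of 4 letters that the source can supply ([4] and [1,1,1,1] are ruled out, since no letter occurs 4 times and there are only 3 distinct letters):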

[3,1]    requests 1 triple and 1 single
[2,2]    requests 2 doubles
[2,1,1]  requests 1 double and 2 singles
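Listing the target partitions by hand gets error-prone for longer words, so here is a minimal sketch that generates them. It assumes a target is feasible exactly when, with both lists sorted in descending order, each target part is no larger than the source part in the same position. The names `partitions` and `feasible_targets` are my own.

```python
def partitions(n, max_part=None):
    """Yield the partitions of n as non-increasing lists."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield []
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest

def feasible_targets(source, n):
    """Partitions of n that the source partition is able to supply."""
    source = sorted(source, reverse=True)
    return [
        t for t in partitions(n)
        if len(t) <= len(source) and all(p <= s for p, s in zip(t, source))
    ]

print(feasible_targets([3, 2, 1], 4))  # [[3, 1], [2, 2], [2, 1, 1]]
```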

For each 'target partition', you write an expression that gives the number of ways that the given target type can occur. An example of the target type [3,1] is:

CCBC                                   # type [3,1]

The expression for that type is:

nCr(1,1)*nCr(2,1) * fac(4)/fac(3)

'nCr(n, r)' is the 'combinations' function: the number of ways to choose r items from n distinct items, ignoring order. 'fac(n)' is the factorial of n.

Note that source [3,2,1] provides 1 triple, and target [3,1] requests 1 triple. Hence nCr(1,1). After the triple is used up, 2 letters remain in the source that can each provide a single, and the target [3,1] requests 1 single. Hence nCr(2,1). The call to fac(4), in each expression, always corresponds to the 4-letter word. Each division corresponds to a repeated letter in the target partition: a part of size m contributes a division by fac(m), so [3,1] divides by fac(3) and [2,2] divides by fac(2)*fac(2).

That's all there is to the method, but it isn't always easy to get every last detail correct. The entire computation, as I did it in the programming language Python, follows:

from math import comb as nCr, factorial as fac    # math.comb needs Python 3.8+

# The `source partition` for the word 'ABBCCC' is:
#                 [3,2,1]
#                                                 # target
w = 0                                             # ------
w += nCr(1,1)*nCr(2,1) * fac(4)//fac(3)           # [3,1]
w += nCr(2,2)          * fac(4)//(fac(2)*fac(2))  # [2,2]
w += nCr(2,1)*nCr(2,2) * fac(4)//fac(2)           # [2,1,1]
print(w)
#
# The answer is 38
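The per-target expressions can also be produced by one function instead of being written out by hand. The sketch below is one possible way to do that, not a definitive implementation: `ways_for_target` and `brute_force` are hypothetical names, and the brute-force check is there only to confirm the total for this small example.

```python
from collections import Counter
from itertools import permutations
from math import comb, factorial

def ways_for_target(source, target):
    """Count the words whose letter multiplicities match `target`.

    source: available count of each distinct letter, e.g. [3, 2, 1]
    target: requested multiplicities, e.g. [2, 1, 1]
    """
    choices = 1
    used = 0  # letters already assigned to larger parts
    # Handle the largest parts first, as in the hand computation.
    for size, mult in sorted(Counter(target).items(), reverse=True):
        eligible = sum(1 for s in source if s >= size) - used
        if mult > eligible:
            return 0  # the source cannot supply this target
        choices *= comb(eligible, mult)
        used += mult
    # Arrange the chosen letters: n! divided by the factorial of each part.
    arrangements = factorial(sum(target))
    for part in target:
        arrangements //= factorial(part)
    return choices * arrangements

def brute_force(letters, n):
    """Enumerate every n-letter arrangement and keep the distinct ones."""
    return len(set(permutations(letters, n)))

source = [3, 2, 1]
targets = [[3, 1], [2, 2], [2, 1, 1]]
total = sum(ways_for_target(source, t) for t in targets)
print(total, brute_force("ABBCCC", 4))  # 38 38
```

Handling the part sizes in descending order matters here: every letter that can supply a larger part can also supply a smaller one, so subtracting `used` from the eligible count is enough to avoid reusing a letter.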