[Math] Number of ways to select n letters from a word

combinatorics

I came across the following question:

"In how many ways can we select 4 letters from the word MISSISSIPPI?"

This question can be solved by considering the different possibilities as follows and adding the numbers:

Ways of selecting {(4 alike)+(3 alike and 1 different)+(2 alike and 2 different)+(4 different)}

But I directly went and did 11C4 and got a marginally BIG number. What have I done wrong? Why should I approach all word-letter question like this?

Edit: I'm truly very sorry that I didn't look at other questions regarding the same topic first but I am glad that I posted again because I have received answers other than the ones related to the generating function, which is a bit too advanced for students at my level; the stars and bars method, as suggested by Shagnik can be understood much better.

Please help! Thanks so much in advance 🙂 Regards.

Best Answer

We will determine the $4$ letters chosen by asking four questions:

  1. How many I's are there?
  2. How many M's are there?
  3. How many P's are there?
  4. How many S's are there?

We will represent the answers to these questions by $x_I, x_M, x_P$ and $x_S$ respectively*. What conditions do we have on these variables?

Since we have $4$ letters in total, we must have $x_I + x_M + x_P + x_S = 4$. Also, since each variable is counting something, it must be a non-negative integer. That is, $x_I, x_M, x_P, x_S \in \mathbb{Z}_{\ge 0}$.**


Stars and bars

Now there is a general method, often called the "stars and bars" argument, for counting the number of ways of writing a number as a sum of smaller numbers. In full generality, suppose we want to write some non-negative integer $n$ as a sum of $k$ non-negative integers; that is $$ n = y_1 + y_2 + ... + y_k, $$ where $y_1, y_2, ..., y_k \in \mathbb{Z}_{\ge 0}$.

We will draw every such sum using stars and bars; there will be $n$ stars, and $k-1$ bars splitting the separate summands. $y_1$ will be the number of stars before the first bar, $y_2$ will be the number of stars between the first and second bar, and so on, until we get $y_k$ to be the number of stars after the last ($k-1$st) bar. For example, the sum $8 = 3 + 0 + 4 + 1$ would look like $$ \underbrace{* * *}_{y_1 = 3} | \underbrace{}_{y_2 = 0} | \underbrace{* * * *}_{y_3 = 4} | \underbrace{*}_{y_4 = 1} \; .$$

One important thing to note is that we are counting ordered sums, so the order of the summands is important. For example, the sum $8 = 4 + 3 + 1 + 0$ has the different diagram $$ \underbrace{* * * *}_{y_1 = 4} | \underbrace{* * *}_{y_2 = 3} | \underbrace{*}_{y_3 = 1} | \underbrace{}_{y_4 = 0} \; ,$$ which will be counted as a distinct solution.

How does this help us count the number of sums? Well, note that every such diagram is simply a permutation of $n$ stars and $k-1$ bars. Moreover, every such permutation corresponds to a different sum. Hence we only need to count these permutations of symbols.

Now there are a total of $n+k-1$ symbols in a line. If we choose $k-1$ of thoes symbols to be bars, that determines the diagram, since the remaining symbols must be stars. Hence the number of sums is $$\binom{n+k-1}{k-1}.$$ Note that this is equal to $\binom{n+k-1}{n}$, since we could instead choose which $n$ symbols are stars.


"Going down to Mississippi"

Returning to the problem at hand, this means the number of solutions to $x_I + x_M + x_P + x_S = 4$ is $\binom{7}{3} = 35$.

However, while every set of $4$ letters corresponds to one such sum, not every sum comes from a valid choice of letters. Our final restriction comes from the law of conservation of mass, adapted for mathematical wordplay. We cannot select more copies of a letter than appear in the word "MISSISSIPPI".

This means that, along with our non-negativity assumption, we must also have $x_I \le 4, x_M \le 1, x_P \le 2$ and $x_S \le 4$.

Now note that since we are only selecting $4$ letters in total, we can never have $x_I \ge 5$ or $x_S \ge 5$, hence we can ignore those requirements. This leaves only the conditions $x_M \le 1$ and $x_P \le 2$.

Another very useful observation is that we cannot violate both of these conditions at the same time. Indeed, if we had both $x_M \ge 2$ and $x_P \ge 3$, then $x_I + x_M + x_P + x_S \ge 5$. Hence from the $35$ solutions to the sum we counted earlier, we can subtract the solutions with $x_M \ge 2$ and the solutions with $x_P \ge 3$. Since no solution satisfies both those bounds, every bad solution is subtracted exactly once, as it should be.

So how many solutions have $x_M \ge 2$? Well, a nice way to count these is to introduce a new variable, setting $z_M = x_M - 2$. We then have $x_I + z_M + x_P + x_S = x_I + x_M + x_P + x_S - 2 = 4 - 2 = 2$, with $x_I, z_M, x_P, x_S \in \mathbb{Z}_{\ge 0}$. Since we now only require all our variables to be non-negative, rather than having one being at least $2$, we have reduced this to the stars and bars setting. Using the formula, there are $\binom{2 + 3}{3} = \binom{5}{3}= 10$ such solutions.

What about $x_P \ge 3$? This time we set $z_P = x_P - 3$. A similar argument shows there are $\binom{1+3}{3} = \binom{4}{3} = 4$ solutions in this case.

Hence the final answer, that is, the number of selections of $4$ letters from "MISSISSIPPI", is $$ \binom{7}{3} - \binom{5}{3} - \binom{4}{3} = 35 - 10 - 4 = 21. $$


Final remarks

In closing, let me remark that this is not that much shorter than the case analysis you suggested in your question (indeed, I would suspect that is how you were meant to solve the problem). However, it is always good to know different solutions, and with a longer word this may have proven to be a shortcut.

That being said, with a longer word, several further complications could have arisen. Here we got lucky in that there were only two kinds of bad solutions, and no solution was bad in both ways. In general, this correction would have involved the Inclusion-Exclusion Principle, which would provide an added level of difficulty.

The nice way to solve these problems in general is to use ordinary generating functions (or exponential generating functions if dealing with permutations), as suggested in @Beta's answer. These are a little more advanced, but very powerful and incredibly interesting, and you should hopefully encounter them in some combinatorics course at a later point.


Footnotes

*Why order the variables this way? In honour of Tyrion, of course!

**Some would call this set the naturals, denoted $\mathbb{N}$, but I prefer to have my naturals start from $1$.