Number of pairs of distinct strings with length $L$ and $K$ distinct characters

combinatorics

Two strings $S$ and $T$ are similar if we can rename the same letters in $S$ and thus get $T$. For example, $abacaba$ and $xyxzxyx$ are similar, whereas $abacaba$ and $pqpqpqp$ are not.

Given $L$ and $K$, count the number of pairs of distinct strings each of which has length $L$ and consists of $K$ distinct letters such that two strings in each pair are similar. We can assume that the alphabet consists of a-zA-z i.e 52 possible characters.

Actually, this is a problem from a programming contest (it's over now so we can safely discuss the problems). Nevertheless, I believe this is just combinatorics. I am a bit stuck. There is an example of what the algorithm should output: $(5,2) \to 15$. I can't quite understand how that could be such a small number since there are at least $\binom{52}{k}$ ways to choose those $k$ distinct numbers.

Is there anything I am missing in the problem statement?

Best Answer

I think there's a misunderstanding in the problem statement. They're only considering an alphabet of size $K$, not 52. The factor arising from choosing the substitution from 52 allowed characters will, as you noted, be enormous. But they're not considering it.

I'll start by answering the question as stated:


Let $N(L, K)$ be the number of distinct strings of length $L$ made of given $K$ characters (or $K$ generic placeholder characters), with each character used at least once, for $L\geq K$.

$$N(L, K) = K^L - {K\choose 1}(K-1)^L + {K\choose 2}(K-2)^L\dots = \sum\limits_{i=0}^{K-1} (-1)^i {K \choose i}(K-i)^L$$ This uses the inclusion-exclusion principle. Check this answer: https://math.stackexchange.com/a/664790/52735


From $N(L, K)$, we can remove the similar strings by dividing by $K!$ (number of permutations of $K$ letters).

Each remaining string of length $L$ with $K$ generic characters can be turned into a real string by choosing an appropriate substitution from the 52 possible letters. Two such assignments will give similar strings.

Putting these together:

$$\mathrm{ExpectedAns}(L, K) = \frac{N(L, K)}{K!} \times \frac{^{52}\mathrm{P}_{K} \times (^{52}\mathrm{P}_K-1)}{2}$$ with $N(L, K)$ given above (for $L\geq K$). This clearly doesn't match $\mathrm{Ans}(5, 2) = 15$.


My guess is they're only allowing for $K$ given characters to be substituted. So maybe this is what they're looking for

$$\mathrm{Ans}(N, K) = \frac{N(L, K)}{K!} \times \frac{^{K}\mathrm{P}_{K} \times (^{K}\mathrm{P}_K-1)}{2}$$ $$\mathrm{Ans}(N, K) = \frac{N(L, K)}{K!} \times \frac{K! \times (K!-1)}{2}$$ For $(5, 2)$ this gives $15$. The factor $N(L, K)/K!$ is referred to as Stirling numbers of the second kind: https://en.wikipedia.org/wiki/Stirling_numbers_of_the_second_kind