[Math] How do the Catalan numbers turn up here

algorithmscatalan-numberscombinatorics

The Catalan numbers have a reputation for turning up everywhere, but the occurrence described below, in the analysis of an (incorrect) algorithm, is still mysterious to me, and I'm curious to find an explanation.


For situations where a quadratic-time sorting algorithm is fast enough, I usually use the following:

//Given array a[1], ... a[n]
for i = 1 to n:
    for j = i+1 to n:
        if a[i] > a[j]:
            swap(a[i],a[j])

It looks like bubble sort, but is closer to selection sort. It is easy to see why it works: in each iteration of the outer loop, a[i] is set to be the smallest element of a[iā€¦n].

In a programming contest many years ago, one of the problems essentially boiled down to sorting:

Given a list of distinct values $W_1, W_2, \dots, W_n$, find the indices when it is sorted in ascending order. In other words, find the permutation $(S_1, S_2, \dots, S_n)$ for which $W_{S_1} < W_{S_2} < \dots < W_{S_n}$.

This is simply a matter of operating on the indices rather than on the array directly, so the correct code would be:

//Given arrays S[1], ..., S[n] (initially S[i]=i āˆ€i) and W[1], ..., W[n]
for i = 1 to n:
    for j = i+1 to n:
        if W[S[i]] > W[S[j]]:
            swap(S[i],S[j])

But in the heat of the contest, I instead coded a program that did, incorrectly:

for i = 1 to n:
    for j = i+1 to n:
        if W[i] > W[j]:
            swap(S[i],S[j])

I realised the mistake after the contest ended, and later while awaiting the results, with desperate optimism I tried to figure out the odds that for some inputs, my program would accidentally give the right answer anyway. Specifically, I counted the number of permutations of an arbitrary list $W_1, \dots, W_n$ with distinct values (since only their order matters, not their actual values) for which the incorrect algorithm above gives the correct answer, for each n:

n       Number of "lucky" permutations
0       1
1       1
2       2
3       5
4       14
5       42
6       132
7       429
8       1430
9       4862
10      16796
11      58786
12      208012

These are the Catalan numbers! But why? I've tried to prove this occasionally in my free time, but never succeeded.


What I've tried:

  • The (pseudo)algorithm can be represented in more formal notation as the product of all inversions in a permutation. That is, we want to prove that the number of permutations $\sigma \in S_n$ such that $$\prod_{i=1}^{n}\prod_{\substack{j \in \left\{i+1,i+2,\ldots,n\right\}; \\ \sigma_i > \sigma_j}}(i,j) = \sigma^{-1}$$ (with the convention that multiplication is done left to right) is $C_n$. This change of notation does not make the problem any simpler.

  • I briefly skimmed through Stanley's famous list of Catalan problems, but this does not seem to be (directly) in the list. šŸ™‚

  • Some computer experimentation suggests that the lucky permutations are those that avoid the pattern 312, the number of which is apparently the Catalan numbers. But I have no idea how to prove this, and it may not be the best approach…

Best Answer

Your suspicions are correct. Let's show that a permutation is lucky iff it avoids the pattern 312.

For an injection $W$ from $\{1,\ldots,k\}$ to $\{n-k+1,\ldots,n\}$, let $N(W)$ denote the result of removing $W(1)$ and increasing all elements below $W(1)$ by $1$. For example, $N(32514) = N(3524)$.

Lemma 1. If $W$ avoids $312$ then so does $N(W)$.

Proof. Clear since the relative order of elements in $N(W)$ is the same as the corresponding elements in $W$.

Lemma 2. Suppose $W$ avoids $312$. After running one round of the algorithm, $S(1)$ contains the index of the minimal element in $W$, and $W \circ S = N(W)$.

Proof. The lemma is clear if $W(1)$ is the minimal element. Otherwise, since $W$ avoids $312$, all elements below $W(1)$ form a decreasing sequence $W(1) = W(i_1) > \cdots > W(i_k)$. The algorithm puts the minimal one $W(i_k)$ at $S(1)$, and puts $W(i_t)$ at $W(i_{t+1})$.

Theorem 1. If $W$ avoids $312$ then $W$ is lucky.

Proof. Apply Lemma 2 repeatedly. Lemma 1 ensures that the injection always avoids $312$.

For the other direction, we need to be slightly more careful.

Lemma 3. If $W$ contains a pattern $312$ in which $3$ doesn't correspond to $W(1)$ then $N(W)$ contains a pattern $312$.

Proof. The pattern survives in $N(W)$ since all relative positions are maintained.

Lemma 4. If $W$ doesn't contain a pattern $312$ in which $3$ corresponds to $W(1)$ and $1$ corresponds to the minimum of $W$ then after running one round of the algorithm, $S(1)$ contains the index of the minimal element, and $W \circ S = N(W)$.

Proof. Follows directly from the proof of Lemma 2.

Thus we should expect trouble if there are $i<j$ such that $W(1) > W(j) > W(i)$. However, if $W(i)$ is not the minimal element, the trouble won't be immediate.

List the elements which are smaller than $W(1)$ as $W(t_1),\ldots,W(t_k)$, and suppose that $W(t_r) < W(t_{r+1}) > \cdots > W(t_k)$. One round of the algorithm puts $t_r$ at the place of $t_{r+1}$. The following rounds maintain the relative order of the elements in positions $t_{r+1},\ldots,t_k$, and so in the final result, the position which should have contained $t_{r+1}$ will contain $t_r$.

Example: $W = 632541$. The final result is $S = 652134$, which corresponds to the permutation $143625$. We can see that $S(1)$ is correct since $W$ satisfies the conditions of Lemma 4. We have $t_r = 3$ and $W(t_r) = 2, W(t_{r+1}) = 5$. We see that indeed $W(S(5)) = 2$ instead of $5$.

Theorem 2. If $W$ contains $312$ then $W$ is unlucky.

Proof. Along the lines of the discussion above.

Related Question