Fast method to convert Catalan path to lexicographic position.

binomial-coefficientscatalan-numberscombinatorics

I was looking through the questions: Fast way to get a position of combination (without repetitions) and (Fast way to) Get a combination given its position in (reverse-)lexicographic order for converting a combination to its lexicographic position and then going from the position to the combination. A logical extension of this is the generalized Catalan numbers which are basically combinations with an additional constraint.

Consider a gambler tossing a coin and winning 1\$ on heads and losing 1\$ on tails. We know he got $l$ tails and $l+k$ heads. The number of ways he could have done this is: ${k+2l \choose l}$ and if we wanted to generate the actual paths, we can do so using the approaches in the questions linked above (the binary representations of the combinations would be the tosses at which he got tails).

But now, we add the further constraint that he reaches $k\$$ for the first time after $k+2l$ tosses. The post: Probability that random walk will reach state $k$ for the first time on step $n$ shows that the number of ways to do this becomes: ${k+2l-1 \choose l} – {k+2l-1 \choose l-1}$. Now, I want to loop through these paths. So, just as the first two posts above show how to go from combinations to lexicographic position and back, is there an efficient way to do this for these paths as well (that doesn't involve storing all paths in memory). For this question, let's consider we're given a path in binary. Something like: [0,1,0,0,0,1,1] (1's are tosses where he got tails) and want to get an integer which is the lexicographic position if you list all the valid paths (the total number of which is the generalized Catalan number).

Best Answer

Let $N(k,l)=\binom{k+2l-1}{l}-\binom{k+2l-1}{l-1}$ denote the number of paths. Note that $$ N(k,l) = N(k-1,l) + N(k+1,l-1) \qquad (k>0) $$ with the base cases $N(0,0)=1$, $N(0,l)=0$ for $l>0$. This comes from conditioning on whether the first step is a heads or tails. Furthermore, with the encoding Heads = 1, Tails = 0, all of the paths counted by $N(k+1,l-1)$ which begin with tails will occur lexicographically before those counted by $N(k-1,l)$. Therefore, to find the lexicographic position of a path, you can do as follows:

  • Maintain a variable accum, initially set to $0$.

  • Loop through the entries of the path from left to right.

    • Whenever the current entry of the path is heads, add the value $N(k' + 1, l' - 1)$ to accum, where $l'$ is the number of tails including and after the current entry, and $k'+l'$ is the number of heads including and after the current entry.
  • After the loop, accum will equal the desired lexicographic index.

Conversely, to convert from a lexicographic index to a path, you can use the following algorithm:

  • Maintain a variable accum, initially set to the given lexicographic index.

  • Maintain two variable $k'$ and $l'$, initially set to the given $k$ and $l$. As the path is specified, these will represent the values of $k$ and $l$ for the unassigned part of the list.

  • Maintain a list of length $k+2l$, whose entries are initially unassigned (this will be the output path).

  • For each entry of the path, do the following:

    • Check to see if $N(k' + 1, l' -1)$ is less than or equal to accum. If it is, then the current entry of the path should be set to heads. Furthermore, you should subtract $N(k' + 1,l' -1)$ from accum. Finally, subtract one from $k'$.

    • Otherwise, the current entry of the path should be set to tails, and accum is unchanged. Then, add one to $k'$, and subtract one from $l'$.


If your only purpose is to efficiently loop through all these paths in lexicographic order, then converting back and forth between paths and lexicographic indices is not necessary. All you need is a method that, given a path, finds the lexicographically next path. By calling this method repeatedly, you will loop through all paths.

Here is the idea of such a method. Given a path as a sequence of H and T, the next path is given by…

  • Finding an instance of TH (meaning tails immediately followed by heads), and replacing it with HT.

    • If there are multiple instances of TH, choose the rightmost one for which the switch TH -> HT still leaves a legal path. The switch is legal if and only if the accumulated wealth up to but not including TH is at most $k-2$.
  • Rearranging all of the flips to the right of that instance of TH so all tails occur before heads.

For example, suppose that $k=4$ and $l=3$, and the current path is

T H H T H H H T H H

There are three instances of TH. The accumulated wealth before each of them, going left to right, is $0$, $1$, and $3$. The ones where it would leave a legal path if you switch are $0$ and $1$; the last $3$ is too much, since $k-2=2$. Therefore, we switch the latest one. All of the heads after that are moved as far to the right as possible. Here is the next path, where the heads and tails which were switched are marked with ^.

T H H H T T H H H H
      ^ ^