Generating the $n^{th}$ full binary tree over $N$ labeled leaves

combinatoricsgraph theoryplanar-graphstrees

I am looking for an algorithm to incrementally generate distinct full binary trees over $N$ unique leaves. That is, I want an algorithm that can generate the $n^{th}$ distinct tree without generating all the $n-1$ trees before. Pre-generating all trees is practically impossible above a certain N.

A full binary tree with its $N$ leaves labeled is equivalent to a binary grouping of $N$ unique elements grouped into pairs. There are $C_{N-1}$ different full binary trees or groupings of N leaves, where $C_n$ is the $n^{th}$ Catalan number. For $N = 4$, there are $C_3 = 5$ trees. These are, with internal nodes labeled $5..7$ and with the equivalent grouping:

enter image description here

For $N = 5$:

(1 (2 (3 (4 5))))
(1 (2 ((3 4) 5)))
(1 ((2 3) (4 5)))
(1 ((2 (3 4)) 5))
(1 (((2 3) 4) 5))
((1 2) (3 (4 5)))
((1 2) ((3 4) 5))
((1 (2 3)) (4 5))
((1 (2 (3 4))) 5)
((1 ((2 3) 4)) 5)
(((1 2) 3) (4 5))
(((1 2) (3 4)) 5)
(((1 (2 3)) 4) 5)
((((1 2) 3) 4) 5)

I see three ways to solve this problem (ultimately they are equivalent):

  1. There is a simple algorithm that can directly generate the next distinct (non-isomorphic) tree incrementally.
  2. There is a bijective encoding from tree $T_i$ to sequence $S_i$ such that generating $S_{i+1}$ (decoding into tree $T_{i+1}$) is easily doable.
  3. In the ideal case, there is a simple bijection of the $C_{N-1}$ trees to a continuous interval of $C_{N-1}$ natural numbers (preferentially $(1..C_{N-1})$) so that generating the $i^{th}$ tree is as easy as to decode it from the integer $i$.

There are many algorithms to encode a binary tree to a bijectively unique sequence (e.g. to a Prüfer sequence) but the issue is then how to generate the next sequence that can be decoded to the next tree without many failed sequences that do not encode a valid tree of the above description and do not encode a tree that was already visited.

Best Answer

The number of ways to fully parenthesize a string of $n$ letters, $C_{n-1}$, obeys the following recurrence: $$ C_{n-1} = \sum_{i=1}^{n-1}C_{i-1}C_{n-i-1} $$ To see this, consider the two "shallowest" groups in the parenthesization. Namely, ignoring the leftmost ( and the rightmost ), look at the parenthesis matching the leftmost (. This will surround the first $i$ letters in the string, which can be further paranthesized in $C_{i-1}$ ways, while the latter $n-i$ letters can be parenthesized in $C_{n-i-1}$ ways. For example when $n=5$, the $*$ illustrates all of the break points:

(1 * (2 (3 (4 5)))) C(0) * C(4) strings where the break point
(1 * (2 ((3 4) 5))) is after i=1
(1 * ((2 3) (4 5)))
(1 * ((2 (3 4)) 5))
(1 * (((2 3) 4) 5))

((1 2) * (3 (4 5))) C(1) * C(2) strings where the break point
((1 2) * ((3 4) 5)) is after i=2

((1 (2 3)) * (4 5)) C(2) * C(1) strings where the break point
(((1 2) 3) * (4 5)) is after i=3

((1 (2 (3 4))) * 5) C(4) * C(0) strings where the break point
((1 ((2 3) 4)) * 5) is after i=4
(((1 2) (3 4)) * 5)
(((1 (2 3)) 4) * 5)
((((1 2) 3) 4) * 5)

This recurrence gives you a quickly computable bijection from the first $C_{n-1}$ non-negative integers to binary trees. You are given an integer $k$ for which $0\le k\le C_{n-1}-1$. Compute the partial sums $$ \sum_{i=1}^{s-1} C_{i-1}C_{n-i-1} $$ to find the largest number $s\ge 1$ for which that partial sum is at most $k$. Then, insert parentheses into the list of numbers (1 2 3 ... n) as follows:

((1 2 ... s) (s+1 s+2 ... n))

If $s=1$, you can omit the parentheses around (1), and similarly when $s=n-1$ around (n).

Then, letting $$e=k - \Big(\sum_{i=1}^{s-1}C_{i-1}C_{n-i-1}\Big),$$ and letting \begin{align}k_1&=e\pmod {C_{s-1}}\\k_2&=\lfloor e/C_{s-1}\rfloor,\end{align} you will have $0\le k_1\le C_{s-1}-1$ and $0\le k_2\le C_{n-s-1}-1$, and you can recursively apply the bijections for $k_1$ to the list (1 2 ... s) and for $k_2$ to the list (s+1 s+2 ... n).

Edit: There was a "bug" in my bijection which I just fixed. You can test that it works here.

Edit2: I just fixed another off-by-one error.