Generating the $n^{th}$ full binary tree over $N$ labeled leaves

combinatoricsgraph theoryplanar-graphstrees

I am looking for an algorithm to incrementally generate distinct full binary trees over $N$ unique leaves. That is, I want an algorithm that can generate the $n^{th}$ distinct tree without generating all the $n-1$ trees before. Pre-generating all trees is practically impossible above a certain N.

A full binary tree with its $N$ leaves labeled is equivalent to a binary grouping of $N$ unique elements grouped into pairs. There are $C_{N-1}$ different full binary trees or groupings of N leaves, where $C_n$ is the $n^{th}$ Catalan number. For $N = 4$, there are $C_3 = 5$ trees. These are, with internal nodes labeled $5..7$ and with the equivalent grouping:

For $N = 5$:

(1 (2 (3 (4 5))))
(1 (2 ((3 4) 5)))
(1 ((2 3) (4 5)))
(1 ((2 (3 4)) 5))
(1 (((2 3) 4) 5))
((1 2) (3 (4 5)))
((1 2) ((3 4) 5))
((1 (2 3)) (4 5))
((1 (2 (3 4))) 5)
((1 ((2 3) 4)) 5)
(((1 2) 3) (4 5))
(((1 2) (3 4)) 5)
(((1 (2 3)) 4) 5)
((((1 2) 3) 4) 5)

I see three ways to solve this problem (ultimately they are equivalent):

There is a simple algorithm that can directly generate the next distinct (non-isomorphic) tree incrementally.
There is a bijective encoding from tree $T_i$ to sequence $S_i$ such that generating $S_{i+1}$ (decoding into tree $T_{i+1}$) is easily doable.
In the ideal case, there is a simple bijection of the $C_{N-1}$ trees to a continuous interval of $C_{N-1}$ natural numbers (preferentially $(1..C_{N-1})$) so that generating the $i^{th}$ tree is as easy as to decode it from the integer $i$.

There are many algorithms to encode a binary tree to a bijectively unique sequence (e.g. to a Prüfer sequence) but the issue is then how to generate the next sequence that can be decoded to the next tree without many failed sequences that do not encode a valid tree of the above description and do not encode a tree that was already visited.

Best Answer

The number of ways to fully parenthesize a string of $n$ letters, $C_{n-1}$, obeys the following recurrence: $$ C_{n-1} = \sum_{i=1}^{n-1}C_{i-1}C_{n-i-1} $$ To see this, consider the two "shallowest" groups in the parenthesization. Namely, ignoring the leftmost ( and the rightmost ), look at the parenthesis matching the leftmost (. This will surround the first $i$ letters in the string, which can be further paranthesized in $C_{i-1}$ ways, while the latter $n-i$ letters can be parenthesized in $C_{n-i-1}$ ways. For example when $n=5$, the $*$ illustrates all of the break points:

(1 * (2 (3 (4 5)))) C(0) * C(4) strings where the break point
(1 * (2 ((3 4) 5))) is after i=1
(1 * ((2 3) (4 5)))
(1 * ((2 (3 4)) 5))
(1 * (((2 3) 4) 5))

((1 2) * (3 (4 5))) C(1) * C(2) strings where the break point
((1 2) * ((3 4) 5)) is after i=2

((1 (2 3)) * (4 5)) C(2) * C(1) strings where the break point
(((1 2) 3) * (4 5)) is after i=3

((1 (2 (3 4))) * 5) C(4) * C(0) strings where the break point
((1 ((2 3) 4)) * 5) is after i=4
(((1 2) (3 4)) * 5)
(((1 (2 3)) 4) * 5)
((((1 2) 3) 4) * 5)

This recurrence gives you a quickly computable bijection from the first $C_{n-1}$ non-negative integers to binary trees. You are given an integer $k$ for which $0\le k\le C_{n-1}-1$. Compute the partial sums $$ \sum_{i=1}^{s-1} C_{i-1}C_{n-i-1} $$ to find the largest number $s\ge 1$ for which that partial sum is at most $k$. Then, insert parentheses into the list of numbers (1 2 3 ... n) as follows:

((1 2 ... s) (s+1 s+2 ... n))

If $s=1$, you can omit the parentheses around (1), and similarly when $s=n-1$ around (n).

Then, letting $$e=k - \Big(\sum_{i=1}^{s-1}C_{i-1}C_{n-i-1}\Big),$$ and letting \begin{align}k_1&=e\pmod {C_{s-1}}\\k_2&=\lfloor e/C_{s-1}\rfloor,\end{align} you will have $0\le k_1\le C_{s-1}-1$ and $0\le k_2\le C_{n-s-1}-1$, and you can recursively apply the bijections for $k_1$ to the list (1 2 ... s) and for $k_2$ to the list (s+1 s+2 ... n).

Edit: There was a "bug" in my bijection which I just fixed. You can test that it works here.

Edit2: I just fixed another off-by-one error.

Related Solutions

[Math] Recursive Generating function for enumerating leaf labeled binary trees

Since $B(z)$ is an exponential generating function $B(z) = \sum_n b_n \frac{z^n}{n!}$, and not $\sum_n b_n z^n$.

With correct definition the series expansion for $B(z)$ should read:

$$ B(z) \sim z+\frac{z^2}{2}+\frac{z^3}{2}+\frac{5 z^4}{8}+\frac{7 z^5}{8}+\frac{21 z^6}{16}+\frac{33 z^7}{16}+O\left(z^8\right) $$ and this satisfies the recurrence equation:

In[200]:= With[{b = Sum[(2 n - 3)!!/n! z^n, {n, 1, 8}] + O[z]^8}, 
 z + 1/2 b^2 - b]

Out[200]= SeriesData[z, 0, {}, 8, 8, 1]

[Math] Why is the Prüfer sequence in a labeled tree always unique

The uniqueness of the prufer sequence comes from the fact that the trees are reconstructable.

For example, with $\{4,4,4,5\}$ The first number not in the code is $1$, so we attach vertex $1$ to $4$(the first entry in the code) now drop the $4$ from the code and add a $1$ to the back to get $\{4,4,5,1\}$.

The first number not in the code is $2$, so attach $2$ to $4$, drop the $4$ and add $2$ to the back to get $\{4,5,1,2\}$.

The first number not in the code is $3$, so attach $3$ to $4$, drop the $4$ and add $3$ to the back to get $\{5,1,2,3\}$

The first number not in the code is $4$, so attach $4$ to $5$, drop the $5$ and add $4$ to the back to get $\{1,2,3,4\}$.

Now we have exhausted the entire original code. The final step is to add an edge between the two numbers not shown in our new code, so attach $5$ to $6$.

This will give you back the original tree. Since we have a way to "Uncode" the prufer code, it is unique.

Best Answer

Related Solutions

[Math] Recursive Generating function for enumerating leaf labeled binary trees

[Math] Why is the Prüfer sequence in a labeled tree always unique

Related Question