Given a binary tree with N labelled leaves, is it possible to find its unique number in the Catalan range

catalan-numberscombinatoricsgraph theoryplanar-graphstrees

The question is about finding the inverse to the problem of generating the $n^{th}$ binary tree with N labelled leaves (Generating the $n^{th}$ full binary tree over $N$ labelled leaves).

Let's say if $N = 4$, the possible set of trees are

1: (((1, 2), 3), 4)
2: (1, ((2, 3), 4))
3: ((1, (2, 3)), 4)
4: (1, (2, (3, 4)))
5: ((1, 2), (3, 4))

If I choose a specific tree from this set, let's say $(1, ((2, 3), 4))$, is there an algorithm that gives me back the value 2? The Catalan range for the problem is 1 to 5 and the unique number corresponding to the given tree is 2.

What do I mean by Catalan Range?

If there are N leaf nodes, the maximum possible binary trees is $C(N-1)$. For a given $n$, its $C(n)$ is the $n^{th}$ Catalan number. We can uniquely identify all the individual binary trees if we assign them a number from $1$ to $C(N-1)$ in order. I'm referring to this range of numbers from $1$ to $C(N-1)$ as the Catalan Range.

What scheme am I using to order the trees from $1$ to $C(N-1)$?

I don't really mind the scheme used to order the trees as long as all the trees can be uniquely identified within that scheme. For example,

$1$: The tree having just one node in the left sub tree and $N-1$ nodes in the right sub tree.
$2$: The tree still having just one node in the left sub tree and with a slightly different right sub tree now.
.
.
$C(N-1)$: The tree having $N-1$ nodes in the left sub tree and one node in the right sub tree.

For the sake of making it easier to discuss the answer, we could follow the scheme as decided by the first answer in this page.

Best Answer

Let $f$ be the function mapping full binary trees to integers; I'll use the convention that binary trees with $n$ leaves will map to the range $\{0, 1, \dots, C_{n-1}-1\}$ because that's easier to work with in the recursion. You can add $1$ later.

If we have a binary tree $T$, let $L$ be the "left" subtree: the subtree with leaves $1, 2, \dots, k$ for some $k$. Let $R$ be the "right" subtree: the subtree with leaves $k+1, k+2, \dots, n$. We will find $f(T)$ in terms of $f(L)$, $f(R)$, and $k$ where for the purposes of finding $f(R)$ we relabel $R$ to have leave $1, 2, \dots, n-k$.

Our trees are labeled in increasing order of $k$. So before this tree, we have $$ C_0 C_{n-2} + C_1 C_{n-3} + \dots + C_{k-2} C_{n-k} $$ trees whose left subtree has $1, 2, \dots, k-1$ leaves respectively.

Next, before this particular left subtree $L$, there are $f(L)$ previous left subtrees on $k$ leaves; for each one of them, there are $C_{n-k-1}$ right subtrees. All $f(L) C_{n-k-1}$ of the combined $n$-leaf trees go before $T$.

Finally, there are $f(R)$ trees with the same left subtree, but with a right subtree preceding $R$; these also go before $T$.

Altogether, we get the recursion $$ f(T) = \sum_{i=1}^{k-1} C_{i-1} C_{n-i-1} + f(L) C_{n-k-1} + f(R). $$ The base of the recursion sets $f(T) = 0$ when $T$ has just one or two leaves, in which case there's only one possible tree. (Actually, we only need the one-leaf case as our base case.)