How many different “syntactic trees” exist related to an $m$ word sentence

combinatoricsgraph theorytrees

Before talking about the main topic, I would like to say that
I'm not in major of mathematics, nor any of mechanics.

So, there could be an issue of defining concepts or words not being strict or even be ambiguous about what I'm actually saying.

Please, take that into consideration while reading this.


Now, let's get started.

Let's say, there's a sentence with $3$ words, $a$, $b$, and $c$.

We can analyze this sentence by structuring "syntactic trees" in $3$ ways.

Then, I deleted all the middle nodes between a root and $a,b$, and $c$.
Here's a picture of this process.

"3-word sentence and its syntactic trees"

Of course, there can be a tree which looks like this.

"Tree that has a middle node which has a single child node"

But I would like to say that the 3rd tree in the first picture and this tree in the 2nd picture are same,
because I can say that both node $a'$ and node $a''$ are referring to exactly same thing which is the node $a$.

Every time I do the analysis, I will delete all the nodes from the tree except the root and last nodes without any child.

When I apply this rule to analyze $4$ 'word' sentence,
I will get these trees.

"4-word sentence and its syntactic trees"

Here's the question.

How many different "syntactic trees" are there of an $m$ word sentence?

Thinking of this, I first thought of the number of partitions of $m$: the number of ways to write $m$ as a sum of positive integers. (For $4$, there's $5$ ways to do this: $1+1+1+1$, $2+1+1$, $2+2$, $3+1$, $4$)

In 'syntactic trees' however, we do care about order.

"Order does matter"

Investigating the topic with my friend, I finally able to show all the possible ways to write $m$ as a sum of positive integers, where order matters.

First, get all the possible ways about small number; for example $3$.

Then, for all the possibilities, do $2$ kinds of operations in each possibility.

One is about summing $1$ to the leftmost term of a equation, for example, $1+1+1 \rightarrow (1+1)+1+1 \rightarrow 2+1+1$.

Second is about making a new $1$ next to the leftmost term of a equation, for example, $1+1+1 \rightarrow (1)+1+1+1 \rightarrow 1+1+1+1$.

By doing these operations to every equation, I could get all the possibilities of making $4$ without any duplications.

Obviously, all the possibilities related to $m$ can be represented as $2^{m-1}$.

But problem was, I needed to take care of all the possibilities within every term of each equation.

"Every terms do matter"

For bigger numbers, this will definitely become insane.

I couldn't think of good way to make a formula for counting trees related to $m$ words. Even figuring out how to do the operation faster or coming up with more clever methods was too much for me.

I had no choice but to use a brute force method. Using Python, I was able to count number of trees of an $m$ word sentence, and until $m=20$ it was not a big deal. However, the time it took to calculate further terms seemed to increase exponentially, making this approach infeasible for higher values.

One thing I noticed was that between every step, the ratio between the number of $m$ word trees, and the number of $m+1$ word trees seemed to converge to some constant. I don't understand why this happens.

Here I've plotted the successive ratio of each step in graph.

"Points of Increasing rate and steps in pair on plane"


I want to ask if there's any explicit formula that counts the number of trees for each $m$ word sentence.

I would also like to know why the ratio of successive terms seems to converge to some constant.

I would be very grateful if anyone gives an explanation and helps me figure out the problem.

Thank you.


Furthermore:

Here's the list of the numbers I got.
Given that $m$ is amount of words in a sentence.

Define the function $C(m)$, which returns the number of "syntactic trees" related to an $m$ word sentence.

$$\begin{split}
C( 1 ) &= 1\\
C( 2 ) &= 1\\
C( 3 ) &= 3\\
C( 4 ) &= 11\\
C( 5 ) &= 45\\
C( 6 ) &= 197\\
C( 7 ) &= 903\\
C( 8 ) &= 4279\\
C( 9 ) &= 20793\\
C( 10 ) &= 103049\\
C( 11 ) &= 518859\\
C( 12 ) &= 2646723\\
C( 13 ) &= 13648869\\
C( 14 ) &= 71039373\\
C( 15 ) &= 372693519\\
C( 16 ) &= 1968801519\\
C( 17 ) &= 10463578353\\
C( 18 ) &= 55909013009\\
C( 19 ) &= 300159426963\\
C( 20 ) &= 1618362158587\\
C( 21 ) &= 8759309660445\\
C( 22 ) &= 47574827600981\\
C( 23 ) &= 259215937709463\\
C( 24 ) &= 1416461675464871\\
C( 25 ) &= 7760733824437545\\
C( 26 ) &= 42624971294485657\\
C( 27 ) &= 234643073935918683\\
\end{split}$$

Best Answer

Firstly, your "syntactic trees" of order $n\geq 2$ are essentially Schröder trees; ordered trees with $n$ leaves and with no vertices of outdegree $1$ (for $n=1$ we adopt the convention that the trivial tree with no edges and one leaf is the unique Schröder tree with one leaf, denoted $\epsilon$). We shall derive a closed form and simple asymptotic for the number of $n$-leaf Schröder trees using the theory of generating functions.


Let $\mathcal{T}$ be the set of Schröder trees (with any number of leaves). Let $t_n$ be the number of Schröder trees with $n$ leaves, and define the generating function

$$T(x)=\sum_{n\geq 0} t_n x^n$$

The key insight here is to notice that every Schröder tree is either $\epsilon$, or has a root with at least two (ordered) branches, each attached to a Schröder tree. More formally, there exists a natural isomorphism

$$\mathcal{T}\cong \{\epsilon\}\cup\bigcup_{n\geq 2}\mathcal{T}^n$$

between the set of Schröder trees and the set containing a $\epsilon$ and all $k$-tuples of Schröder trees, where $k\geq 2$. This implies the generating function identity

$$T(x)=x+\sum_{n\geq 2}T(x)^n=x+\frac{T(x)^2}{1-T(x)}$$

which we may rearrange into the quadratic equality in $T(x)$,

$$2T(x)^2-(1+x)T(x)+x=0$$

Using the quadratic formula and noting that $T(0)=t_0=0$, we may extract

$$T(x)=\frac{1+x-\sqrt{1-6x+x^2}}{4}$$

Finally, using the Taylor series of $\sqrt{1+y}$, we may write

\begin{equation} \begin{split} T(x)&=\frac14+\frac14x+\frac14\sum_{m\geq 0}\binom{2m}{m}\frac{(6x-x^2)^m}{(2m-1)4^m}\\ &=\frac14+\frac14x+\sum_{m\geq 0}\binom{2m}{m}\frac{1}{(2m-1)4^{m+1}}\sum_{k=0}^m\binom{m}{k}(-1)^{m-k}6^kx^{2m-k}\\ \end{split} \end{equation}

Reparametrizing in terms of $k=2m-n$, and extracting coefficients, we get that for $n\geq 2$, there are

$$t_n=[x^n]T(x)=\sum_{m=\lceil n/2\rceil}^{n}\binom{2m}{m}\binom{m}{2m-n}\frac{(-1)^{n+m}6^{2m-n}}{(2m-1)4^{m+1}}$$

distinct Schröder trees with $n$ leaves.


To derive the asymptotics of $t_n$, we note that $T(x)$ has radius of convergence $r=3-2\sqrt{2}$, and may be rewritten as

$$T(x)=A(x)+B(x)\sqrt{1-\frac{x}{r}}$$

where $A(x)=\frac{1+x}4$ and $B(x)=-\frac14\sqrt{1-rx}$. Therefore, using this formula , we may obtain the asymptotic

$$t_n\sim\frac{B(r)r^{-n}}{\Gamma(-1/2)n^{3/2}}=\frac{\sqrt{3\sqrt2-4}}{4\sqrt{\pi}}\cdot\frac{(3+2\sqrt2)^n}{n^{3/2}}$$

This explains your observation; the ratio of successive terms does indeed converge to the constant $3+2\sqrt2\approx 5.8284$.