[Math] On “The Average Height of Planted Plane Trees” by Knuth, de Bruijn and Rice (1972)

co.combinatoricsdiscrete mathematicsenumerative-combinatoricslatticesrandom walks

I am trying to derive the classic paper in the title only by elementary means (no generating functions, no complex analysis, no Fourier analysis) although with much less precision. In short, I "only" want to prove that the average height $h_n$ of a tree with $n$ nodes (that is, the maximum number of nodes from the root to a leaf) satisfies $h_n \sim \sqrt{\pi n}$.

The outline is as follows. Let $A_{nh}$ be the number of trees with height less than or equal to $h$ (with the convention $A_{nh} = A_{nn}$ for all $h \geqslant n$) and $B_{nh}$ the number of trees of $n$ nodes with height greater than or equal to $h+1$ (that is, $B_{nh} = A_{nn} – A_{nh}$). Then $h_n = S_n/A_{nn}$, where $S_n$ is the finite sum
$$
S_n = \sum_{h \geqslant 1} h(A_{nh} – A_{n,h-1}) = \sum_{h \geqslant 1} h(B_{n,h-1} – B_{nh}) = \sum_{h \geqslant 0} B_{nh}.
$$
It is well known that $A_{nn} = \frac{1}{n}\binom{2n-2}{n-1}$, for the set of general trees with $n$ nodes is in bijection with the set of binary trees with $n-1$ nodes, counted by the Catalan numbers.

Therefore, the first step is to find $B_{nh}$ and then the main term in the asymptotic expansion of $S_n$.

At this point the authors use analytical combinatorics (three pages) to derive
$$
B_{n+1,h-1} = \sum_{k \geqslant 1} \left[\binom{2n}{n+1-kh} – 2\binom{2n}{n-kh} + \binom{2n}{n-1-kh}\right].
$$

My own attempt is as follows. I consider the bijection between trees with $n$ nodes
and monotonic paths on a square grid $(n-1) \times (n-1)$ from $(0,0)$ to $(n-1,n-1)$ which do not cross the diagonal (and are made of two kinds of steps: $\uparrow$ and $\rightarrow$). These paths are sometimes called Dyck paths or excursions. I can express now $B_{nh}$ in terms of lattice paths: it is the number of Dyck paths of length 2(n-1) and height greater than or equal to $h$. (Note: a tree of height $h$ is in bijection with a Dyck path of height $h-1$.)

Without loss of generality, I assume that they start with $\uparrow$ (hence stay above the diagonal). For each path, I consider the first step crossing the line $y = x + h – 1$, if any. From the point above, all the way back to the origin, I change $\uparrow$ into $\rightarrow$ and vice versa (this is a reflection wrt the line $y=x+h$). It becomes apparent that the paths I want to count ($B_{nh}$) are in bijection with the monotonic paths from $(-h,h)$ to $(n-1,n-1)$ which avoid the boundaries $y=x+2h+1$ and $y=x-1$. (See figure.)

In the classic book Lattice Path Counting and Applications by Mohanty (1979, page 6) the formula
$$
\sum_{k \in \mathbb{Z}} \left[\binom{m+n}{m-k(t+s)} – \binom{m+n}{n+k(t+s)+t}\right],
$$
counts the number of monotonic paths in a lattice from $(0,0)$ to $(m,n)$, which avoid the boundaries $y = x – t$ and $y = x + s$, with $t > 0$ and $s > 0$. (This result was first established by Russian statisticians in the 50s.) Therefore, by considering a new origin at $(-h,h)$, we satisfy the conditions of the formula: $s=1$, $t=2h+1$ and the destination point (the upper right corner) is now $(n+h-1,n-h-1)$. Then
$$
B_{nh} = \sum_{k \in \mathbb{Z}} \left[\binom{2n-2}{n+h-1-k(2h+2)} – \binom{2n-2}{n-h-1+k(2h+2) + 2h+1}\right].
$$
This can be simplified in
$$
B_{n+1,h-1} = \sum_{k \in \mathbb{Z}} \left[\binom{2n}{n+1-(2k+1)h} – \binom{2n}{n-(2k+1)h}\right],
$$
which, in turn, is equivalent to
$$
B_{n+1,h-1} = \sum_{k \geqslant 0} \left[\binom{2n}{n+1-(2k+1)h} – 2\binom{2n}{n-(2k+1)h} + \binom{2n}{n-1-(2k+1)h}\right].
$$
The difference with the expected formula is that I sum over the odd numbers ($2k+1$), instead of all positive integers ($k$). First, I hoped that the even terms would cancel out, but that does not seem to be the case.

Any idea where is the problem?

[Edit:
Starting from the expected result, by the same elementary binomial manipulations, we have
$$
B_{n,h} = A_{nn} + \sum_{k \in \mathbb{Z}}\left[\binom{2n-2}{n-k(h+1)} – \binom{2n-2}{n-1+k(h+1)}\right].
$$
But if we try to find a combinatorial interpretation to this sum in terms of bounded lattice paths, we fail: the destination point has coordinates $(n,n-2)$, which is below the inferior boundary $y = x – 1$, so the number of paths is $0$, but Mohanty's formula gives negative numbers in this kind of situation (although he does not mention this). Therefore, if we find a combinatorial interpretation for the absolute value of these numbers, we can understand the result in the same terms.]

[Edit: In response to a comment below, here are all the details in slow motion.
\begin{align}
B_{n,h} &= \sum_{k \in \mathbb{Z}}\left[\binom{2n-2}{n-(2k+1)(h+1)} – \binom{2n-2}{n-1+(2k+1)(h+1)}\right]\cr
B_{n+1,h-1} &= \sum_{k \in \mathbb{Z}}\left[\binom{2n}{n+1-(2k+1)h} – \binom{2n}{n+(2k+1)h}\right]\cr
&= \sum_{k \in \mathbb{Z}}\left[\binom{2n}{n+1-(2k+1)h} – \binom{2n}{n-(2k+1)h}\right]\cr
&= \sum_{k \geqslant 0}\left[\binom{2n}{n+1-(2k+1)h} – \binom{2n}{n-(2k+1)h}\right]\cr
&+ \sum_{k > 0}\left[\binom{2n}{n+1+(2k-1)h} – \binom{2n}{n+(2k-1)h)}\right]\cr
&= \sum_{k \geqslant 0}\left[\binom{2n}{n+1-(2k+1)h} – \binom{2n}{n-(2k+1)h}\right]\cr
&+ \sum_{k \geqslant 0}\left[\binom{2n}{n+1+(2k+1)h} -\binom{2n}{n+(2k+1)h}\right]\cr
&= \sum_{k \geqslant 0}\left[\binom{2n}{n+1-(2k+1)h} – \binom{2n}{n-(2k+1)h}\right]\cr
&+ \sum_{k \geqslant 0}\left[\binom{2n}{n-1-(2k+1)h} -\binom{2n}{n-(2k+1)h}\right]\cr
&= \sum_{k \geqslant 0}\left[\binom{2n}{n+1-(2k+1)h} – 2\binom{2n}{n-(2k+1)h} + \binom{2n}{n-1-(2k+1)h}\right].
\end{align}
]

Best Answer

I think the problem may be in the bijection. Consider instead the following bijection between (planted plane) trees with $n$ vertices and Dyck paths from $(0,0)$ to $(n-1,n-1)$: perform a depth-first search on the tree, moving $\uparrow$ in the Dyck path when you follow an edge away from the root, and $\rightarrow$ in the Dyck path when you follow an edge towards the root. Then a tree of height $h$ corresponds to a Dyck path which reaches $y=x+h-1$ but not $y=x+h$, so $A_{n,h}$ counts the Dyck paths from $(0,0)$ to $(n-1,n-1)$ which avoid the boundaries $y=x+h$ and $y=x-1$. By the referenced formula of Mohanty,

$$A_{n,h} = \sum_{k \in \mathbb{Z}} \left[\binom{2n-2}{n-1-k(h+1)} - \binom{2n-2}{n+k(h+1)}\right]$$

Then

$$\begin{eqnarray*} B_{n+1,h-1} &=& A_{n+1,n+1} - A_{n+1,h-1} \\ &=& \frac{1}{n+1} \binom{2n}{n} + \sum_{k \in \mathbb{Z}} \left[ \binom{2n}{n+1+kh} - \binom{2n}{n-kh} \right] \\ &=& \frac{1}{n+1} \binom{2n}{n} + \sum_{k \geqslant 0} \left[ \binom{2n}{n+1+kh} - \binom{2n}{n-kh} \right] + \\&& \sum_{k > 0} \left[ \binom{2n}{n+1-kh} - \binom{2n}{n+kh} \right] \\ &=& \frac{1}{n+1} \binom{2n}{n} + \binom{2n}{n+1} - \binom{2n}{n} + \\&& \sum_{k \geqslant 1} \left[ \binom{2n}{n+1+kh} - \binom{2n}{n-kh} - \binom{2n}{n+kh} + \binom{2n}{n+1-kh} \right] \\ &=& \sum_{k \geqslant 1} \left[ \binom{2n}{n+1+kh} - \binom{2n}{n-kh} - \binom{2n}{n+kh} + \binom{2n}{n+1-kh} \right] \\ &=& \sum_{k \geqslant 1} \left[ \binom{2n}{2n-(n+1+kh)} - \binom{2n}{n-kh} - \binom{2n}{2n-(n+kh)} + \binom{2n}{n+1-kh} \right] \\ &=& \sum_{k \geqslant 1} \left[ \binom{2n}{n-1-kh} - 2\binom{2n}{n-kh} + \binom{2n}{n+1-kh} \right] \\ \end{eqnarray*}$$

as desired.

Related Question