[Math] Binary Trees: relationship between height and number of children

trees

Lets take a Binary Tree with n nodes (and n-1 edges). If h is the height of the tree, then hϵ[log2(n), n] depending if the tree is balanced or not. The minimum height is reached if the tree is "completed", i.e. the nodes either have two children or are leaves. The maximum height is reached when every node bar one have one child only.

There is obviously a correlation between the number of children of the nodes and the height of the tree. I would like to know more about how informations regarding the number of children by node can refine the set of possible values for h. A few (easy) examples follows, but does it exist a general formula for putting boundaries for the possible values of h?

Such a formula would be extremly useful for calculating the computational complexity of operations upon Binary Trees. For example it is known that in a common Binary Search Tree if the node u has two children then both its predecessor and successor have only one child. This fact could be used to narrow the possible values for h and compare with the operations of others Binary Search Tree, like the AVL Trees or the Red-Black Trees.


Useful definition:

The deep of a node u, indicated by d(u), is the distance between u and the root. d(root) = 0 by definition.

The height of a node u, indicated by h(u), is the distance between u and its most distant descendant. h(root) = h by definition.

Let define c0 the number of nodes with zero children (i.e. leaves), c1 the number of nodes with one child, c2 the number of nodes with two children. Because of the definition of the Tree, we know that n = c0 + c1 + c2.


Base examples:

If c2 = 1 and c1 = n-2, then hϵ[n/2, n-2]: the minimum value is reached if the node with two children is the root, the maximum value is reached if the node with two children is the one with deep equals to 1.

If c2 = 3 and c1 = n-3, then hϵ[n/4, n-4]: the minimum value is reached if the nodes with two children are the root and its children, the maximum value is reached if the nodes with two children are those with the least deep.

If c2 = n/2 and c1 = 0 (or 1 if n is odd), then the tree is complete and h can only be log2(n).

If c2 = 0 and c1 = n-1, then the tree is completely unbalanced and h can only be n-1.

Best Answer

First a side remark. As you noted, $n=c_0+c_1+c_2$. Also, we note that every node except the root is the child of exactly one other node, therefore $n-1=c_1+2c_2$. Using these two relations, we can write $$\begin{align} c_1&=n+1-2c_0 \\c_2 &= c_0-1 \end{align}$$ And then only use $n$ and $c_0$ as parameters.

Second, the actual question. For the worst case (large $h$), you can simply put all one- and two-sibling in a single long chain, which means $h=c_1+c_2+1$. For the best case (small $h$) you make a balanced tree out of the two-children nodes. This will have height about $log_2(c_2)$ with about $c_2/2$ nodes on the lowest level. The remaining one-child nodes should be distributed into $c_2/2$ chains of equal lengths and put underneath the balanced tree. The result is something like $h\approx \log_2(c_2)+2\frac{c_1}{c_2}+1$.

So the answer you are looking for is approximately $h\in [\log_2(c_2)+2\frac{c_1}{c_2}+1,c_1+c_2+1]$. The second expression needs some appropriate rounding in case $c_2$ is not a nice number and such, but you can figure that out for yourself if neccesary.

Third, you see that the upper bound is not really good. In particular it is worse then any real actual red-black/avl tree. Maybe instead of considering strict bounds on $h$, you should think about the average $h$ over all trees with given $c_0,c_1,c_2$. That will be much closer to practical datastructures.