Average path length of a binary tree with $2^n$ leaves

discrete mathematicsgraph theorytrees

I want to prove a binary tree with $2^n$ leaves has expected depth at least $n$. Expected depth is calculated as average path length in a tree from root to a leaf. It's obvious for a complete binary tree of depth $n$, but there are many other trees with $2^n$ leaves. Their depth is higher, but how to prove the average path length would be $\geq n$.

Best Answer

We need to prove that for a binary tree with $N$ leaves, the average depth is at least $\log_2 N$. We can prove this by induction. For $N=1$, that is for a single node tree, the depth is $0=\log_21$. For all $n<N$ we make the inductive hypothesis that average depth of a binary tree with $n$ leaves is $\ge\log_2n$.

Now consider a binary tree with $N$ leaves. Any binary tree can be written as the union of the root, it left subtree and its right subtree. Supoose the number of leaves of the left subtree be $n_l$ and that of the right subtree is $n_r$. Then $N=n_l+n_r$. We have $n_l,n_r<N$ (I am assuming $n_l,n_r>0$. Otherwise if $n_l=0$, then $n_r=N$ and we look at the right subtree only and carry on the same argument). Now the average depth (d) of the original tree will be one more than the sum of the average depth (d_l) of the left subtree weighted by its proprtion of leaves ($\frac{n_l}{N}$) and the average depth (d_r) of the right subtree weighted by its proprtion of leaves ($\frac{n_r}{N}$),i.e., $$d=\frac{n_l}{N}d_l+\frac{n_r}{N}d_r+1\ge\frac{n_l}{N}\log_2n_l+\frac{n_r}{N}\log_2n_r+1$$ Thus we need to show $$\frac{n_l}{N}\log_2n_l+\frac{n_r}{N}\log_2n_r+1\ge\log_2N$$ or, equivalently $$\frac{n_l}{N}\log_2\frac{n_l}{N}+\left(1-\frac{n_l}{N}\right)\log_2\left(1-\frac{n_l}{N}\right)+1\ge0$$
This follows from $a\log_2a+(1-a)\log_2(1-a)\ge-1$ for $a<1$ (Can you prove this?).

Related Question