[Math] Short proof of correctness of Huffman algorithm

algorithmscoding-theory

I have to show the correctness (in terms of efficiency) of the Huffman-Algorithm in class. Given a source $$Q=\left(
\begin{array}{ccc}
a_1 & \ldots & a_N\\
p_1 & \ldots & p_N\end{array}\right)$$

and treating the pairs of symbols $a_j$ and their probabilities $p_j$ as nodes, the huffman-algorithm, visually, connects the two nodes with least probability and treats the resulting tree as a new node with its own probability (the sum of the node's ones). It terminates in a binary tree that I can associate a binary code with. I have to prove that this code is optimal in the sense of having the least average codeword-length.

Every proof I found shows it via first developing the theory of greedy algorithms.

Is there an easy to formulate direct proof?

Best Answer

Call an $n$-tuple $\mathbf a=(a_1,\ldots, a_n)\in\Bbb N_0^n$ of non-negative integers admissible if $\sum_{i=1}^n 2^{-a_i}\le1$.

For an $n$-tuple $\mathbf x=(x_1,\ldots,x_n)\in[0,\infty)^n$ of non-negative reals and an admissible $n$-tuple $\mathbf a$ call $w_{\mathbf x}(\mathbf a):=\sum_{i=1}^nx_ia_i$ the total code length of $\mathbf a$ (with respect to $\mathbf x$). Say $\mathbf a$ is optimal (with respect to $\mathbf x$) if it has minimal total code length with respect to $\mathbf x$.

Theorem. Let $n\ge 2$, $x_1,\ldots,x_{n}\in[0,\infty)$, and assume $x_i\ge \max\{x_1,x_2\}$ for all $i>2$. Let the $(n-1)$-tuple $(A,a_3,\ldots,a_{n})$ be optimal with respect to $(x_1+x_2,x_3,\ldots,x_{n})$. Then $(A+1,A+1,a_3,\ldots,a_{n})$ is optimal with respect to $(x_1,x_2,\ldots,x_{n})$.

Proof. Write $a_1:=a_2:=A+1$. First note that $\mathbf a=(a_1,a_2,a_3,\ldots,a_{n})$ is an admissible $n$-tuple because $2^{-(A+1)}+2^{-(A+1)}=2^{-A}$.

Let $(b_1,b_2,\ldots,b_n)$ be any admissible $n$-tuple. We want to show $w_{\mathbf x}(\mathbf b)\ge w_{\mathbf x}(\mathbf a)$. We may assume that $b_i\le b_j$ whenever $x_i>x_j$ (or else swapping $b_i\leftrightarrow b_j$ would produce a "better $\mathbf b$"). In particular, $b_i\le\min\{b_1,b_2\}$ for $i>2$.

Assume $b_1> b_2$. Then from $\sum_{i=1}^n2^{-b_i}\le1$, we obtain $$2^{b_2-b_1}\le 2^{b_2}-\sum_{i=2}^n2^{b_2-b_i}.$$ The number on the right is an integer, hence $\ge 1$. It follows that the inequality remains true if we replace $b_1$ with $b_2$. At the same time this change to $\mathbf b$ decreases (or keeps) the total code length. Similarly, if $b_2>b_1$, we can decrease $b_2$ without harm. In other words, we may assume that $b_1=b_2$. But then $(b_1-1,b_3,\ldots,b_n)$ is an admisible $(n-1)$-tuple and $$\begin{align}w_{(x_1,x_2,x_3,\ldots,x_n)}(b_1,b_1,b_3,\ldots,b_n) &=w_{(x_1+x_2,x_3,\ldots,x_n)}(b_1-1,b_3,\ldots,b_n) +x_1+x_2\\ &\ge w_{(x_1+x_2,x_3,\ldots,x_n)}(A,a_3,\ldots,a_n) +x_1+x_2\\ &=w_{(x_1,x_2,x_3,\ldots,x_n)}(A+1,A+1,b_3,\ldots,b_n) \end{align}$$ $\square$

Related Question