Kruskal’s algorithm proof using induction on number of nodes

algorithmsdiscrete optimizationgraph theoryinductiontrees

I wanted to prove Kruskal's algorithm for simple connected undirected graph having pairwise distinct integral edge weights, produces MST using induction on number of nodes. The base case and induction hypothesis are clear.

Base Case: When there are just 2 nodes, since the graph is simple and connected there is just one edge and the prim's algorithm includes that edge and it is the MST.

Induction Hypothesis: Assume that Kruskal's algorithm correctly computes the MST for all connected undirected positive edge weighted simple graphs with $k$ nodes, where $k < n$.

Induction Step: Consider a connected undirected positive edge weighted simple graph $G$ with n nodes. We have to prove that Kruskal's algorithm for this produces a MST.

I'm confused about how to proceed with the proof. I can see that we should do something like removing a vertex so that number of vertices become $<n$ and we can use the induction hypothesis in some way. But on removing we might get many different connected components. Any help on this would be highly appreciated.

Best Answer

First, as all weights in graph $G$ are different, the MST is unique. (Proof is everywhere, and notably on Wikipedia's page, I feel it does not add much value to copy it here).

The minimum cost edge $e$ of $G$ is in $G$ MST: take a spanning tree $T$ which does not include $e$. Then add $e$. As it connects two vertices that are already linked in $T$, this creates a cycle. Therefore it is possible to delete another edge of this cycle: this creates a spanning tree which has a lower total cost than $T$. Hence the MST must include $e$.

Now, let's make a graph $G'$ by fusing the two vertices $v_1$ and $v_2$ linked to the minimum cost edge $e$, and modifying the edges in the following way: if a vertex $f$ is connected to both $v_1$ and $v_2$, keep only the edge with the minimum cost. If $G$ MST contains one of these edges, it necessarily contains the one we have kept.

Let's prove that $G$ MST is $G'$ MST plus $e$. We already know that $G$ MST includes $e$, and that all the other edges it includes are in $G'$ with the same weight. Hence when searching all trees in $G$ for the MST, we can reduce the search to the set $S$ of trees which are made of $e$ plus a graph in $G'$, and this graph has to be a tree. So $S$ is in bijection with the set $S'$ of trees in $G'$. The cost of a tree in $S$ is the cost of the corresponding tree in $S'$, plus $e$ cost; hence the minimum trees of $S$ and $S'$ are in correspondence. This proves that $G$ MST is $G'$ MST plus $e$.

$G'$ has $n-1$ vertices, so we can apply the induction hypothesis: $G'$ MST is built with Kruskal algorithm.

The last step is to prove that "classical Kruskal" algorithm on $G$ is equivalent (in terms of chosen edges) to "Kruskal with fusion" algorithm, i.e. selecting the minimum cost edge $e$ in $G$, fusing vertices $v_1$ and $v_2$, then using Kruskal with fusion on $G'$:

  • During classical Kruskal algorithm on $G$, we would never choose an edge that connects vertices that have been fused in Kruskal with fusion: this would create a cycle. (In fact Kruskal algorithm always chooses an edge between two disconnected trees, and in Kruskal with fusion each of those trees is represented by only one vertex).
  • Amongst the other edges, classical Kruskal will choose the minimum cost one. Kruskal with fusion will choose the minimum cost amongst those remaining, but as we have only suppressed edges that have a greater cost than another edge, both algorithms select the same edge.
Related Question