Time complexity analysis of Kruskal’s Algorithm

Tags: algorithms, analysis-of-algorithms, computational-complexity, discrete-optimization, trees

Hello, I have a doubt about the time complexity of Kruskal's Algorithm.

Symbols:

$E \implies$ Total number of edges in the graph

$V \implies$ Total number of vertices in the graph

$O \implies$ Big $O$ notation for calculating the upper bound of the function (as tight as possible)

Tools used:

Binary Heap => Create (heapify): $O(n)$. Pop: $O(\log n)$. Peek: $O(1)$

Union Find (DSU) => Using path compression and union by rank. The amortised time complexity comes out to be the inverse Ackermann function, which I assume to be $O(1)$ because I am working with a dense enough graph.
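For concreteness, here is a minimal sketch of the DSU with path compression and union by rank that I am assuming (illustrative code, not tied to any particular library):

```cpp
#include <numeric>
#include <utility>
#include <vector>

// Disjoint-set union with path compression and union by rank.
// Each find/unite is amortised O(alpha(V)), effectively constant.
struct DSU {
    std::vector<int> parent, rank_;

    explicit DSU(int n) : parent(n), rank_(n, 0) {
        std::iota(parent.begin(), parent.end(), 0);  // each vertex starts in its own set
    }

    int find(int x) {
        if (parent[x] != x)
            parent[x] = find(parent[x]);  // path compression
        return parent[x];
    }

    // Returns true if x and y were in different sets (i.e. the edge is usable).
    bool unite(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx == ry) return false;
        if (rank_[rx] < rank_[ry]) std::swap(rx, ry);  // union by rank
        parent[ry] = rx;
        if (rank_[rx] == rank_[ry]) ++rank_[rx];
        return true;
    }
};
```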

Algorithm:

  1. Create a binary heap and dump all the edges into it, with weight as the key. Time complexity $= O(E)$
  2. Loop $V-1$ times (as that is the number of edges we need) and in each iteration do:

2.1. Pop the minimum-weight edge, $O(\log E)$, and find the set representative of each endpoint in the union-find, which takes $O(1)$ time.

2.2 Union of the two different sets: $O(1)$ time again.

2.3 Take this edge into the tree: $O(1)$ time again.

2.4 Go to 2.1 if $V-1$ iterations are not over.

So, according to this, the time complexity comes out to $O((V-1)\log(E))$, which is $O(V\log E)$.

The given time complexity of $O(E\log V)$ or $O(E\log E)$ is also correct, as Big $O$ notation is just an upper bound, but I want a tight upper bound.

Can you comment on my analysis?

For reference, here is Kruskal's Algorithm as described above, applied to a dense graph in C++ (using a LeetCode programming problem as an example).
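(A stripped-down sketch of the heap-based version above, reusing the `DSU` from the Tools section; the `Edge` struct and the `kruskal` helper are just illustrative names, not the full LeetCode solution. It also counts how many edges get popped from the heap, so the $V-1$ claim can be checked directly.)

```cpp
#include <cstdio>
#include <queue>
#include <vector>

// One undirected weighted edge.
struct Edge {
    int u, v, w;
};

// Heap-based Kruskal as described above: build a min-heap of all edges,
// then repeatedly pop the cheapest edge and keep it only if it joins two
// different components (checked with the DSU sketched earlier).
// Returns the MST weight and prints how many edges had to be popped.
long long kruskal(int V, const std::vector<Edge>& edges) {
    // std::priority_queue's (comparator, container) constructor heapifies in O(E).
    auto cmp = [](const Edge& a, const Edge& b) { return a.w > b.w; };  // min-heap by weight
    std::priority_queue<Edge, std::vector<Edge>, decltype(cmp)> heap(cmp, std::vector<Edge>(edges));

    DSU dsu(V);
    long long total = 0;
    int used = 0, popped = 0;

    while (used < V - 1 && !heap.empty()) {
        Edge e = heap.top();
        heap.pop();                  // O(log E) per pop
        ++popped;
        if (dsu.unite(e.u, e.v)) {   // edge connects two components: keep it
            total += e.w;
            ++used;
        }
    }
    std::printf("popped %d of %zu edges\n", popped, edges.size());
    return total;
}
```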

Best Answer

You are correct in getting an $O(\log E)$ running time for a single iteration of the algorithm, but you are wrong about needing only $V-1$ iterations.

Remember, Kruskal's algorithm does not add every single edge it encounters to the future spanning tree. An edge is only added if it connects two different components of the forest built so far.

This means that we cannot be sure that we're done after $V-1$ steps; that's just the best case, if we're lucky and the first $V-1$ edges were all usable. In the worst case, all $E$ steps will be necessary: maybe the most expensive edge, the one we'd rather not use until we exhaust all other possibilities, is the only edge connecting two sets of vertices.
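To make this concrete, here is a small check (again just a sketch, reusing the illustrative `Edge`/`kruskal` helpers from the C++ sketch in the question): vertices $1,\dots,V-1$ are densely connected by cheap edges, and vertex $0$ hangs off the rest by a single expensive edge. Kruskal's must pop every cheap edge, rejecting most of them, before the expensive bridge finally completes the tree, so the number of pops is $E$ rather than $V-1$.

```cpp
#include <vector>

int main() {
    const int V = 6;
    std::vector<Edge> edges;
    // Cheap edges: a complete graph on vertices 1..V-1, all with weight 1.
    for (int u = 1; u < V; ++u)
        for (int v = u + 1; v < V; ++v)
            edges.push_back({u, v, 1});
    // The only edge touching vertex 0 is also the most expensive one.
    edges.push_back({0, 1, 100});

    kruskal(V, edges);  // prints "popped 11 of 11 edges": all E pops are needed
    return 0;
}
```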

This is why the running time is $O(E \log E)$.

(Also, if your analysis were correct, then you still don't get $O(V \log E)$. Don't forget about step 1, which is $O(E)$. The only safe way to combine the two parts of your analysis would be $O(E + V \log E)$; depending on how many edges there are in the graph, either term could be higher than the other. On the other hand, once we put an $O(E \log E)$ bound on the loop's running time, we can ignore the $O(E)$ term.)
