[Math] How to use BFS or DFS to determine the connectivity in a non-connected graph

algorithmscomputer sciencegraph theory

How can I design an algorithm using BFS(Breadth First Search) or DFS(Depth First Search) algorithms in order to determine the connected components of a non-connected graph? The algorithm must be able to denote the set of vertices of each connected component.

This is my approach:

Run DFS starting from each node:

Start by labeling all nodes as unvisited. Then, iterate over the nodes
in any order. For each node, if it's not already labeled as being in a
connected component, run DFS from that node and mark all reachable
nodes as being in the same CC. If the node was already marked, skip
it. This then discovers all CC's of the graph one CC at a time.

Moreover, this is very efficient. If there are m edges and n nodes,
the runtime is O(n) for the first step (marking all nodes as
unvisited) and O(m + n) for the second, since each node and edge are
visited at most twice. Thus the overall runtime is O(m + n).

Any idea of how to improve this solution?.

Best Answer

Your solution looks good to me!

I have one suggestion if you consider directed graphs. Consider the graph:

A -> B -> C
D -> F

Let's say your algorithm starts arbitrarily at node B. It will traverse the graph through B and C and mark them as connected components - but it won't catch A!

A simple way to solve this problem is to allow the DFS to traverse the graph going both forwards and backwards, as though the graph was non-directed.

This is another way to solve it: Your algorithm produces sets of connected components. Let's say, as above, that the algorithm starts at B and produces the set {B, C}. Then, let's say that the algorithm arbitrarily selects A as its next starting point. When it traverses the graph to B, it should union the new set of connected components {A} with the already-existing set {B, C} instead of simply skipping node B.

In general, if you encounter a node that's already part of a group of connected components during a DFS, mark the set of connected components that the node is in and then skip the node. Then, when you've finished traversing as many nodes as possible without retracing paths found by previous DFS's, union all of the marked sets and the set produced by the most recent DFS into one set of connected components and discard the unioned sets.