Your solution looks good to me!
I have one suggestion if you consider directed graphs. Consider the graph:
A -> B -> C
D -> F
Let's say your algorithm starts arbitrarily at node B
. It will traverse the graph through B
and C
and mark them as connected components - but it won't catch A
!
A simple way to solve this problem is to allow the DFS to traverse the graph going both forwards and backwards, as though the graph was non-directed.
This is another way to solve it: Your algorithm produces sets of connected components. Let's say, as above, that the algorithm starts at B
and produces the set {B, C}
. Then, let's say that the algorithm arbitrarily selects A
as its next starting point. When it traverses the graph to B
, it should union the new set of connected components {A}
with the already-existing set {B, C}
instead of simply skipping node B
.
In general, if you encounter a node that's already part of a group of connected components during a DFS, mark the set of connected components that the node is in and then skip the node. Then, when you've finished traversing as many nodes as possible without retracing paths found by previous DFS's, union all of the marked sets and the set produced by the most recent DFS into one set of connected components and discard the unioned sets.
Best Answer
Use an integer to keep track of the "colors" that identify each component, as @Joffan mentioned. Start BFS at a vertex $v$. When it finishes, all vertices that are reachable from $v$ are colored (i.e., labeled with a number). Loop through all vertices which are still unlabeled and call BFS on those unlabeled vertices to find other components.
Below is some pseudo-code which initializes all vertices with an unexplored label (an integer 0). It keeps a counter, $componentID$, which vertices are labeled with as they are explored. When a connected component is finished being explored (meaning that the standard BFS has finished), the counter increments. BFS is only called on vertices which belong to a component that has not been explored yet.
The total running time is $O(|V| + |E|)$ since each edge and vertex is labeled exactly twice - once to initialize and again when it's visited.