Problem
I have a problem that requires a Graph algorithm, which given
$G = (V, E)$ and $V' \subset V$
returns
$G' =(V', E')$ where $E' = \{ (start, end) \in (V' \times V') : directpath(G, V', start,end) \}$
where directpath(Graph,V', node1, node2)
means that there is a path between node1
and node2
in Graph
where every node in the path is in $V \setminus V' \cup {start, end}$
More informally:
Given a subset of interesting nodes, the new graph should have an edge between two interesting nodes iff there was a path between those nodes in the original graph that only traversed non interesting nodes.
A simple example would be the graph $G = (V, \{ (R, 1), (1,C), (C,2), (2, E) \})$
Given the subset $V'=\{ R, C, E \}$ the resulting graph would be $G'$ = $(V', { (R, C), (C,E)}$ )
Context
I already have the actual algorithm itself working, but I am fairly sure that I just reinvented something that I can't find the name of despite searching and asking around.
Also I am wondering what a reasonably efficient way to do this is, depending on the algorithmic complexity I might have to design the code that this is part of differently. My intuition would be that this should be at least doable in linear time and I could probably get my algorithm down to that, but maybe I missing something and am wrong in either direction.
The context of this is program analysis, specifically dataflow analysis.
My use case is only concerned with directed Graphs, but this is a problem that would probably also exist for undirected graphs.
Code
def filter_graph(
graph: DiGraph[T],
subset: Iterable[T],
self_loops=False,
) -> DiGraph[T]:
result: DiGraph[T] = DiGraph()
subset = set(subset)
for node in graph.nodes():
if node in subset:
t = filter_search(graph, node, subset)
for bound in t:
if node != bound or self_loops:
result.add_edge(node, bound)
return result
def filter_search(
graph: DiGraph[T], source: T, subset: Set[T], visited=None
) -> Set[T]:
visited = visited or set()
visited.add(source)
bound: Set[T] = set()
for succ in graph.successors(source):
if succ in subset:
bound.add(succ)
else:
if not succ in visited:
bound |= filter_search(graph, succ, subset, visited)
return bound
Best Answer
The best answer I have gotten to this is that "kind of a special case of Tarjan’s path expression algorithm"
Relevant Citation:
Robert Endre Tarjan. 1981. Fast Algorithms for Solving Path Problems. J. ACM 28, 3 (July 1981), 594–614. DOI:https://doi.org/10.1145/322261.322273
Credit to the person who told me about this: https://twitter.com/BanjoTragedy/status/1250796978994323456