The name of algorithm that generates a subgraph given a subset of nodes, where direct paths between nodes are retained

graph theoryprogramming

Problem

I have a problem that requires a Graph algorithm, which given

$G = (V, E)$ and $V' \subset V$

returns

$G' =(V', E')$ where $E' = \{ (start, end) \in (V' \times V') : directpath(G, V', start,end) \}$

where directpath(Graph,V', node1, node2) means that there is a path between node1 and node2 in Graph where every node in the path is in $V \setminus V' \cup {start, end}$

More informally:
Given a subset of interesting nodes, the new graph should have an edge between two interesting nodes iff there was a path between those nodes in the original graph that only traversed non interesting nodes.

A simple example would be the graph $G = (V, \{ (R, 1), (1,C), (C,2), (2, E) \})$

Given the subset $V'=\{ R, C, E \}$ the resulting graph would be $G'$ = $(V', { (R, C), (C,E)}$ )

Context

I already have the actual algorithm itself working, but I am fairly sure that I just reinvented something that I can't find the name of despite searching and asking around.

Also I am wondering what a reasonably efficient way to do this is, depending on the algorithmic complexity I might have to design the code that this is part of differently. My intuition would be that this should be at least doable in linear time and I could probably get my algorithm down to that, but maybe I missing something and am wrong in either direction.

The context of this is program analysis, specifically dataflow analysis.
My use case is only concerned with directed Graphs, but this is a problem that would probably also exist for undirected graphs.

Code

def filter_graph(
    graph: DiGraph[T],
    subset: Iterable[T],
    self_loops=False,
) -> DiGraph[T]:
    result: DiGraph[T] = DiGraph()


    subset = set(subset)

    for node in graph.nodes():
        if node in subset:
            t = filter_search(graph, node, subset)
            for bound in t:
                if node != bound or self_loops:
                    result.add_edge(node, bound)

    return result


def filter_search(
    graph: DiGraph[T], source: T, subset: Set[T], visited=None
) -> Set[T]:
    visited = visited or set()
    visited.add(source)

    bound: Set[T] = set()
    for succ in graph.successors(source):
        if succ in subset:
            bound.add(succ)
        else:
            if not succ in visited:
                bound |= filter_search(graph, succ, subset, visited)
    return bound

Best Answer

The best answer I have gotten to this is that "kind of a special case of Tarjan’s path expression algorithm"

Relevant Citation:

Robert Endre Tarjan. 1981. Fast Algorithms for Solving Path Problems. J. ACM 28, 3 (July 1981), 594–614. DOI:https://doi.org/10.1145/322261.322273

Credit to the person who told me about this: https://twitter.com/BanjoTragedy/status/1250796978994323456

Related Question