[Math] Finding all k-size subgraphs

algorithmscombinatoricsgraph theory

I have no experience with advanced combinatorics, but I have to solve a problem that I think I will need advanced combinatorial techniques, correct me if I am wrong.

Let $G$ be a large directioned graph with its vertices numerically labeled, I need to find all subgraphs of size $k$ (by size I mean number of vertices) in $G$. (By large I mean values up to 1 billion vertices).

My first though was to iterate over all vertices of $G$ making them root for $k$-sized subgraphs, after getting the root vertex $u$ I would build implicit trees where a child vertex is connect to its parent either with an incoming or outgoing edge. A basic restriction would be that a vertex can be listed in the tree only once, for that I could keep a visited-vertices list. In order to avoid repetition I could add a constraint that the label of a child can never be higher than i̶t̶s̶ ̶p̶a̶r̶e̶n̶t̶'s̶ ̶l̶a̶b̶e̶l̶ the root's label.

Up to this point I think my approach is fine, but now I have to generate all possible trees, because I can be selecting $k$ vertices in a $k$-depth tree, one at each level, or have a $l$-depth ($1<l<k$) tree and be selecting more than one vertice in one level.

For doing that I think maybe integer composition methods can help, by integer composition I mean all possible ways to create an integer number from the sum of at least 2 other integer numbers different from zero
where the order of adding matters. I have searched about that and I came across texts that said that using a revolving door ordering algorithm is the fastest way to generate all possible combinations.

The problem is that information I have found about revolving door ordering algorithms are somewhat heavy on combinatorics, a field I don't have deeply studied yet.

If it matters somehow, I am a Computer Science bachelor student, so I think that a more algorithmic approach on the revolving door ordering can help me, don't get me wrong, I can handle the mathematics, but by now, I am not able to visualize the strong mathematical approaches as clear algorithms.

Best Answer

Judging from your description, you're seeking (weakly) connected subgraphs. In this case, the fastest algorithm I've encountered for complete enumeration is the ESU algorithm (ref.), which was developed by Sebastian Wernicke for his network motif detection package called Fanmod. It visits each $k$-node subgraph exactly once.

If you're talking about graphs with billions of nodes (and consequently zillions of $k$-node subgraphs), full enumeration is unlikely to run in a reasonable amount of time, and you'll likely be stuck finding an estimation of the number of subgraphs.