[Math] A constrained topological sort

algorithmsdirected graphsdiscrete mathematicsgraph theory

Suppose that one has a directed, acyclic graph G, and each vertex $v$ contains a (positive) value $a_v$. Additionally, let $r$ be a constant. For my purposes, $r>1$, but this might not matter. Let $n$ be the number of vertices in G and let $[n]:=\{1,2,\ldots, n\}$.

A topological sort of $G$ is a bijection $G\to[n]$ such that if there is a path from $v$ to $w$, then $\tau(v)<\tau(w)$. Alternately, if we view $G$ as defining a partial ordering, a topological sort is a total ordering extending the partial ordering.

I would like to find $\displaystyle\min_{\tau} \sum_v a_v r^{\tau (v)}$, where we are taking the minimum over all topological sorts of G. It may help to generalize the problem and to finding $\displaystyle\min_{\tau} \sum_v a_v p({\tau (v)})$, where $p:[n]\to \mathbb R$ is a penalty/weight function (perhaps assumed to take positive values and be monotonic).

There are two extreme cases. If $G$ has no edges, we would sort things in ascending or descending order depending on if $r<1$ or $r>1$. If $G$ is already a linear order, there is nothing to be done. Already, the problem seems nontrivial given two disjoint linear orderings, where the problem reduces to the optimal riffle shuffle.

So is there a good algorithm for solving this? I know of some heuristics which help in certain cases, and I can use a bubble-sort type algorithm to get "local" minima, but unless there is a way to recast the problem, I don't see a good way to solve it.


Added later: I want to extend my comment and explain why I view dynamic programming to be insufficient. At best, this will clarify what I'm looking for. At worst, this will reveal a gap in my understanding which someone can clarify.

For there to be a dynamic programming solution, there need to be sub-problems which can be built upon to get a larger solution. For example, when searching for a path through a graph with edge lengths, if a minimal length path passes through a particular vertex, then the path from from the start to that vertex must be of minimal length. If we keep track of all the vertices that can be reached in time less than $t$, then we can ignore all paths through those close vertices which do not begin with a minimal path, and so we need to remember at most one path to any vertex, and at every stage we only need to find the shortest path to an unvisited vertex which is an extension of a known minimal path. This gives $O(n^2)$ storage costs and $O(nm)$ time costs where $n$ is the number of vertices and $m$ is the number of edges.

The obvious sub-problem to use for the problem at hand is that, if we know an initial/terminal segment for an optimal solution, the restriction to the subgraph containing just those elements will yield the same initial/terminal segment. It does not appear that we can say anything stronger. The algorithm this yields is as follows:

  1. Select all vertices with no predecessors, and put each of these singletons into a list of admissible initial segments.
  2. (Definition) For an admissible initial segment of length $n$, we say that an extension of length $n+1$ is admissible if it satisfies both the topological constraints of the directed graph and, if no topologically allowed insertion of the new element has a lower total value. From the collection of all admissible extensions of length $n+1$.
  3. Given the collection of all admissible segments of length $n$, form the collection of all admissible extensions of length $n+1$.
  4. From the collection generated in (3), if any two admissible extensions use exactly the same collection of vertices, remove the segment with a higher associated cost.
  5. Loop through (3) and (4) until you have found the minimal initial segment containing every vertex of the graph.

If the penalty function is monotonic increasing (e.g., if $r>1$), we can improve run time somewhat by adding a heuristic (total cost of adding everything else at the minimum possible distance, ignoring that only one item can be in any particular spot), but even with this improvement, we have the following fundamental problem:

The algorithm doesn't require checking every initial segment, but it does require examining every collection of vertices which could form an allowable initial segment. In the worst case scenario, this is exponential in the number of vertices (though is significantly reduced when there are severe topological constraints). Additionally, in the worst case scenario, the space requirements are on the order of $\binom{n}{n/2}$ where there are $n$ vertices.

The dynamic programming algorithm is still quite an improvement over more naive algorithms, but I would like to find something that runs in polynomial time, or else show that such an algorithm cannot exist.

Best Answer

Some cases of your problem are solvable by greedy approach. For example $r \ge n$ and $\{\,a_1, a_2, \ldots, a_n\,\} = \{\,1, 2, \ldots, n\,\}$. In this case penalty function tells that the last vertex in our ordering should have the smallest possible value and no matter how all other vertices are placed. Topological order says that it also should have no outgoing arcs. So we should place there a vertex with smallest value among all vertices without outgoing arcs and solve a subproblem. Using heap we can get $O(n \log n + m)$ time for $n$ vertices and $m$ arcs.

In general case the problem seems to be NP-hard. However I failed to prove this fact.

Related Question