[Math] A constrained topological sort

algorithms, directed graphs, discrete mathematics, graph theory

Suppose that one has a directed acyclic graph $G$, and each vertex $v$ carries a (positive) value $a_v$. Additionally, let $r$ be a constant. For my purposes, $r>1$, but this might not matter. Let $n$ be the number of vertices in $G$ and let $[n]:=\{1,2,\ldots, n\}$.

A topological sort of $G$ is a bijection $\tau: G\to[n]$ such that if there is a path from $v$ to $w$, then $\tau(v)<\tau(w)$. Alternatively, if we view $G$ as defining a partial ordering, a topological sort is a total ordering extending the partial ordering.

I would like to find $\displaystyle\min_{\tau} \sum_v a_v r^{\tau (v)}$, where the minimum is taken over all topological sorts of $G$. It may help to generalize the problem to finding $\displaystyle\min_{\tau} \sum_v a_v p({\tau (v)})$, where $p:[n]\to \mathbb R$ is a penalty/weight function (perhaps assumed to take positive values and be monotonic).
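To make the objective concrete, here is a minimal brute-force sketch in Python. The names `is_topological`, `cost`, and `brute_force_min`, the dictionary `a` of vertex values, and the edge list are all illustrative assumptions, not part of the problem statement. It tries every permutation, so it is only usable on tiny graphs.

```python
from itertools import permutations

def is_topological(order, edges):
    """Return True if every edge (v, w) has v placed before w in `order`."""
    pos = {v: i for i, v in enumerate(order)}
    return all(pos[v] < pos[w] for v, w in edges)

def cost(order, a, r):
    """Sum of a[v] * r**tau(v), with positions tau numbered from 1."""
    return sum(a[v] * r ** (i + 1) for i, v in enumerate(order))

def brute_force_min(a, edges, r):
    """Try every permutation of the vertices; feasible only for very small n."""
    best = None
    for order in permutations(a):
        if is_topological(order, edges):
            c = cost(order, a, r)
            if best is None or c < best[0]:
                best = (c, order)
    return best

# Hypothetical toy instance: three vertices, one constraint "y before x".
a = {"x": 3, "y": 1, "z": 2}
edges = [("y", "x")]
best_cost, best_order = brute_force_min(a, edges, r=2)
print(best_cost, best_order)  # 30 ('y', 'x', 'z')
```

With $r=2$ the three feasible orders cost 30, 32, and 34, so the constraint already forces the small value $a_y=1$ into the cheapest slot.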

There are two extreme cases. If $G$ has no edges, we would sort the vertices by $a_v$, in ascending order if $r<1$ and in descending order if $r>1$. If $G$ is already a linear order, there is nothing to be done. Already, the problem seems nontrivial given two disjoint linear orderings, where the problem reduces to finding the optimal riffle shuffle.

So is there a good algorithm for solving this? I know of some heuristics which help in certain cases, and I can use a bubble-sort type algorithm to get "local" minima, but unless there is a way to recast the problem, I don't see a good way to solve it.
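One possible reading of the "bubble-sort type" heuristic is an adjacent-swap local search, sketched below. This is an assumption about what is meant, and the name `local_search` is hypothetical. Two neighbours $v, w$ in a topological order may be swapped exactly when there is no edge $v \to w$, and the swap changes the cost by $(a_w - a_v)\,r^{\tau(v)}(1-r)$, so for $r>0$ it helps exactly when $(a_w - a_v)(r-1) > 0$.

```python
def local_search(order, a, edges, r):
    """Swap adjacent vertices while that lowers the cost and stays topological.

    `order` must already be a topological order; `edges` are the arcs (v, w).
    """
    order = list(order)
    edge_set = set(edges)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            v, w = order[i], order[i + 1]
            # Adjacent v, w can be exchanged unless there is a direct arc v -> w.
            # The exchange lowers the cost exactly when (a[w] - a[v]) * (r - 1) > 0.
            if (v, w) not in edge_set and (a[w] - a[v]) * (r - 1) > 0:
                order[i], order[i + 1] = w, v
                improved = True
    return order

# Hypothetical toy instance: "y before x" is the only constraint.
a = {"x": 3, "y": 1, "z": 2}
edges = [("y", "x")]
print(local_search(("y", "z", "x"), a, edges, r=2))  # ['z', 'y', 'x']
```

On this input the search stops at z, y, x with cost 32, although y, x, z costs only 30, which illustrates why such a procedure finds only "local" minima.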

Added later: I want to extend my comment and explain why I view dynamic programming to be insufficient. At best, this will clarify what I'm looking for. At worst, this will reveal a gap in my understanding which someone can clarify.

For there to be a dynamic programming solution, there need to be sub-problems which can be built upon to get a larger solution. For example, when searching for a path through a graph with edge lengths, if a minimal length path passes through a particular vertex, then the path from the start to that vertex must be of minimal length. If we keep track of all the vertices that can be reached in time less than $t$, then we can ignore all paths through those close vertices which do not begin with a minimal path, and so we need to remember at most one path to any vertex, and at every stage we only need to find the shortest path to an unvisited vertex which is an extension of a known minimal path. This gives $O(n^2)$ storage costs and $O(nm)$ time costs where $n$ is the number of vertices and $m$ is the number of edges.

The obvious sub-problem to use for the problem at hand is that, if we know an initial/terminal segment for an optimal solution, the restriction to the subgraph containing just those elements will yield the same initial/terminal segment. It does not appear that we can say anything stronger. The algorithm this yields is as follows:

1. Select all vertices with no predecessors, and put each of these singletons into a list of admissible initial segments.
2. (Definition) For an admissible initial segment of length $k$, we say that an extension of length $k+1$ is admissible if it satisfies the topological constraints of the directed graph and no topologically allowed insertion of the new element has a lower total value.
3. Given the collection of all admissible segments of length $k$, form the collection of all admissible extensions of length $k+1$.
4. From the collection generated in (3), if any two admissible extensions use exactly the same collection of vertices, remove the segment with the higher associated cost.
5. Loop through (3) and (4) until you have found the minimal initial segment containing every vertex of the graph.
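The loop over (3) and (4) can be sketched roughly as follows in Python. This is a simplified version under stated assumptions: it keeps one cheapest segment per vertex set, as in (4), but it omits the extra insertion check from the definition of admissibility. The function name `min_cost_dp` and the callable `p` for the penalty are illustrative, and the graph is assumed to be acyclic.

```python
def min_cost_dp(a, edges, p):
    """Minimise sum of a[v] * p(position of v) over topological orders.

    The state is the set of vertices already placed; for each such set only
    the cheapest initial segment using exactly those vertices is kept.
    """
    preds = {v: set() for v in a}
    for v, w in edges:
        preds[w].add(v)
    # layer maps a vertex set of size k - 1 to (best cost, best segment).
    layer = {frozenset(): (0, ())}
    for k in range(1, len(a) + 1):
        nxt = {}
        for used, (c, segment) in layer.items():
            for v in a:
                # v may be appended once all of its predecessors are placed.
                if v not in used and preds[v] <= used:
                    key = used | {v}
                    cand = (c + a[v] * p(k), segment + (v,))
                    if key not in nxt or cand[0] < nxt[key][0]:
                        nxt[key] = cand
        layer = nxt
    return layer[frozenset(a)]

# Hypothetical toy instance with p(k) = 2**k, i.e. r = 2.
a = {"x": 3, "y": 1, "z": 2}
edges = [("y", "x")]
print(min_cost_dp(a, edges, lambda k: 2 ** k))  # (30, ('y', 'x', 'z'))
```

The dictionary `layer` holds one entry per vertex set that can form an initial segment, which is where the exponential worst case comes from.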

If the penalty function is monotonic increasing (e.g., if $r>1$), we can improve run time somewhat by adding a heuristic (total cost of adding everything else at the minimum possible distance, ignoring that only one item can be in any particular spot), but even with this improvement, we have the following fundamental problem:

The algorithm doesn't require checking every initial segment, but it does require examining every collection of vertices which could form an allowable initial segment. In the worst case, this is exponential in the number of vertices (though it is significantly reduced when there are severe topological constraints). Additionally, in the worst case, the space requirements are on the order of $\binom{n}{n/2}$, where $n$ is the number of vertices.

The dynamic programming algorithm is still quite an improvement over more naive algorithms, but I would like to find something that runs in polynomial time, or else show that such an algorithm cannot exist.

Best Answer

Some cases of your problem are solvable by a greedy approach, for example when $r \ge n$ and $\{\,a_1, a_2, \ldots, a_n\,\} = \{\,1, 2, \ldots, n\,\}$. In this case the penalty function dictates that the last vertex in the ordering should have the smallest possible value, no matter how the other vertices are placed. The topological order requires that it also have no outgoing arcs. So we place there a vertex with the smallest value among all vertices without outgoing arcs, and then solve the remaining subproblem. Using a heap, this takes $O(n \log n + m)$ time for $n$ vertices and $m$ arcs.
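A minimal Python sketch of this greedy procedure, filling the order from the last position backwards, could look like this. The function name `greedy_from_the_end` and the input layout are assumptions made for illustration, and it is only claimed to be optimal in the special case described above.

```python
import heapq

def greedy_from_the_end(a, edges):
    """Fill positions n, n-1, ..., 1 with the smallest-valued current sink."""
    out_deg = {v: 0 for v in a}
    preds = {v: [] for v in a}
    for v, w in edges:
        out_deg[v] += 1
        preds[w].append(v)
    # Heap of (value, vertex) for vertices with no remaining outgoing arcs.
    heap = [(a[v], v) for v in a if out_deg[v] == 0]
    heapq.heapify(heap)
    reversed_order = []
    while heap:
        _, v = heapq.heappop(heap)
        reversed_order.append(v)
        # Removing v may turn some of its predecessors into sinks.
        for u in preds[v]:
            out_deg[u] -= 1
            if out_deg[u] == 0:
                heapq.heappush(heap, (a[u], u))
    return reversed_order[::-1]

# Hypothetical toy instance with values {1, 2, 3} and the arc y -> x.
a = {"x": 3, "y": 1, "z": 2}
edges = [("y", "x")]
print(greedy_from_the_end(a, edges))  # ['y', 'x', 'z']
```

Each vertex enters and leaves the heap once and each arc is inspected once, which matches the $O(n \log n + m)$ bound.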

In the general case the problem seems to be NP-hard. However, I failed to prove this.

Related Solutions

[Math] Difficulty in understanding topological sort

If you do a depth first search (in the directed graph), and record the finish time $f[u]$ of each vertex $u$, then the topological sorted list is the list of vertices, sorted in the order of descending $f[u]$. Recall that if the graph is acyclic and if there is a path from $u$ to $v$, then $f[u] \gt f[v]$ (children finish before their parents).

Now form a graph with the children as vertices and a directed edge from $i$ to $j$ iff $j$ hates $i$. If there is a cycle in the graph, the ordering is not possible; otherwise the graph is a directed acyclic graph (DAG for short) and an ordering is possible.

Say you do a DFS (starting from a dummy node which has a directed edge to all vertices) and record the finish times. Now given two vertices $i$ and $j$, if $i$ hates $j$, then you will have that $f[j] \gt f[i]$, and your topologically sorted list will place $j$ before $i$.
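The DFS with finish times can be sketched in Python as below. The function name `dfs_topological_order` and the list `hates` are illustrative assumptions. Instead of a dummy node, the sketch simply starts a DFS from every unvisited vertex, which visits the same vertices. It uses recursion, so very deep graphs would need an iterative version.

```python
def dfs_topological_order(vertices, edges):
    """Return vertices by descending DFS finish time, or None if there is a cycle."""
    succ = {v: [] for v in vertices}
    for u, v in edges:
        succ[u].append(v)
    state = {v: 0 for v in vertices}  # 0 = unvisited, 1 = in progress, 2 = finished
    finished = []  # vertices in order of increasing finish time

    def visit(u):
        state[u] = 1
        for v in succ[u]:
            if state[v] == 1:
                return False  # back edge, so the graph has a cycle
            if state[v] == 0 and not visit(v):
                return False
        state[u] = 2
        finished.append(u)
        return True

    for u in vertices:
        if state[u] == 0 and not visit(u):
            return None
    return finished[::-1]

# Hypothetical example: (i, j) means "i hates j", giving an edge from j to i.
hates = [("A", "B"), ("C", "A")]
edges = [(j, i) for i, j in hates]
print(dfs_topological_order(["A", "B", "C"], edges))  # ['B', 'A', 'C']
```

Here A hates B and C hates A, so B is placed before A and A before C.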

Why topological ordering helps speeding shortest path finding

Given a directed graph $G = (V, E)$ where $E \subseteq V^2$ and both $V$ and $E$ are finite, a topological sort of $G$ is a total order $\leq$ on $V$ such that if $v \leq u$ then $(u, v) \notin E$. In other words, no edge points from a later vertex to an earlier one (or from a vertex to itself).

Typically, a topological sort of a graph is represented by a sequence of nodes $v_1, v_2, ..., v_n$, where $n = |V|$ and $\{v_1, ..., v_n\} = V$. The order is then that $v_i \leq v_j$ iff $i \leq j$.

The key constraint on a topological sort is that a topological sort exists if and only if the graph is acyclic. This means there cannot be a cycle of edges $v_1 \to v_2 \to ... \to v_n \to v_1$ in a graph with a topological sort.

Topological sorting can allow faster determination of the shortest path between two nodes. To find the shortest path between nodes r and q in the weighted graph (V, E, weight), we execute the following algorithm.

Let v_1, v_2, ..., v_n be a topological sort of (V, E, weight);
For each i from 1 to n:
   If v_i = r:
      Let dist_i = 0;
   Else:
      Let dist_i = the minimum of dist_k + w, 
          taken over all edges of the form (v_k, v_i) with weight w;
   If v_i = q:
      return dist_i;
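A runnable Python sketch of the pseudocode above follows. It relies on `graphlib.TopologicalSorter` from the standard library (Python 3.9+), which takes a mapping from each node to its predecessors. The function name `dag_shortest_path` and the `(u, v, w)` edge triples are assumptions made for illustration.

```python
import math
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def dag_shortest_path(vertices, weighted_edges, r, q):
    """Length of a shortest path from r to q in a weighted DAG (inf if none).

    weighted_edges: iterable of (u, v, w), meaning an edge u -> v of weight w.
    """
    incoming = {v: [] for v in vertices}
    for u, v, w in weighted_edges:
        incoming[v].append((u, w))
    # TopologicalSorter expects node -> predecessors; it raises CycleError on a cycle.
    sorter = TopologicalSorter({v: [u for u, _ in incoming[v]] for v in vertices})
    dist = {}
    for v in sorter.static_order():
        if v == r:
            dist[v] = 0
        else:
            # Every predecessor u comes earlier in the order, so dist[u] is known.
            dist[v] = min((dist[u] + w for u, w in incoming[v]), default=math.inf)
        if v == q:
            return dist[v]
    return math.inf

# Hypothetical example graph.
vertices = ["s", "a", "b", "t"]
weighted_edges = [("s", "a", 1), ("s", "b", 4), ("a", "b", 2), ("a", "t", 6), ("b", "t", 1)]
print(dag_shortest_path(vertices, weighted_edges, "s", "t"))  # 4
```

Vertices that cannot be reached from `r` get distance `math.inf`, matching the convention that the minimum over an empty set of paths is infinity.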

Here is a brief proof that the algorithm is correct.

I claim that after the $i$th iteration of the for-loop (For each i from 1 to n), for all $j$ such that $j \leq i$, the value dist_j is the length of the shortest path from r to v_j. If there is no path from r to v_j, we say the length of the shortest path is infinity.

We proceed by strong induction on i.

Case i = 0: This holds vacuously.

Case i = c + 1: In this case, the induction hypothesis tells us that the value of dist_j is correct for all j <= c; that is, for all j < i.

Any path from r to v_i which contains at least one edge must end with some edge of the form (v_k, v_i) with weight w. So a path with at least one edge from r to v_i consists of a path from r to v_k together with an edge v_k to v_i. So the minimal weight of a path from r to v_i with at least one edge will be

$$ \min\limits_{(v_k, v_i) \text{ an edge with weight } w, p \text{ a path from } r \text{ to } v_k} w + length(p) = \min\limits_{(v_k, v_i) \text{ an edge with weight } w} [w + \min\limits_{p \text{ a path from } r \text{ to } v_k} length(p)]$$

Now fix some edge $(v_k, v_i)$ with weight $w$. We know it cannot be the case that $i \leq k$; therefore, $k < i$. Then we know going into this iteration that

$$dist_k = \min\limits_{p \text{ a path from } r \text{ to } v_k} length(p)$$

And so the minimum we're looking for is

$$\min\limits_{(v_k, v_i) \text{ an edge with weight } w} w + dist_k$$

Which is exactly what our code does.

In the case where $v_i = r$, we see that because the graph is acyclic, the only path from $v_i$ to itself is the empty path with weight $0$. This is why we include this clause.

Thus, we have proved the correctness of the algorithm.

Note that the critical part of the proof was observing that we had already correctly computed $dist_k$ for all $k$ such that $(v_k, v_i)$ is an edge by the time we get around to computing the value of $dist_i$. This is absolutely essential for the correctness of the algorithm. If we don't have a topological sort, then we don't get this guarantee of correctness.

Why is this algorithm O(E + V)? Let's consider that there is an implicit inner loop running when we compute $dist_i$ for $v_i \neq r$. In this implicit inner loop, we iterate over all edges ending in node $v_i$. Now each edge ends with exactly 1 node. Thus, we will iterate over each given edge at most 1 time. This is why the inner loop will execute O(E) times in total across all executions of the outer loop. And thus the whole thing will be O(E + V).
