[EDIT]This is now a complete answer
A counter exemple for your solution is:
$v_1 \xrightarrow{2} v_2$ ;
$v_1 \xrightarrow{1} v_3$ ;
$v_2 \xrightarrow{0} v_3$ ;
$v_2 \xrightarrow{1} v_4$ ;
$v_3 \xrightarrow{2} v_4$ .
The first maximal path will give you this path $v_1\xrightarrow{2} v_2 \xrightarrow{0} v_3\xrightarrow{2} v_4$ of weight 4. And the second maximal has weight 1 (hence for a total of 5). But you can get a better total by taking $v_1 \xrightarrow{2} v_2 \xrightarrow{1} v_4$ and $v_1 \xrightarrow{1} v_3 \xrightarrow{2} v_4$ for a total of 6.
A solution
Let $G=(V,E,w)$ be a DAG.
Let $m=m_1...m_n$ be a path of maximum in $G$.
We define $G_m=(V',E',w')$ as follow:
- $V'=\{m_1,...,m_n\}\times V$
- $E'=\{((m_i,v),(m_i,v'))\mid (v,v')\in E\}\cup \{((m_i,m_j),(m_j,m_k))\mid i<k<j\}$
- $w'((m_i,v),(m_i,v'))=w(v,v')$ if $(v,v')$ do no belong to the maximal path, $0$ otherwise
- $w'((m_i,m_j),(m_j,m_k))=-w(m_k...m_j)$
Let $m'$ be the maximal path in $G_m$, the maximal sum you are looking for is equal to $w(m)+w'(m')$!
You can compute it in polynomial time since finding a maximal path is linear in DAGs and the second DAG has a size polynomial in the first.
I let you the joy of proving this solution correct ...
The intuition behind it is:
- I find the maximal path in $G$
- But it may be the case that I made a wrong choice maximizing the first path without caring about the second
- So I allow the second path to correct the first by going up the first path (taking negative reward). This is done when there is a change of copies (going from $m_i\times V$ to $m_j\times V$
- I can correct only once each edge of the graph (this is the reason of the inequality $i<k<j$).
Please ask if something is not clear or if I made a mistake. I didn't do the full proof. But I hope it is correct.
Edit 2: on an exemple
Considere the graph given as a counter example for your solution.
The maximal path is : $m=v_1 \to v_2 \to v_3 \to v_4$. thats gives 4.
$G_m$ is has follow : for $k\in\{1,2,3,4\}$ :
$(v_k,v_1) \xrightarrow{0} (v_k,v_2)$ ;
$(v_k,v_1) \xrightarrow{1} (v_k,v_3)$ ;
$(v_k,v_2) \xrightarrow{0} (v_k,v_3)$ ;
$(v_k,v_2) \xrightarrow{1} (v_k,v_4)$ ;
$(v_k,v_3) \xrightarrow{0} (v_k,v_4)$ ;
$(v_1,v_3)\xrightarrow{-0}(v_3,v_2)$ ;
$(v_1,v_4)\xrightarrow{-(0+2)}(v_4,v_2)$ ;
$(v_1,v_4)\xrightarrow{-2}(v_4,v_3)$ ;
.
The maximal path is $(v_1,v_1)\xrightarrow{1} (v_1,v_3)\xrightarrow{-0}(v_3,v_2)\xrightarrow{1}(v_3,v_4)$ that gives you 2.
Thus the solution is 6.
Notice that here the second maximal path somehow "correct" the first maximal path by taking $(v_1,v_3)\xrightarrow{-0}(v_3,v_2)$.
The graph you describe is in fact a Markov chain. What you are looking for is the expected value of the path for $St$ to $E1$. Notice that there is no notion of maximum value, since there are no choice to make, all the choices are made by the probability on the edges
In Markov chain, you can actually compute this value as the solution of the following equations:
Let $Trap(E1)$ be the set of nodes from which $E1$ is not accessible
For any node $n$ define the variable $x_n$ that represent the expect value of the path from $n$ to $E_1$.
If $n\in Trap(E)$ then add the equation $x_n=0$.
More over $x_{E1}=0$
For all node $n\in N\setminus Trap(E1)$ add the equation:
$$ x_n = \sum_{n'\in N\setminus Trap(E1)} p(n,n')(w(n,n')+x_{n'})$$
where $p(n,n')$ denotes the probability to take an edge form $n$ to $n'$ and $w(n,n')$ the weight of this edge.
You have $|N|$ variables and $|N|$ equations thus an unique solution.
the value you are looking for is $x_{St}$
On your example it gives:
$$x_{St}=0.8(1+x_{z_1})+0.2(x_{z_3}+1)$$
$$x_{z_3}=0.6(1+x_{z_2})$$
$$x_{z_2}=0.1(3+x_{St})+0.5(x_{z_4}+3)+0.1(x_{E1}+3)$$
$$x_{Z_1}=0.75(1+x_{z_2})+0.5(x_{z_4}+1)$$
$$x_{z_4}=4+x_{E_1}$$
And I let you solve it by elimination of variables for example.
Best Answer
We have to prove two things: (1) the algorithm gives us a valid spanning tree. (2) the resulting sum of edgecosts is minimal over all possible spanning trees.
Let $T$ be the collection of edges returned by the algorithm:
For (1): It is clear that the algorithm returns exactly $n-1$ edges where $n$ is the number of nodes. Now let $T_u$ be the undirected set of edges arising from $T$. Then $(V, T_u)$ cannot contain any cycles because of the following argument: Assume there is a cycle in $(V, T_u)$. Then there exist nodes $u, v$ such that there are two $u$-$v$-paths in $T$. But this would mean that $deg_T(v) > 1$ which is not possible given the definition of the algorithm. Since we have $n-1$ edges on $n$ nodes with no cycles it follows that $T_u$ is a spanning tree of $G$.
For (2): Note that this does not hold. I give a counterexample: Let $$V = \{A, B, C, D\}$$ $$E = \{ (A, B), (A, C), (B, D), (C, D)\}$$ $$c((A, B)) = 10, c((A, C)) = 10, c((B, D)) = 1, c((C, D)) = 1$$
The minimum spanning tree is given by either $T = \{ (A, B), (B, D), (C, D)\}$ or $T = \{ (A, C), (B, D), (C, D)\}$. But neither of those can be returned by your algorithm because the in-degree of $D$ is $2$ in both cases.