[Math] Finding the longest path in an undirected weighted tree

algorithmsgraph theory

I have a tree where each edge is assigned a weight (a real number that can be positive or negative). I need an algorithm to find a simple path of maximum total weight (that is, a simple path where the sum of the weights of the edges in the path is maximum). There's no restriction on what node the path starts or ends.

I have a possible algorithm, but I am not sure it works and I am looking for a proof. Here it is:

  1. Select an arbitrary node u and run DFS(u) to find the maximum weight simple path that starts at u. Let (u, v) be this path.
  2. Run DFS(v) to find the maximum weight simple path that starts at v. Let this path be (v, z).

Then (v, z) is a simple path of maximum weight. This algorithm is linear in the size of the graph. Can anyone tell me if it works, and if so, give a proof?

Note: The Longest Path Problem is NP-Hard for a general graph with cycles. However, I only consider trees here.

Best Answer

Consider the following algorithm, where we start at terminal nodes and keep removing them while accounting for the corresponding paths. For each node $v$, we store $2$ longest paths we have obtained so far (say of length $l_1(v)$ and $l_2(v)$, with $l_1 \geq l_2$), which end at that node. Initially set all these values to zero.

  1. Let $v$ a terminal vertex in the tree and let $(u,v)$ be the edge touching it.
  2. Set $l = l_1(v) + w(u,v)$.
  3. Modify $l_1(u)$ and $l_2(u)$ by picking the largest two among $l_1(u)$, $l_2(u)$ and $l$.
  4. Remove the terminal node $v$ and its edge $(u,v)$ from the tree. If the tree is not empty (no edges), go back to step $1$.

When all edges have been removed, we can show that maximum path length is given by $\max {(l_1(v) + l_2(v))}$, over all nodes $v$ in the original tree. Each vertex and edge is visited exactly once and all the operations are constant time, so the algorithm is linear in the number of vertices.

I assumed here that an empty path (path with no edges) is admissible and has length $0$ (for example, when all edge weights are negative, empty path is the longest). But if you only want non-empty paths, you could change the initialization of $l_1$ and $l_2$ values for all nodes to $-\infty$ instead of $0$.