Optimal tree structure within an undirected weighted graph

combinatoricsdiscrete optimizationgraph theory

I am looking for an algorithm to split a graph into subgraphs to optimize drawing of the graph.

The initial graph is a weighted bipartite graph. Let's call the nodes on one side elements and on the other busses. I want to cut the edges on the busses so all busses have degree two while minimizing the weight of the cut edges. This would result in a graph that can be drawn as a forest with the elements being the nodes and the busses being the edges. Not all elements have to be in the final tree, although the number of elements in the tree must be maximized.

An example structure is shown here:
Example image

The best algorithm I have found so far is a longest path algorithm. Although it is quite easy to create a graph where a tree structure is more efficient than simply using a longest graph.
Next I tried an algorithm where the longest path is combined with a tree splitting, so all elements where multiple busses originate that are not part of the longest path would have a new longest path originating there with the remaining nodes. But this also was not optimal as it would not provide a good solution with cyclic graphs.

Update – Brute force solution

I brute forced the graph in the image above to find the solution. The python code used to force this solution is:

#%% Imports
import itertools
import numpy as np
import typing

#%% Construct the graph
nodes: typing.List[str] = [str(i + 1) for i in range(7)] + [chr(i) for i in range(65, 70)]
bus_nodes: typing.List[str] = [chr(i) for i in range(65, 70)]
edges: typing.Dict[typing.Tuple[str, str], int] = {
    ("1", "A"): 1,
    ("1", "C"): 3,
    ("2", "A"): 4,
    ("2", "B"): 2,
    ("3", "A"): 7,
    ("3", "B"): 5,
    ("3", "C"): 6,
    ("4", "B"): 7,
    ("4", "D"): 3,
    ("5", "A"): 6,
    ("5", "C"): 1,
    ("5", "D"): 1,
    ("5", "E"): 9,
    ("6", "A"): 2,
    ("7", "B"): 8,
    ("7", "E"): 3
}

#%% Convert the nodes into an adjacency matrix.
N: int = len(nodes)
adj: np.ndarray = np.zeros(shape=(N, N), dtype=int)
for edge, weight in edges.items():
    i_a: int = nodes.index(edge[0])
    i_b: int = nodes.index(edge[1])
    ima = max(i_a, i_b)
    imi = min(i_a, i_b)
    adj[ima, imi] = weight

#%% Get each bus vertex that must be cut and get the options.
bus_nodes_to_options: typing.Dict[str, typing.Tuple[int, int]] = dict()
for node in bus_nodes:
    index = nodes.index(node)
    options = list(np.squeeze(np.argwhere(adj[index,:] > 0)))
    if len(options) == 2:
        continue
    bus_nodes_to_options[node] = list(itertools.combinations(options, r=2))

#%% Implement Kruskal algorithm
def kruskal_islands(adj: np.ndarray) -> typing.List[typing.Set[int]]:
    # Get the number of connected sets based on krugals algorithm.
    N: int = adj.shape[0]
    sets: typing.List[typing.Set[int]] = [{i, } for i in range(N)]
    for i,j in [(i, j) for i in range(N) for j in range(i)]:
        if adj[i,j] == 0:
            continue
        set_i = next(s for s in sets if i in s)
        set_j = next(s for s in sets if j in s)
        if set_i == set_j:
            continue
        set_i.update(set_j)
        sets.remove(set_j)
    return sets

#%% Create the graph for each options and calculate it's value
result: typing.List[typing.Tuple[int, int]] = list()
options = list(itertools.product(*bus_nodes_to_options.values()))
o = len(bus_nodes)
for q, option in enumerate(options):
    adj_copy = np.copy(adj)
    for i, row in enumerate(option):
        index = nodes.index(bus_nodes[i])
        adj_row = np.zeros(N)
        adj_row[row[0]] = adj[index, row[0]]
        adj_row[row[1]] = adj[index, row[1]]
        adj_copy[index, :] = adj_row
    total_weight = np.sum(adj_copy[-o:-1, 0:-o])
    num_islands = len(kruskal_islands(adj_copy))
    result.append((total_weight, num_islands))

#%% Determine the best option.
min_islands = min([a[1] for a in result])
max_rows = max([a[0] for a in result if a[1] == min_islands])
opt = result.index((max_rows, min_islands))
print(max_rows)
print(opt)
print(options[opt])

This resulted in the following graph:
Optimized example result.

All lines that are red are cut from the actual solution.

Although this brute force tactic works, it is also extremely slow as I have to do similar operations continuously. Is there a more elegant solution to my problem?

Update – Partial solution leading to restructuring

With the partial solution provided by @connor harris the problem can be rewritten. Now we can choose edge from all busses that must surely be part of the bus. All busses that only have two connection are already predefined. With these two we can group the remaining elements into different section as shown in this image: enter image description here

Additionally, the connection between B-4 can be removed as it will connect between two nodes that are already part of the same group and thus would result in a loop in the final tree which is not allowed.

With these there are 5 groups remaining that are connected. From each bus we still need to chose the second connection that must be optimized. Different greedy algorithms don't work. For example in the restructured image (which is the same as the one above) here: enter image description here

A greedy algorithm would start with the connection from A to yellow. Although this connection is not part of the optimal solution as the brute force algorithm shows.

Best Answer

Only a partial result for now, but this should provide a considerable advance over brute force.

Lemma. The optimal forest includes, for every bus $b \in B$, the edge to $b$ with maximum weight.

Proof. Let $F$ be some forest structure on our original graph, and let $E_1, \ldots, E_k$ be a division of the elements $E$ into connected components of $F$ minus all edges to the bus $b$: that is, if $e_1, e_2$ are in different connected components $E_i$, then either there is no path from $e_1$ to $e_2$ through $F$, or the (necessarily unique) path passes through $b$. This means that we can modify $F$ by changing the two elements $e_1, e_2$ that connect to $b$, and as long as $e_1$ and $e_2$ are in different connected components, $F$ remains acyclic. The optimum choice is thus given by connecting $b$ to the two maximum-weight elements that also have the maximum weight in their connected components: that is, choose $e_1 \in E$ such that $w(e_1, b)$ is maximal, and then choose $e_2 \in E$ such that $w(e_2, b)$ is maximal among all elements that are not in the connected component $E_i$ that contains $e_1$.

Remarks: 1) If $b$ has multiple edges with the same maximum weight, then we could choose any edge. 2) This lemma applies even if we restrict $F$ to be a tree; in that case, there are exactly two connected components $E_1, E_2$.


This result lets us choose, in linear time, a subgraph of the optimal tree (or forest) consisting of a number of "complexes", each containing one element with connections to zero or more buses. Our task now is to find a "maximum spanning tree" on these complexes. by connecting a bus from one complex to the sole element of the other. Unfortunately, the usual MST algorithms don't work for this, because the connection weight between two complexes isn't independent from how they're connected to other complexes: it's possible that the highest-weight connections from $e_1$ to $e_2$ and $e_3$, for instance, both run through the same bus $b$, and we can't use both connections. So greedy algorithms that expand maximum spanning trees by choosing the maximum-weight edge at each step won't work.

(For instance: suppose we have three elements $1, 2, 3$ and two buses $A, B$ with connection weights $w(1, A) = w(2, B) = 20$, $w(2, A) = 10$, $w(1, B) = 9$, $w(3, A) = 9$, $w(3, B) = 1$. The maximum tree has to include edges $w(1, A)$ and $w(2, B)$, for weight 40. If we follow Kruskal's algorithm and take the greedy choice $(2, A)$ for our next edge, then we have to connect to $3$ via $B$, giving a total weight of $51$, whereas using $B$ for the connection between $1$ and $2$ in order to preserve $A$ for a better connection to $3$ would give us a total weight of $58$.)

Related Question