Interpret the minimum spanning tree in a fully-connected graph

data visualizationgraph theorynetworks

Can someone explain how the resulting graph is the minimum spanning tree (MST) from the fully-connected undirected graph? I don't understand how this is interpreted in this context.

Definition according to Wikipedia:

A minimum spanning tree (MST) or minimum weight spanning tree is a subset of the edges of a connected, edge-weighted undirected graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight.

# Toy Data (Iris dataset, pearson correlation, absolute value)
d = {'sepal_length': {'sepal_length': 1.0, 'sepal_width': 0.11756978413300088, 'petal_length': 0.8717537758865838, 'petal_width': 0.8179411262715758}, 'sepal_width': {'sepal_length': 0.11756978413300088, 'sepal_width': 1.0, 'petal_length': 0.42844010433053864, 'petal_width': 0.3661259325364377}, 'petal_length': {'sepal_length': 0.8717537758865838, 'sepal_width': 0.42844010433053864, 'petal_length': 1.0, 'petal_width': 0.962865431402796}, 'petal_width': {'sepal_length': 0.8179411262715758, 'sepal_width': 0.3661259325364377, 'petal_length': 0.962865431402796, 'petal_width': 1.0}}
g = nx.from_pandas_adjacency(pd.DataFrame(d))

# Remove self-loops
g.remove_edges_from(nx.selfloop_edges(g))

# Graph
print(g.edges(data=True))
# [('sepal_length', 'sepal_width', {'weight': 0.11756978413300088}), ('sepal_length', 'petal_length', {'weight': 0.8717537758865838}), ('sepal_length', 'petal_width', {'weight': 0.8179411262715758}), ('sepal_width', 'petal_length', {'weight': 0.42844010433053864}), ('sepal_width', 'petal_width', {'weight': 0.3661259325364377}), ('petal_length', 'petal_width', {'weight': 0.962865431402796})]

enter image description here

# MST
g_mst = nx.minimum_spanning_tree(g)
print(g_mst.edges(data=True))
# [('sepal_length', 'sepal_width', {'weight': 0.11756978413300088}), ('sepal_width', 'petal_width', {'weight': 0.3661259325364377}), ('sepal_width', 'petal_length', {'weight': 0.42844010433053864})]

enter image description here

Best Answer

In this graph, the minimum spanning tree will have three edges (to connect to all vertices without loops). A tree with four edges will not be possible, because it would lead to a loop. A tree with two edges will also not be possible, because it would not connect to all vertices.

Therefore, to find the MST, you have to compare the total weight of all trees with three edges and find the minimum.

enter image description here

Let's look at all possiblities (with rounded numbers):

  • sepal_width -- petal_length (0.4284) , sepal_width -- petal_width (0.3661) , sepal_width -- sepal_length (0.1176) = 0.9121

  • sepal_width -- petal_width (0.3661) , petal_width -- petal_length (0.9629) , petal_width -- sepal_length (0.8179) = 2.1469

  • petal_width -- sepal_length (0.8179), sepal_width -- sepal_length (0.1176) , petal_length -- sepal_length (0.8718) = 1.8073

  • petal_width -- petal_length (0.9629) , sepal_width -- petal_length (0.4284), petal_length -- sepal_length (0.8718) = 2.2631

  • sepal_width -- petal_length (0.4284) , sepal_width -- petal_width (0.3661) , petal_width -- sepal_length (0.8179) = 1.6124

  • sepal_width -- petal_width (0.3661) , petal_width -- sepal_length (0.8179) , petal_length -- sepal_length (0.8718) = 2.0558

  • petal_width -- sepal_length (0.8179) , petal_length -- sepal_length (0.8718) , sepal_width -- petal_length (0.4284) = 2.1181

  • petal_length -- sepal_length (0.8718) , sepal_width -- petal_length (0.4284) , sepal_width -- petal_width (0.3661) = 1.6663

  • petal_length -- sepal_width (0.4284), sepal_width -- sepal_length (0.1176), sepal_length -- petal_width (0.8179) = 1.3639

  • petal_length -- sepal_width (0.4284), sepal_width -- sepal_length (0.1176), petal_length -- petal_width (0.9629) = 1.5089

  • petal_width -- sepal_length (0.8179), sepal_width -- sepal_length (0.1176), petal_length -- petal_width (0.9629) = 1.8984

  • petal_length -- sepal_length (0.8718), sepal_width -- sepal_length (0.1176), petal_length -- petal_width (0.9629) = 1.9523

  • sepal_width -- petal_width (0.3661), sepal_width -- sepal_length (0.1176), petal_length -- petal_width (0.9629) = 1.4466

  • sepal_width -- petal_width (0.3661), petal_length -- sepal_length (0.8718), petal_length -- petal_width (0.9629) = 2.2008

  • sepal_width -- petal_width (0.3661), petal_length -- sepal_length (0.8718), sepal_width -- sepal_length (0.1176) = 1.3555

  • sepal_width -- petal_length (0.4284), petal_length -- petal_width (0.9629), sepal_length -- petal_width (0.8179) = 2.2092

The smallest weight of 0.9121 is achieved for the first combination. I.e., this is the tree with the smallest sum over its edges connecting to all vertices.

Related Question