Can someone explain how the resulting graph is the minimum spanning tree (MST) from the fully-connected undirected graph? I don't understand how this is interpreted in this context.
Definition according to Wikipedia:
A minimum spanning tree (MST) or minimum weight spanning tree is a subset of the edges of a connected, edge-weighted undirected graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight.
# Toy Data (Iris dataset, pearson correlation, absolute value)
d = {'sepal_length': {'sepal_length': 1.0, 'sepal_width': 0.11756978413300088, 'petal_length': 0.8717537758865838, 'petal_width': 0.8179411262715758}, 'sepal_width': {'sepal_length': 0.11756978413300088, 'sepal_width': 1.0, 'petal_length': 0.42844010433053864, 'petal_width': 0.3661259325364377}, 'petal_length': {'sepal_length': 0.8717537758865838, 'sepal_width': 0.42844010433053864, 'petal_length': 1.0, 'petal_width': 0.962865431402796}, 'petal_width': {'sepal_length': 0.8179411262715758, 'sepal_width': 0.3661259325364377, 'petal_length': 0.962865431402796, 'petal_width': 1.0}}
g = nx.from_pandas_adjacency(pd.DataFrame(d))
# Remove self-loops
g.remove_edges_from(nx.selfloop_edges(g))
# Graph
print(g.edges(data=True))
# [('sepal_length', 'sepal_width', {'weight': 0.11756978413300088}), ('sepal_length', 'petal_length', {'weight': 0.8717537758865838}), ('sepal_length', 'petal_width', {'weight': 0.8179411262715758}), ('sepal_width', 'petal_length', {'weight': 0.42844010433053864}), ('sepal_width', 'petal_width', {'weight': 0.3661259325364377}), ('petal_length', 'petal_width', {'weight': 0.962865431402796})]
# MST
g_mst = nx.minimum_spanning_tree(g)
print(g_mst.edges(data=True))
# [('sepal_length', 'sepal_width', {'weight': 0.11756978413300088}), ('sepal_width', 'petal_width', {'weight': 0.3661259325364377}), ('sepal_width', 'petal_length', {'weight': 0.42844010433053864})]
Best Answer
In this graph, the minimum spanning tree will have three edges (to connect to all vertices without loops). A tree with four edges will not be possible, because it would lead to a loop. A tree with two edges will also not be possible, because it would not connect to all vertices.
Therefore, to find the MST, you have to compare the total weight of all trees with three edges and find the minimum.
Let's look at all possiblities (with rounded numbers):
sepal_width -- petal_length (0.4284) , sepal_width -- petal_width (0.3661) , sepal_width -- sepal_length (0.1176) = 0.9121
sepal_width -- petal_width (0.3661) , petal_width -- petal_length (0.9629) , petal_width -- sepal_length (0.8179) = 2.1469
petal_width -- sepal_length (0.8179), sepal_width -- sepal_length (0.1176) , petal_length -- sepal_length (0.8718) = 1.8073
petal_width -- petal_length (0.9629) , sepal_width -- petal_length (0.4284), petal_length -- sepal_length (0.8718) = 2.2631
sepal_width -- petal_length (0.4284) , sepal_width -- petal_width (0.3661) , petal_width -- sepal_length (0.8179) = 1.6124
sepal_width -- petal_width (0.3661) , petal_width -- sepal_length (0.8179) , petal_length -- sepal_length (0.8718) = 2.0558
petal_width -- sepal_length (0.8179) , petal_length -- sepal_length (0.8718) , sepal_width -- petal_length (0.4284) = 2.1181
petal_length -- sepal_length (0.8718) , sepal_width -- petal_length (0.4284) , sepal_width -- petal_width (0.3661) = 1.6663
petal_length -- sepal_width (0.4284), sepal_width -- sepal_length (0.1176), sepal_length -- petal_width (0.8179) = 1.3639
petal_length -- sepal_width (0.4284), sepal_width -- sepal_length (0.1176), petal_length -- petal_width (0.9629) = 1.5089
petal_width -- sepal_length (0.8179), sepal_width -- sepal_length (0.1176), petal_length -- petal_width (0.9629) = 1.8984
petal_length -- sepal_length (0.8718), sepal_width -- sepal_length (0.1176), petal_length -- petal_width (0.9629) = 1.9523
sepal_width -- petal_width (0.3661), sepal_width -- sepal_length (0.1176), petal_length -- petal_width (0.9629) = 1.4466
sepal_width -- petal_width (0.3661), petal_length -- sepal_length (0.8718), petal_length -- petal_width (0.9629) = 2.2008
sepal_width -- petal_width (0.3661), petal_length -- sepal_length (0.8718), sepal_width -- sepal_length (0.1176) = 1.3555
sepal_width -- petal_length (0.4284), petal_length -- petal_width (0.9629), sepal_length -- petal_width (0.8179) = 2.2092
The smallest weight of
0.9121
is achieved for the first combination. I.e., this is the tree with the smallest sum over its edges connecting to all vertices.