Minimum distance (threshold) for fully connected graph

graph theory

M is an m x m square, symmetrical matrix. It is the distance matrix between m random points in N-dimensional space.

I'd like to transform M into a fully connected graph by placing a threshold (t) on the distances. I would like (t) to be the smallest value which lends me a fully connected graph of M.

For i, j in M:

  • if distance i to j is < t: then i, j are connected

  • otherwise: i,j are not connected

I'd like to find the minimal value of t which would return a fully connected graph.

I'm unsure if there are any graph theoretical approach towards efficiently finding the minimal value of t which satisfies this condition. I've tried looking online with no luck.

If there are no theoretical frameworks, what would be the quickest / most efficient way to find t?

My initial idea was to do the following:

1) Compute the minimum distance in each row

2) Set t as the largest of the minimal distances obtained in (1)

3) Construct my graph and check if it's connected using t as a threshold

4) Based on whether or not the resulting graph is connected, add (or remove) nodes, until the graph becomes fully connected (or stop being fully connected). That would be my final t.

I'm unsure if this is the most efficient way to search for this threshold.

Since my initial matrix is random, there should be a quick/mathematical way to determine an optimal (t) without having to try a bunch of values pseudo-randomly… right?

Maybe based on the initial number of points, number of dimensions, and the distance function?

I'm not sure how to approach this problem. Any ideas or leads on papers which could help?

Best Answer

You should look for the minimum spanning tree of the graph (using the distances between nodes as edge weights), which is commonly found using Prim's algorithm or Kruskal's algorithm. Although the minimum spanning tree problem is formulated as minimizing the sum of the edge weights, it can be shown that the minimum spanning tree also minimizes the maximum of the edge weights.

To see this, let $e$ be the edge with the largest weight in a minimum spanning tree $T$. Then $T-e$ has two components (call them $A$ and $B$), and any edge between $A$ and $B$ in the graph can be used to connect them. If there is some edge $e'$ with weight less than $e$, then replacing $e$ by $e'$ creates a new spanning tree with smaller total weight; so no such $e'$ exists. Therefore $e$ is the least-weight edge connecting $A$ to $B$, and since you need some edge connecting $A$ to $B$, every spanning tree has an edge with weight at least the weight of $e$.

In your case, the edge weights are distances, and once you find a minimum spanning tree, the weight of the largest edge in it will be the distance threshold $t$ you're looking for. Including all the edges with distances at or below $t$ will probably include many edges other than the ones in your tree, but in particular, you'll get the entire tree: so the graph will be connected.

To see that there's no way to do it more cheaply, note that any connected graph contains a spanning tree, and (by the minimality property we proved) every spanning tree contains an edge of length (weight) at least $t$.

Related Question