# An optimal non-bipartite matching (Python implementation)

algorithmsgraph theorymatching-theoryoptimizationpython

I have a set $$S = \{s_1, \cdots, s_n \}$$ ($$n$$ is even), and the $$n\times n$$ matrix $$M = \{ d_{ij}\}$$, where $$d_{ij}$$ is a distance between $$s_i$$ and $$s_j$$. I wish to find a non-bipartite matching on $$S$$ that minimises the sum of the distances between matched elements.

That is to say, I want to break the set $$S$$ into $$n/2$$ pairs $$\{P_k = (s_k^1, s_k^2) \}$$ so that each $$s_i$$ appears in exactly one pair, with exactly one other element $$s_j$$, $$i \neq j$$, and the pairs use up all of $$S$$. Further, the sum of distances within pairs is minimised.

I couldn't find a Python implementation of the problem, but I found some implementations of the assignment problem. I utilised these by finding bipartite matchings between $$S$$ and itself. To prevent trivial matchings $$s_i \leftrightarrow s_i$$, I set the diagonal of the matrix $$M$$ to a large number (say) $$2 \times max \{d_{ij}\}$$. For example, using

scipy.optimize.linear_sum_assignment(M, maximize=False)

or using $$-M$$ to solve the dual problem with:

scipy.sparse.csgraph.maximum_bipartite_matching(-M, row)

For small sets $$S$$, both of these give the correct desired non-bipartite matching. Each element $$s_i$$ matches with exactly one $$s_j$$ $$(i \neq j)$$, and the sum of distances is minimised. I have checked this by brute force. An example of correct matchings is:

[[0, 6], [1, 8], [2, 15], [3, 4], [4, 3], [5, 7], [6, 0], [7, 5], [8, 1], [9, 11], [10, 13], [11, 9], [12, 17], [13, 10], [14, 16], [15, 2], [16, 14], [17, 12]]
(With $$n = 18$$ and counting $$[i,j]$$ and $$[j,i]$$ as the same, we have $$n/2$$ matchings.)

But for large $$n$$, I start to see some errors. Things like $$s_1$$ matches with $$s_{356}$$ but $$s_{356}$$ matches with $$s_8$$ etc.

The vectors $$s_i$$ are randomly generated floats, so ties are extremely unlikely. Any thoughts on what could be going on?

To better explain the issue, I have posted some Python code on github:

https://github.com/zburq/scipy_linear_sum_assignment_issue/blob/main/polygon_issue.ipynb

Your algorithm does this because it has no reason to avoid it; your code is not solving the problem you think it is solving.

When you use the $$n\times n$$ matrix $$M$$ to find a bipartite matching, you are solving a problem on a bipartite graph with $$2n$$ vertices; $$n$$ vertices for the rows and $$n$$ different vertices for the columns. The algorithm is trying to match each row to a column.

For small $$n$$, you may get lucky: you may be able to give each vertex its best option. If this happens, the result will look like it's giving you a matching of the original $$n$$-vertex graph. That's because if $$M_{ij}$$ is the smallest entry in row $$i$$, then $$M_{ji}$$ is the smallest entry in column $$i$$; so if both row $$i$$ and column $$i$$ get their best option, they are matched to column $$j$$ and row $$j$$, respectively.

But in general, there is simply no such constraint. It may be that the best matching pairs row $$1$$ with column $$356$$, but row $$356$$ is paired with column $$8$$.

Here is a simple example of a $$6 \times 6$$ matrix $$M$$ for which we can see this will happen. As you desire, I have set the diagonal of $$M$$ to a very high number.

$$\begin{bmatrix} \infty & 0.1 & 0.2 & 10 & 20 & 30 \\ 0.1 & \infty & 0.3 & 40 & 50 & 60 \\ 0.2 & 0.3 & \infty & 70 & 80 & 90 \\ 10 & 40 & 70 & \infty & 0.4 & 0.5 \\ 20 & 50 & 80 & 0.4 & \infty & 0.6 \\ 30 & 60 & 90 & 0.5 & 0.6 & \infty \end{bmatrix}$$ Here, one possible best bipartite matching is the matching $$\{(1,2), (2,3), (3,1), (4,5), (5,6), (6,4)\}$$ with total weight $$0.1+0.3+0.2+0.4+0.6+0.5 = 2.1$$. This is not a legitimate matching of the $$6$$-vertex graph.

In the $$6$$-vertex graph, we need to choose some edge that connects vertices $$\{1,2,3\}$$ to vertices $$\{4,5,6\}$$, all of which are much more expensive. The best matching uses edges $$\{1,4\}$$, $$\{2,3\}$$, and $$\{5,6\}$$ and has weight $$10 + 0.3 + 0.6 = 10.9$$.

You can reproduce this example with distances in the plane, if you take a set of $$6$$ points which form two clusters of $$3$$ points.

• The best non-bipartite matching between the set of $$6$$ points uses one edge between the two clusters, and an edge within each cluster.
• The bipartite matching algorithm instead wants to draw $$6$$ arrows of minimum total length, so that one goes into each point and one goes out. The cheapest way to do that is to draw two triangles.