Criteria for infeasibility of bipartite graph full matching/linear sum assignment

bipartite-graphsgraph theorymatching-theory

I am trying to (programmatically) find a minimum weight matching in a large bipartite graph (over a thousand nodes on both sides with over a million edges), and I run into an issue where the "cost matrix is infeasible". Unfortunately the software does not give a reason why this is infeasible, so I am wondering if there are things that I should check while building the graph to ensure that there is a solution.

I am using networkx.bipartite.minimum_weight_full_matching which defers the solving of the cost matrix to scipy.optimize.linear_sum_assignment. Both links are to the documentation of the functions and have references to the papers describing the algorithms used, but I am a little out of my depth trying to understand why a cost matrix would be infeasible. The graph is not connected (there are 2-3 connected components), but I am ensuring that every node in the graph has at least one edge.

Interestingly, the algorithm used by networkx.min_weight_matching does find a solution, but it is much slower than the bipartite version.

Best Answer

The difference is that:

  • minimum_weight_full_matching specifically wants a full matching with minimum weight: in a bipartite graph with bipartition $(U,V)$, it wants a matching of size $\min\{|U|,|V|\}$. If there is no such matching at all, it gives an error.
  • min_weight_matching (which is slower because it uses the blossom algorithm for matchings in general graphs) finds a maximum matching with minimum weight: it does not care how big a maximum matching is.

The general criterion for when a full matching exists in a bipartite graph is given by Hall's theorem. This is not any easier to check, when building the graph, than it is to find the matching to begin with.

In a dense graph which is not connected, which is what you've got, there is a "low-effort" check that might solve the problem:

  1. Find the connected components of the graph; these will be bipartite graphs themselves, so they will have bipartitions $(U_1, V_1)$, $(U_2, V_2)$, and so forth.
  2. Construct a set $W$ by taking the union of the smaller of $U_i$ and $V_i$ for each $i$, and use $W$ as one side of the bipartition of $G$. (This can be provided to minimum_weight_full_matching via the optional top_nodes parameter.)

Here's why this makes sense to do. When the bipartite graph is not connected, there are multiple bipartitions. The bipartion where one side is $W$ is the most unbalanced bipartition: it is the one for which a full matching is the smallest. There is still no guarantee that a full matching exists; however, this is the only bipartition for which the full matching could possibly exist.

If this still does not help, we can try to solve the problem by adding some artificial vertices. Specifically, add some number of vertices which are:

  1. Adjacent to all vertices in $W$, and no others.
  2. Have a very high weight on all their edges, so that the minimum-weight matching will not want to use them unless it has to. (To be safe, make it something like $|W|$ times bigger than the largest weight already in the graph.)

You should then be able to find a minimum-weight full matching; throw away the artificial vertices, and you'll have a minimum-weight maximum matching in the graph.

How many artificial vertices to add? Adding $|W|$ of them is definitely enough, but probably overkill; it's possible that only one or two will be enough. To know for certain, run maximum_matching on the graph first; if it outputs a matching of size $m < |W|$, add $|W|-m$ artificial vertices.

Related Question