Permutation Matrix Measure

linear algebramatricesoptimizationpermutation-matrices

I’m trying to make a neutal network with dynamic architecture, meaning that the weight matrices will be added/removed dynamically when they are simple permutation matrices and thus don’t actually do any computation. However, I’ve run into an issue, I need a differentiable way to determine ”how close” the weight matrices are to being a permutation matrix. This measure will then be added to the loss function as a regularization term.

I know how to find the closest permutation matrix by making a cost matrix as one of the answers from here:
Nearest signed permutation matrix to a given matrix $A$

However, if I use the norm of the cost matrix as a measure of ”permutivity” the simple 0 matrix will always give a lower measure (loss) than an actuall permutation matrix. Whenever I try to find or come up with a measure I always run into this same problem, there’s always some other matrix which gives a lower score than a permutation matrix.

Do any of you guys know of a ”permutation matrix measure” pr do you have any brilliant ideas on how to make one? Keep in mind that it has to be differentiable to be usefull,so no max functions or anything like that.

Update: I have found one measure that does what I need, but it’s too computationally heavy to use. I’ll write it here anyway, since it gives insight to what I’m looking for and might spawn new ideas:

$ \text{”permutivity”} = \prod_{i = 1}^{n} \sum_{j = 1}^{m}\sum_{k = 1}^{m} (a_{j,k}-{p_{i}}_{j,k})^2 $

Where $p_i$ is the ”i:th” permutation matrix of size MxM. So basically the product of the L2 distance of all the different permutation matrices. As you can imagine, this is not practical for several reasons, first of all, the number of permutation matrices are m! and for neural networks m is usually fairly large. Secondly, the product would explode causing numerical overflows when trying to calculate it. The second issue can be remedied using some sort of ”cap” function around the sums, but I can’t see any simple way to fix the first problem 🙁

Best Answer

How about

$$ \lambda\sum_{i,j}a_{ij}^2\left(1-a_{ij}\right)^2+\mu\left(\sum_i\left(\sum_ja_{ij}-1\right)^2+\sum_j\left(\sum_ia_{ij}-1\right)^2\right) $$

This is $0$ only if all entries are $0$ or $1$ and all row and column sums are $1$, i.e. if $A$ is a permutation matrix.

Related Question