[Math] fast way to compute matrix multiplication mod $p$

algorithmslinear algebra

I think people have some general strategy to do matrix multiplication fast. But what about for the finite field of $p$ elements? (e.g. when $p=2$, one should have some faster way.)

So my question is, given two integer entry matrix, $A$ and $B$. Is there a fast way to compute $AB \mod p$?. And is there a fast way of computing $A^n \mod p$ for $n$ not too big. (Note that for $n=|\mathrm{GL}_t(\mathbb{Z}/p)|$, we have $A^n=I \mod p$, where $t$ is the size of $A$.)

Best Answer

For any field, we can define the exponent of matrix multiplication over that field to be the smallest number $\omega$ such that $n \times n$ matrix multiplication can be done in $n^{\omega + o(1)}$ field operations as $n \to \infty$. Schönhage proved that it is invariant under taking field extensions, so it depends only on the characteristic of the field. Probably it equals $2$ in every characteristic, but that isn't known. (Certainly $\omega \ge 2$, since it takes at least one operation to get each entry.)

Let $\omega_p$ be the exponent in characteristic $p$. Then one can show that $\limsup_{p \to \infty} \omega_p \le \omega_0$, basically because every low-rank tensor decomposition in characteristic $0$ will work for all but finitely many primes. Over the rationals, you just need to avoid primes that occur in denominators.

However, for small primes the exponent could (as far as we know) be better or worse than in characteristic $0$, and it's even possible that it could be substantially better for all primes, although in that case you can show that as $p \to \infty$ the asymptotics would have to take longer and longer to kick in (i.e., the size of the matrices needed to see the improvement would grow as a function of $p$).

Strassen's exponent $2.8$ algorithm works in every characteristic, and it's the only practical method that achieves exponent less than $3$. However, if you want to do this in practice, it's important to think about issues like cache and memory access. (These issues are often more important than counting arithmetic operations.) Unless you really know what you are doing, it's not worth trying to write your own code, except as a learning exercise. For example, in characteristic $2$ the M4RI code mentioned by unknown (google) seems like it could be a good bet.