Matrix Multiplication – How Fast Can It Really Be Done?

algorithms, computational complexity, matrices, numerical linear algebra

Background: The Strassen algorithm, described here, has a computational complexity of $O(n^{2.807})$ for the multiplication of two $n \times n$ matrices (the exponent is $\log_2 7 = \frac{\log 7}{\log 2}$). However, the constant is so large that this algorithm is in fact slower in practice than naive matrix multiplication for small $n$. Similarly, the Coppersmith-Winograd algorithm, which has the lowest asymptotic complexity of all known matrix multiplication algorithms, has an exponent of $2.376$ and was discussed here previously.
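For concreteness, here is a minimal, deliberately unoptimized sketch of Strassen's recursion in Python (NumPy for the block arithmetic), under the simplifying assumption that $n$ is a power of two; the `leaf` cutoff below which it falls back to the ordinary product is illustrative and would need tuning in practice. The seven recursive products per level are exactly where the exponent $\log_2 7$ comes from.

```python
import numpy as np

def strassen(A, B, leaf=64):
    """Multiply square matrices A and B by Strassen's recursion.

    Assumes n is a power of two; below `leaf` it falls back to the
    ordinary product, since the recursion only pays off for large n.
    """
    n = A.shape[0]
    if n <= leaf:
        return A @ B

    k = n // 2
    A11, A12, A21, A22 = A[:k, :k], A[:k, k:], A[k:, :k], A[k:, k:]
    B11, B12, B21, B22 = B[:k, :k], B[:k, k:], B[k:, :k], B[k:, k:]

    # The 7 products (instead of 8) that give the n^{log2 7} bound.
    M1 = strassen(A11 + A22, B11 + B22, leaf)
    M2 = strassen(A21 + A22, B11, leaf)
    M3 = strassen(A11, B12 - B22, leaf)
    M4 = strassen(A22, B21 - B11, leaf)
    M5 = strassen(A11 + A12, B22, leaf)
    M6 = strassen(A21 - A11, B11 + B12, leaf)
    M7 = strassen(A12 - A22, B21 + B22, leaf)

    C11 = M1 + M4 - M5 + M7
    C12 = M3 + M5
    C21 = M2 + M4
    C22 = M1 - M2 + M3 + M6

    return np.block([[C11, C12], [C21, C22]])
```

Real implementations pad or peel odd dimensions and switch to the cubic kernel at fairly large block sizes, which is the practical content of the remark about small $n$ above.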

Question: Recently, I made a claim in a submitted paper that the Smith normal form algorithm has super-cubic complexity, and a reviewer countered by saying that actually, the complexity has been reduced to matrix multiplication time, i.e. $O(n^{2.37\ldots})$. I am not an expert on matrix algorithms and would happily change the offending line, but the experience has forced me to wonder: what are the practical implications of saying "X can be done in matrix multiplication time"? More precisely,

Does there exist an actual software implementation of Coppersmith-Winograd? If not, is there a theoretical obstacle to its existence?

By a theoretical obstacle I don't mean something like "Well, it would only be better than existing techniques for $n$ larger than the number of atoms in the universe so what's the point?", but rather something like "the algorithm uses the axiom of choice, or the classification of finite simple groups" etc.

PS: Okay, so there is also this paper, which apparently reduces the exponent of the Coppersmith-Winograd approach from $2.376$ to $2.3727$, so I stand corrected about CW being the fastest. The question still stands if we replace CW by the method of V. V. Williams.

Best Answer

There are currently no practical implications of any fast matrix multiplication algorithms besides Strassen's. The Coppersmith-Winograd algorithm and its descendants (Stothers, Williams) are very complex, depend on probabilistic constructions, etc. There's no theoretical obstacle to implementing them in the sense you're asking about, and it's certainly humanly possible, but there's little point to it and I don't believe anyone has ever actually done it. It would be complicated and painful, and the only real purpose would be learning how the algorithm works, since the cross-over point at which it would improve on the naive cubic-time algorithm is enormous (so you'd never actually see any improvement). There are other algorithms that would be somewhat easier to implement, at the cost of worse asymptotic performance, but they too are utterly impractical.
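To make the cross-over idea concrete for the one algorithm where it is actually reachable, here is a back-of-the-envelope operation count in Python, comparing schoolbook multiplication with Strassen-plus-fallback. The constants (18 half-size additions per level, a leaf size of 8) are the standard textbook ones, but the sketch counts scalar arithmetic only and ignores everything that makes real performance hard; for CW-type algorithms there is no comparably explicit constant one could even plug in.

```python
def naive_ops(n):
    """Scalar multiplications + additions for the schoolbook algorithm."""
    return 2 * n**3 - n**2

def strassen_ops(n, leaf):
    """Operation count for Strassen with fallback to naive below `leaf`.

    Each level does 7 recursive half-size products plus 18 half-size
    matrix additions/subtractions.
    """
    if n <= leaf:
        return naive_ops(n)
    half = n // 2
    return 7 * strassen_ops(half, leaf) + 18 * half**2

# Find the first power of two where Strassen (with the given leaf size)
# beats the naive count -- a pure arithmetic-operation comparison,
# ignoring memory traffic, which is exactly the caveat in the next paragraph.
for k in range(1, 16):
    n = 2**k
    if strassen_ops(n, leaf=8) < naive_ops(n):
        print("crossover (operation count only) near n =", n)
        break
```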

There's also a deeper issue if you try to use algebraic algorithms in practice. The algebraic complexity model typically used for these problems counts only arithmetic operations and considers memory access to be free. This made sense way back when, since a single floating point operation was comparatively expensive, but nowadays memory management can be the real bottleneck in practice. Algebraic complexity is beautiful and theoretically important, but it ignores important practical issues.
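As a small illustration of that gap (assuming NumPy's default row-major storage): the two routines below perform exactly the same scalar arithmetic, so the algebraic model cannot distinguish them, yet the tiled version reuses each block while it is still in cache and behaves very differently on real hardware. The block size of 64 is just a placeholder.

```python
import numpy as np

def matmul_ijk(A, B):
    """Schoolbook triple loop: same arithmetic as the tiled version below,
    but it walks B column by column, which is cache-unfriendly for
    row-major storage."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += A[i, k] * B[k, j]
            C[i, j] = s
    return C

def matmul_blocked(A, B, block=64):
    """Tiled version: identical count of multiplications and additions,
    but each (block x block) tile is reused while it is hot in cache."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i0 in range(0, n, block):
        for k0 in range(0, n, block):
            for j0 in range(0, n, block):
                C[i0:i0+block, j0:j0+block] += (
                    A[i0:i0+block, k0:k0+block] @ B[k0:k0+block, j0:j0+block]
                )
    return C
```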

If you want to do fast matrix multiplication in practice, it will presumably be on a parallel computer. That introduces further issues of communication complexity; see http://arxiv.org/abs/1202.3173 for an analysis of the Strassen case (both theoretically and in practice).