[Math] How to be rigorous about combinatorial algorithms

Tags: algebraic-combinatorics, algorithms, co.combinatorics, computer science

1. The question

This may be the worst question I've ever posed on MathOverflow: broad,
open-ended and likely to produce heat. Yet, I think any progress that will be
made here will be extremely useful to the whole subject. In a nutshell, the
question is:

Short version. What tools (notations, formalisms, theories) are we missing
to bring algorithmic combinatorics to the standard of rigor of modern algebra?

To make this more precise, let me explain what I mean by algorithmic
combinatorics (and give a few examples).

Lots of algorithms appear in mathematics, but we mathematicians tend to avoid
them when we are proving things. For example, the Euclidean algorithm computes the gcd of integers; but when it comes to proving things about the gcd (say, Bezout's identity), it is much easier to use induction. In linear algebra, we prefer not to use the Gaussian
elimination algorithm either, but instead to extract its main ideas into
simple statements and rely on them exclusively, essentially splitting the
algorithm back into the little steps it's made of and recombining these steps
back into induction arguments.

This reluctance to work with algorithms has a reason: Very few mathematicians
know how to formally define an algorithm and its execution;
even fewer know how to rigorously reason about those. Meanwhile, everyone
knows induction. So, as long as algorithmic reasoning can be replaced by
induction easily enough, there's little reason to struggle through the
algorithmic swamps.

This question is about the cases when algorithms cannot be easily avoided.
This tends to happen a lot in combinatorics (particularly bijective
combinatorics, but also algebraic combinatorics). The algorithms occurring
there often have the following properties:

  • The algorithm involves a lot of
    clobbering. That is, the algorithm
    doesn't only set each variable once, but rather modifies it several times.
    This makes it difficult to rewrite the algorithm as a recursive definition
    (because we have to keep track of "time", i.e., count steps), and also
    complicates reasoning about the algorithm's execution (since we have to
    specify what time we are talking about).

  • The well-definedness of the algorithm is not immediately obvious; that is,
    it may involve non-deterministic steps, or it may "crash" (more formally:
    throw exceptions). Often the first thing one wants to prove about the
    algorithm is that it is not actually non-deterministic (i.e., the choices
    don't matter to the final result) and that it cannot "crash". But the mental
    overhead involved in arguing about the algorithm before this has been proven
    is substantial, and it further complicates any formal model.

I will call algorithms with these properties complex. To make things worse,
in combinatorics, algorithms are often used as definitions of objects, not
just as tools for understanding them, so we cannot simply throw away everything
we don't care about for a given proof.

Let me give a few examples of complex algorithms:

  • The Robinson-Schensted-Knuth (RSK) algorithm. (Various expositions:
    Knuth, Casselman, van Leeuwen.) At least a dozen variants on this algorithm
    appear in the literature now, usually more complicated than the original.

  • The standard parenthesis-matching procedure: Start with a word made of
    parentheses, such as $(()()))((()$. Keep matching parentheses by the standard
    rule: If you ever find a $($ parenthesis before a $)$ parenthesis such that
    all parentheses between these two are already matched (this includes the case
    when there are no parentheses between these two), you can match these two
    parentheses. Keep doing this until no more matchable parentheses are left.
    This is a simple example of an algorithm that "looks" non-deterministic,
    because you can often choose which two parentheses to match, but in truth it
    can be shown (using the diamond lemma) that the final result does not depend
    on the choices. This procedure is typically used as a subroutine in more
    complicated algorithms; see, e.g., Shimozono's Crystals for dummies. (A small
    Python sketch of this procedure appears right after this list.)

  • The evolution of a box-ball system; see, e.g., §3.1 in
    Fukuda. Note the wording using boxes,
    balls and colors — a staple in this subject, and a pain for formalization.

  • The definition of $q\left( u \right)$ in §2.1 of Erik Aas, Darij
    Grinberg, Travis Scrimshaw, Multiline queues with spectral
    parameters.
    This one has been a particularly pointy thorn in my side — the proof of Lemma
    2.2 is completely ineffable; the only hint we can give is that the reader
    should draw a picture and stare at it a bit.
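
To make the parenthesis-matching example concrete, here is a minimal Python sketch of the procedure (my own, not taken from any of the sources above), written so that the apparent non-determinism is explicit: at each step it computes every currently matchable pair and lets a choice function pick one. Running it with different choice functions on the same word is a cheap way to observe the confluence promised by the diamond lemma, though of course it proves nothing.

def matchable_pairs(word, matched):
    # All pairs (i, j) that the rule currently allows us to match:
    # word[i] == '(' and word[j] == ')' are both still unmatched, and every
    # position strictly between them is already matched. For a fixed i, the
    # only possible partner j is the first unmatched position to its right.
    pairs = []
    for i, c in enumerate(word):
        if c != '(' or i in matched:
            continue
        j = i + 1
        while j < len(word) and j in matched:
            j += 1
        if j < len(word) and word[j] == ')':
            pairs.append((i, j))
    return pairs

def match_all(word, choose):
    # Keep matching until no matchable pair is left; `choose` decides which
    # of the currently matchable pairs to commit to at each step.
    matched, result = set(), set()
    while True:
        pairs = matchable_pairs(word, matched)
        if not pairs:
            return result
        i, j = choose(pairs)
        matched.update((i, j))
        result.add((i, j))

word = "(()()))((()"
leftmost_first = match_all(word, choose=lambda ps: ps[0])
rightmost_first = match_all(word, choose=lambda ps: ps[-1])
assert leftmost_first == rightmost_first == {(0, 5), (1, 2), (3, 4), (9, 10)}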

This question is not about these four examples in particular; I am just using
them to illustrate the kind of algorithms I'm talking about and the issues
with them. These four examples are relatively simple when compared to more
recent constructions, such as type-D analogues of RSK, complicated bijections
between Littlewood-Richardson tableaux, and the mess that is the theory of
Kohnert diagrams.

So here is the precise version of my question:

Precise version. What are viable methods of reasoning about complex
algorithms that allow for a level of rigor similar to that in the rest of
modern mathematics (e.g., algebra, algebraic geometry, analysis, even
non-algorithmic combinatorics)?

In particular, I'm looking for practical ways to reason about

  • the algorithm itself (syntax);

  • the results of the algorithm (semantics);

  • the way the results change if inputs are modified (uhm, does that have a name?…).

The syntax question is the easy one: You can always implement an algorithm in
an actual imperative programming language. It is usually easy to translate
pseudocode into actual code, so there isn't much of a rigor deficit at this
step.
But the combinatorics literature badly lacks rigor when dealing with the
other two questions; almost every author reasons about their algorithms through
wishy-washy "behavioral" arguments, waving their hands and tracing examples
in the hope that they are in some sense representative of the general case.
Programming doesn't help here: Rarely does an imperative programming language
come with tools for proving the properties of programs.
(Some functional languages do, but translating a combinatorial algorithm
into functional code in an intuition-preserving way is far from easy.)
The result is that a significant part of the contemporary combinatorics
literature is unreadable to all but the most mentally agile;
and that the community is slow to trust new results with algorithmic proofs,
often waiting for independent confirmations through algebraic methods.

Ideally, I would prefer a way to formalize (at least to the level of rigor accepted in modern algebra) the handwaving that combinatorialists typically do (at least when it isn't genuinely wrong), rather than a completely new toolkit that would require me to reason in a completely different way. But honestly, the latter wouldn't be bad either, and may even be better in the long run.

When I say "viable" and "practical", I mean that formalizing the proofs should not be forbiddingly tedious or hard, nor should it require much creativity. Think of (most) abstract algebra, where finding the proof is the hard part and formalizing it is easy; I want the same for combinatorics.

2. Approaches

Next I am going to list a few approaches I've seen to the problem, and
explain what I believe to be their shortcomings.
Unfortunately, my spotty knowledge of computer science makes me the wrong person to comment on these approaches, so I'm making stone soup here — but this is a question, so I am not feeling guilty.

2.1. Hoare logic

The classical way of proving properties of algorithms is called Hoare logic. The rough idea is that you insert an assertion between any two consecutive instructions of the algorithm, and prove, for each instruction, that if the state before the (execution of the) instruction satisfies the assertion placed before the instruction, then the state after the instruction satisfies the assertion placed after it. (Fine print: Keywords like "if", "while" and "for" come with their own special rules about what you need to prove, and in the presence of "while"s, you also need to prove that the algorithm doesn't get stuck in an endless loop.)

For example, let us pick apart the Euclidean algorithm (written here in a somewhat defensive style of Python, with the tacit understanding that all variables are integers):

def euclid(n, m): # Compute the gcd of two nonnegative integers n and m.
    u = n
    v = m
    while v > 0:
        if u < v:
            u, v = v, u # This swaps u and v in Python.
        u = u - v
    return u

To prove that the returned value is really $\gcd\left(n,m\right)$, we can insert Hoare assertions (traditionally written in braces) as follows (we use == for equality):

def euclid(n, m): # Compute the gcd of two nonnegative integers n and m.
    u = n
    {u == n.}
    v = m
    {u == n and v == m.}
    while v > 0:
        {gcd(u, v) == gcd(n, m) and u >= 0 and v >= 0.}
        if u < v:
            {u < v and gcd(u, v) == gcd(n, m) and u >= 0 and v >= 0.}
            u, v = v, u # This swaps u and v in Python.
            {u >= v and gcd(u, v) == gcd(n, m) and u >= 0 and v >= 0.}
        {u >= v and gcd(u, v) == gcd(n, m) and u >= 0 and v >= 0.}
        u = u - v
        {gcd(u, v) == gcd(n, m) and u >= 0 and v >= 0.}
    {u == gcd(n, m).}
    return u

(Note that the last Hoare assertion, u == gcd(n, m), follows from the previous Hoare assertion gcd(u, v) == gcd(n, m) and u >= 0 and v >= 0 combined with the fact that v <= 0, which is automatically inferred from the fact that we just escaped from the while-loop. This is part of the Hoare logic for the "while" keyword.)
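
Incidentally, nothing prevents us from turning these Hoare assertions into executable assert statements and running them on a batch of sample inputs. This is no substitute for the proof, but it is a cheap sanity check on the assertions themselves before one starts proving them. A sketch (using Python's math.gcd, which satisfies gcd(u, 0) == u for u >= 0):

from math import gcd

def euclid_checked(n, m):
    # euclid with its Hoare assertions inserted as runtime checks.
    u = n
    assert u == n
    v = m
    assert u == n and v == m
    while v > 0:
        assert gcd(u, v) == gcd(n, m) and u >= 0 and v >= 0
        if u < v:
            u, v = v, u
            assert u >= v and gcd(u, v) == gcd(n, m) and u >= 0 and v >= 0
        assert u >= v and gcd(u, v) == gcd(n, m) and u >= 0 and v >= 0
        u = u - v
        assert gcd(u, v) == gcd(n, m) and u >= 0 and v >= 0
    assert u == gcd(n, m)
    return u

assert all(euclid_checked(n, m) == gcd(n, m) for n in range(30) for m in range(30))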

This is all fun when algorithms are as simple as euclid. What if we do this to the RSK algorithm? It turns out that this has already been done — in Donald Knuth, Permutations, matrices, and generalized Young tableaux, Pacific J. Math., Volume 34, Number 3 (1970), pp. 709–727.
More precisely, Knuth considered a subroutine of the RSK algorithm: the insertion of a single number $x$ into what he called a generalized Young tableau $Y$. He calls this subroutine $\operatorname{INSERT}(x)$ (with $Y$ being implicit).
His Hoare assertions are rather nice (although he needs to add "$i \geq 1$ and $j \geq 1$" to all of them in order to ensure that $Y$ remains a generalized Young tableau throughout the algorithm).
Unfortunately, in order to prove the subsequent claim that the functions $\operatorname{INSERT}$ and $\operatorname{DELETE}$ are inverses of each other (in an appropriate sense), I had to extend the Hoare assertions for each of the two functions to… 2 pages each. Basically, Knuth's Hoare assertions are precisely what is needed to show that $Y$ remains a generalized Young tableau at each step of the algorithm; but they don't let you trace the changes to its entries or argue such things as "the insertion path trends left" (the inequalities (2.5) in Knuth's paper, which he just states as something obvious) and "only the entries $y_{i,r_i}$ are ever modified". For each of these simple observations, one has to add an extra Hoare assertion at the end and percolate it through all the instructions. (See the proof of Proposition A.4 on pp. 4–6 of http://www.cip.ifi.lmu.de/~grinberg/algebra/knuth-comments1.pdf , but beware that I wrote this when I was just starting to learn combinatorics, so I was probably being wasteful.)

Perhaps there are tricks to using Hoare logic "economically" and I just don't know them. There are certainly useful modifications that can be made, such as introducing new auxiliary variables inside Hoare assertions, and even new instructions that don't modify the variables of the original algorithm. But it's far from clear how to use this in combinatorics. I would love to see a collection of examples of Hoare logic in mathematics.

2.2. Transforming algorithms into recursive definitions

We can try painstakingly transforming a combinatorial algorithm into a recursive definition. For example, we can turn each variable $v$ into a sequence of objects $v_1, v_2, v_3, \ldots$, with $v_i$ being the value of $v$ after step $i$ of the algorithm. And we can introduce another sequence of integers $l_1, l_2, l_3, \ldots$, with $l_i$ being the number of the code-line at which we are after step $i$ of the algorithm. Then, the algorithm transforms into a jointly recursive definition of all these objects. This is a cheap way of formalizing the execution of an algorithm. The advantage of this approach over Hoare logic is that you have all the history of your execution at your disposal rather than just the current state. Unfortunately, actually reasoning about this recursive definition is forbiddingly cumbersome, and one is often forced to prove all the Hoare assertions with the extra complexity of the time parameters. I have never seen an algorithm being rigorously studied using this method in practice.
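
For concreteness, here is what this transformation looks like for euclid above (a sketch of my own): the state after step $t$ is a triple $\left(l_t, u_t, v_t\right)$, where $l_t$ records which instruction is executed next, and the whole execution history is generated by iterating a single step function.

def euclid_step(state):
    # One step of euclid, viewed as a map on states (l, u, v), where l names
    # the next instruction ('loop' = the while test, 'swap', 'subtract',
    # 'done'). The sequences l_t, u_t, v_t are then defined by plain recursion.
    l, u, v = state
    if l == 'loop':
        if v <= 0:
            return ('done', u, v)
        return ('swap', u, v) if u < v else ('subtract', u, v)
    if l == 'swap':
        return ('subtract', v, u)
    if l == 'subtract':
        return ('loop', u - v, v)
    return state  # 'done' is a fixed point

def history(n, m):
    # The full execution history (l_0, u_0, v_0), (l_1, u_1, v_1), ...
    states = [('loop', n, m)]
    while states[-1][0] != 'done':
        states.append(euclid_step(states[-1]))
    return states

assert history(6, 4)[-1] == ('done', 2, 0)   # gcd(6, 4) == 2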

2.3. Monads?

Modern functional programming languages come with a collection of monads that are useful for translating imperative code into functional code. In particular, the state monad lets one write "imperative blocks" inside functional code, which behave as if they had actual mutable state. The maybe monad emulates algorithms that may "crash", and the list monad emulates non-determinism. Unfortunately, this is again just an answer to the question of formalizing the algorithm, but not to the question of formalizing reasoning about it. I would really love to be proven wrong on this point: Are there some known sources on the use of monads in proofs? (I know that the distribution monad was used in Coq-Combi to formalize the "hook walk" in the proof of the hook-length formula, but I have never learnt enough ssreflect/Coq to read that proof.)
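
For what it's worth, the "maybe" pattern at least is easy to imitate even in plain Python (a toy sketch of my own, not a real monad library): a "crashed" computation is represented by None, and a bind operation threads the failure through, so that a possibly-crashing algorithm becomes an ordinary total function into an option-like type. But this addresses only the formalization of the algorithm, not the reasoning about it, which is exactly the complaint above.

def bind(x, f):
    # Kleisli-style composition for a poor man's "maybe" monad:
    # None (a crash) propagates, otherwise f is applied.
    return None if x is None else f(x)

def pop_left(word):
    # A step that may "crash": split off the first letter, failing on the empty word.
    return None if not word else (word[0], word[1:])

def first_two(word):
    # Chain two possibly-crashing steps without mentioning the crash explicitly.
    return bind(pop_left(word), lambda p:
           bind(pop_left(p[1]), lambda q: (p[0], q[0])))

assert first_two("abc") == ('a', 'b')
assert first_two("a") is None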

2.4. Algebraic circumvention

The final trick to cut down on the complexity of algorithms is to find an alternative non-algorithmic (usually algebraic) definition of what they compute, and then forget about the algorithm and just work with that new definition. For example:

  • There are several ways to redefine the RSK algorithm in a more proof-friendly manner. One nice way is shown in Sam Hopkins, RSK via local transformations: Here, the RSK algorithm is re-encoded as a transformation between integer arrays and reverse plane partitions, and this transformation is redefined through a much more convenient recursive definition. Unfortunately, while the new definition is a lot easier to analyze in some aspects, it is not obvious at all that it is equivalent to the old one (and to my knowledge, the proof of this equivalence is scattered across 2 or 3 papers). It is also not suited to proving the connections of RSK with the plactic monoid.

  • The parenthesis-matching procedure can be redefined recursively, too, at the cost of losing the intuition and having to re-prove properties that were previously obvious. I don't know how easy it would be to build the theory of crystals on the basis of such a recursive definition. I think Florent Hivert's Coq-Combi project does something like this for the Littlewood-Richardson rule, but this project was far from straightforward and it is unclear how well the methods scale to deeper results.

  • For box-ball systems, there is the "carrier algorithm", which describes the dynamics of the system in a more algebraic way, although its annoying 2-dimensionality keeps it from being easy to work with.

  • Combinatorial properties of tableaux can occasionally be deduced from representation theory of quantum groups or Lie algebras.

  • Symmetric groups can be interpreted as Coxeter groups of type A, which provides a whole new viewpoint from which some of their properties become much clearer (but others become a lot more complicated).

In general, however, there is no way to take a combinatorial algorithm and "find the algebraic structure behind it"; the cases where this has been done are few. There is no "clarifying" algebraic structure behind most combinatorial objects. Moreover, this is a somewhat escapist approach: We just give up combinatorics and do algebra instead.

Best Answer

Broadly speaking, there are three approaches to reasoning about software semantics: Denotational semantics provides a mapping from a computer program to a mathematical object representing its meaning. Operational semantics makes use of logical statements about the execution of code, typically using inference rules similar in style to natural deduction for propositional logic. Axiomatic semantics, which includes Hoare logic, is based on assertions about relationships that remain the same each time a program executes. Here's a good book on different semantic formalisms.

One approach I'd recommend, perhaps somewhat more practical than others, is something like Dijkstra's predicate transformer semantics, a reformulation of Hoare logic, which is expounded in David Gries' classic book The Science of Programming. I'd have thought anyone who is willing to expend sufficient effort to master this should be able to use it to reason effectively about algorithms (combinatorial or otherwise). The definition of $q(u)$ mentioned in the question looks like a relatively simple candidate for this approach.
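
As a small worked illustration (tying this back to the euclid example from the question, not an example taken from Gries' book): the predicate-transformer rule for assignment reads $\operatorname{wp}\left(x := E,\ P\right) = P\left[x := E\right]$, so the instruction u = u - v with postcondition $\gcd(u,v) = \gcd(n,m) \wedge u \geq 0 \wedge v \geq 0$ has weakest precondition $\gcd(u-v,v) = \gcd(n,m) \wedge u-v \geq 0 \wedge v \geq 0$, which is exactly what the Hoare assertion placed before that instruction ($u \geq v$, $\gcd(u,v) = \gcd(n,m)$, $u \geq 0$, $v \geq 0$) guarantees, since $\gcd(u-v,v) = \gcd(u,v)$. In principle the annotation of euclid can be recomputed this way, working backwards from the final assertion through each straight-line instruction (loops still require one to supply the invariant).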

Perhaps the major challenge for most mathematicians is the fact that becoming competent in a suitable formalism takes some considerable time, and also probably requires as a precursor some basic computer science knowledge in order to learn how to think like a computer scientist (and be able to construct well-designed modular code).
