Proof Assistants – How to Verify a Verifier of Formalized Proofs

Tags: lo.logic, mathematical-philosophy, proof-assistants

In an unrelated thread Sam Nead intrigued me by mentioning a formalized proof of the Jordan curve theorem. I then found that there are at least two, carried out in two different systems. This is quite an achievement, but is it of any use for a mathematician like me? (No, this is not what I am asking; the actual question is at the end.)

I'd like to trust the theorems I use. To this end, I can read a proof myself (the preferred way, but sometimes hard to do) or believe experts (a slippery road). If I knew little about topology but occasionally needed the Jordan theorem, a machine-verified proof could give me a better option (and even if I am willing to trust experts, I could ensure that there are no hidden assumptions obvious to experts but unknown to me).

But how to make sure that a machine verified the proof correctly? The verifying program is too complex to be trusted. A solution is of course that this smart program generates a long, unreadable proof that can be verified by a dumb program (so dumb that an amateur programmer could write or check it). I mean a program that performs only primitive syntax operations like "plug assertions 15 and 28 into scheme 9". This "dumb" part should be independent of the "smart" part.
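To make the idea concrete, here is a minimal sketch of what such a "dumb" checker could look like (my own illustration, not taken from any existing system). Formulas are nested tuples, the axiom schemes are two of the standard Hilbert-style schemes for implication, and each proof line is either an axiom instance or a modus ponens step referring to earlier lines by number, exactly in the spirit of "plug assertions 15 and 28 into scheme 9":

```python
def match(pattern, formula, env):
    """Match an axiom-scheme pattern against a concrete formula.
    Metavariables are strings starting with '?'; env records what
    each metavariable has been bound to so far."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in env:
            return env[pattern] == formula
        env[pattern] = formula
        return True
    if isinstance(pattern, tuple) and isinstance(formula, tuple):
        return (len(pattern) == len(formula)
                and all(match(p, f, env) for p, f in zip(pattern, formula)))
    return pattern == formula

# Two of the usual Hilbert-style axiom schemes for implication.
SCHEMES = {
    "K": ("->", "?A", ("->", "?B", "?A")),
    "S": ("->", ("->", "?A", ("->", "?B", "?C")),
                ("->", ("->", "?A", "?B"), ("->", "?A", "?C"))),
}

def check(proof):
    """Each step is ('axiom', scheme_name, formula) or ('mp', i, j, formula),
    where line j must be (formula_i -> formula). Returns True iff every
    line is justified; knows nothing about what the formulas mean."""
    lines = []
    for step in proof:
        if step[0] == "axiom":
            _, name, formula = step
            if not match(SCHEMES[name], formula, {}):
                return False
        else:
            _, i, j, formula = step
            if lines[j] != ("->", lines[i], formula):
                return False
        lines.append(step[-1])
    return True

# The textbook five-line derivation of p -> p from K and S.
imp = lambda a, b: ("->", a, b)
p = "p"
pp = imp(p, p)
proof = [
    ("axiom", "S", imp(imp(p, imp(pp, p)), imp(imp(p, pp), pp))),
    ("axiom", "K", imp(p, imp(pp, p))),
    ("mp", 1, 0, imp(imp(p, pp), pp)),
    ("axiom", "K", imp(p, pp)),
    ("mp", 3, 2, pp),
]
print(check(proof))  # True
```

The point is that `check` is short enough for an amateur programmer to audit line by line, while the "smart" system that searched for the proof can be arbitrarily complicated.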

Given such a system, I could check axioms, definitions and the statement of the theorem, feed the dumb program (whose operation I can comprehend) with these formulations and the long proof, and see if it succeeds. That would convince me that the proof is indeed verified.

However, I found no trace of this "dumb" part of the system. And I understand that designing one may be hard, because the language used by the system should be both human-friendly (so that a human can verify that the definitions are correct) and computer-friendly (so that a dumb program can parse it). And the definitions should be chosen carefully: I don't want to dig through a particular construction of the reals from the rationals to make sure that these are indeed the reals that I know.

Sorry for this philosophy, here is the question at last. Is there such a "dumb" system around? If yes, do formalization projects use it? If not, do they recognize the need and put the effort into developing it? Or do they have other means to make their systems trustable?

UPDATE: Thank you all for interesting answers. Let me clarify that the main focus is interoperability with a human mathematician (who is not necessarily an expert in logic). It seems that this is close to interoperability between systems – if formal languages accepted by core checkers are indeed simple, then it should be easy to automatically translate between them.

For example, suppose that one wants to stay within symbolic logic based on simple substitutions and axioms from some logic book. It seems easy to write down these logical axioms plus the ZF axioms, basic properties (axioms) of the reals and the plane, some definitions from topology, and finally the statement of the Jordan curve theorem. If the syntax is reasonable, it should be easy to write a program verifying that another stream of bytes represents a deduction of the stated theorem from the listed axioms. Can systems like Mizar, Coq, etc. generate input for such a program? Can they produce proofs verifiable by the cores of other systems?

Best Answer

Is there such a "dumb" system around? If yes, do formalization projects use it? If not, do they recognize the need and put the effort into developing it? Or do they have other means to make their systems trustable?

This is called the "de Bruijn criterion" for a proof assistant: just as you say, we want a simple proof checker, which should be independent of the other machinery. The theorem provers that most directly embody this methodology are those in the LCF tradition, such as Isabelle and HOL Light. They work by letting you write whatever proof-search program you like, while every theorem it produces must pass through a small trusted core. Systems based on dependent type theory (such as Coq) tend to have more complex logical cores (due to the much greater flexibility of the underlying logic), but even there a core typechecker can fit in a couple of thousand lines of code, which can easily be (and has been) understood and reimplemented.
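The shape of the LCF idea can be sketched in a few lines. The real kernels (HOL Light's, for instance) are written in OCaml, whose module system makes the theorem type genuinely abstract; the hypothetical Python below enforces this only by convention, but it illustrates the architecture: a value of type `Thm` can be produced solely by the kernel's inference rules, so however elaborate the program that constructed it, the theorem is guaranteed to have a derivation.

```python
class Thm:
    """A sequent hyps |- concl. By convention, constructed only by the
    inference-rule functions below -- they are the trusted core."""
    def __init__(self, hyps, concl):
        self.hyps = frozenset(hyps)
        self.concl = concl

def assume(p):
    """Rule of assumption: p |- p."""
    return Thm({p}, p)

def mp(th_imp, th_ant):
    """Modus ponens: from (G |- p -> q) and (D |- p), conclude (G u D |- q)."""
    op, p, q = th_imp.concl  # conclusion must have the form ("->", p, q)
    assert op == "->" and p == th_ant.concl, "modus ponens does not apply"
    return Thm(th_imp.hyps | th_ant.hyps, q)

# Untrusted code can combine rules however it likes; the result is still a Thm.
th = mp(assume(("->", "p", "q")), assume("p"))
print(th.concl)  # q
```

Trust then rests on auditing the rule functions alone, not on the tactics and decision procedures layered on top of them.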