[Math] A global mathematics library

soft-question

Just for personal interest, I am not (yet) professionally involved in it. My question is about the state of arts in digitalization of mathematics and to what extent it is possible and reasonable.

There are different levels of digitalizations:

  1. OCR scan all historical mathematical texts
  2. organize metadata of references and authors (e.g. as graphs)
  3. extract mathematical objects (like theorems, definitions, etc)
  4. extract proofs and ideas
  5. formalize mathematics such that it can be completely checked by theorem provers

The main effort I found is
https://imkt.org/

Steps 3/4 and 5 may be of independent interest and should be understood more parallel than chronological. Point 5 is more interesting in having (error-free) formalized math. It should also be allowed to choose different foundations of mathematics and the possibility to switch in between them. Point 3/4 is more interesting for a researcher that wants all references for a definition, a theorem, a keyword. It would be a wonderful source for doing data analysis of mathematical knowledge (historical, social, semantical, etc). In contrast to 5 it can contain errors and speculations. The main interest is in identifying and referencing mathematical objects over all the produced texts in history of math.

My question is:

The goal of https://imkt.org/ is huge but when looking at its first projects it looks (sry) a little bit disappointing. The main focus (also of other literature that I scanned through) lies on connecting the existing databases and languages, i.e. of theorem provers, computer algebra systems (and maybe wikis). I understand that different applications in math demand different systems (e.g. integer series http://oeis.org/ should also be part of it?) Can or shouldn't there be one system that contains everything that can be accessed (and is stored not just referenced!) through the same system? Are my dreams of such a system over the top?

  • One of the largest issues is the copyright of the big publishers.
    More and more is going in direction open math. Until then it is
    unclear to what extent a library can be complete (gaps are somehow
    missing the point of this system).

  • The other issue is the efficiency in producing the content extraction
    and to advance it by advertising the advantages of such a library to
    mathematicians such that it will move itself at some point.

There have been many efforts in the past that were abandoned again or here for years (like Mizar) but far from being known and used in daily mathematics.

Best Answer

None of your five levels of digitalization are at all within reach. Let's take just the first one. As you say, copyright is a huge obstacle. Consider what happened when the same idea was attempted by Google (resulting in Google Books), as explained wonderfully in this article in the Atlantic. Larry Page (as in, PageRank) set up a tremendous effort to get books from libraries and scan them en masse with OCR technology to make them searchable.

After doing this for some time, publishers filed suit against Google. Google hired a massive legal team because it cared so much about creating such a library. Google managed to bring the publishers onboard with the idea, pointing out that it would allow publishers to sell out-of-print works. Unfortunately, this raised another issue, like a sad game of whack-a-mole, and the Department of Justice got involved to prevent Google from having a de-facto monopoly on out-of-print books. In the end, the Judge prevented the settlement that Google and the publishing companies had agreed to.

The same things would happen in the math world. The point is: you have a bunch of authors who died a long time ago and can't give consent to update their old publishing contracts. You have publishers who won't let go of content unless they can profit on it. And you don't have any organization large enough to be the central clearing house to connect customers who want to read these works with the places they can view and buy them. The problem only gets harder every year, because we produce more and more mathematics all the time (and OCR for math is not great, throwing a wrench into your third idea). My opinion: if Google couldn't pull together a global library, despite all the effort they put into this problem, there's no hope for the rest of us.