Theorems that are essentially impossible to guess by empirical observation

big-list, examples, experimental-mathematics

There are many mathematical statements that, despite being supported by a massive amount of data, are currently unproven. A well-known example is the Goldbach conjecture, which has been shown to hold for all even integers up to $10^{18}$, but which is still, indeed, a conjecture.

This question asks about examples of mathematical statements of the opposite kind, that is, statements that have been proved true (thus, theorems) but that have almost no data supporting them or, in other words, that are essentially impossible to guess by empirical observation.

A first example is the Erdős–Kac theorem, which, informally, says that an appropriate normalization of the number of distinct prime factors of a positive integer converges in distribution to the standard normal distribution. However, the convergence is so slow that testing it numerically is hopeless, especially because doing so would require factoring many extremely large numbers.
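To make the slowness concrete, here is a minimal sketch of the kind of numerical test one might attempt. It assumes the usual normalization $(\omega(n) - \log\log n)/\sqrt{\log\log n}$, where $\omega(n)$ is the number of distinct prime factors of $n$; the function names `omega_sieve` and `normalized_stats` are made up for this illustration. A sieve avoids factoring each integer separately, but it only reaches small $n$, where $\log\log n$ is still tiny (about $2.6$ at $n = 10^6$).

```python
import math


def omega_sieve(limit):
    """Return a list w with w[n] = number of distinct prime factors of n."""
    w = [0] * (limit + 1)
    for p in range(2, limit + 1):
        if w[p] == 0:  # no smaller prime divides p, so p is prime
            for multiple in range(p, limit + 1, p):
                w[multiple] += 1
    return w


def normalized_stats(limit):
    """Sample mean and variance of (omega(n) - log log n) / sqrt(log log n)
    over 3 <= n <= limit (n >= 3 keeps log log n positive)."""
    w = omega_sieve(limit)
    values = []
    for n in range(3, limit + 1):
        ll = math.log(math.log(n))
        values.append((w[n] - ll) / math.sqrt(ll))
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


if __name__ == "__main__":
    # A standard normal has mean 0 and variance 1; compare with the sample values.
    for limit in (10**3, 10**4, 10**5, 10**6):
        mean, var = normalized_stats(limit)
        print(f"N = {limit:>8}: mean = {mean:+.3f}, variance = {var:.3f}")
```

Comparing the printed mean and variance with $0$ and $1$ gives a rough sense of how far the sample is from the limiting distribution in the range a sieve can reach.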

Examples should be theorems for which a concept of "empirical observation" makes sense. Therefore, for instance, theorems dealing with uncomputable structures are (trivially) excluded.

Best Answer

One of the most interesting recent examples is the Katz-Sarnak conjecture, which asserts that the average rank of elliptic curves defined over $\mathbb{Q}$ (ordered by some reasonable height) is equal to $1/2$. It generalizes a more specific conjecture due to Goldfeld, which asserts that the average rank of the quadratic twists of a given elliptic curve is equal to $1/2$. Goldfeld's conjecture dates back to the 1970s, and the Katz-Sarnak conjecture was made in the 1990s.

This is a story told by Manjul Bhargava during one of his lectures at Oxford, where I was in the audience. He said that, as a graduate student at Princeton in the 90s, he had heard Sarnak lecture on his conjecture with Katz. Curious, Bhargava looked up the existing data on the average rank of elliptic curves and found that, among the tens of thousands of elliptic curves that had been tabulated, the average rank was quite large (exceeding two) and appeared to be increasing. The young Bhargava printed out the results and showed them to Sarnak the next day, stating that the data clearly did not support his conjecture. According to Bhargava, without batting an eye Sarnak said "the data is misleading; eventually the average rank will plateau and start going down towards $1/2$ when more curves are considered". Bhargava was apparently unconvinced.

The story of course eventually leads to the groundbreaking work of Bhargava and Arul Shankar in the last decade on the average rank of elliptic curves. In three spectacular papers (here, here, and here) they proved that the average rank of elliptic curves, sorted by the "naive height" defined for an elliptic curve given in short Weierstrass form $E_{A,B} : y^2 = x^3 + Ax + B$, with $A, B \in \mathbb{Z}$, as $H(E_{A,B}) = \max\{4|A|^3, 27B^2\}$, is at most $1.5$, $1.17$, and $0.885$, respectively. The lecture mentioned above was given in 2016. At the time, the existing data had not yet shown the average rank dipping below $1$, even though the best theoretical result gives an upper bound of $0.885$. Bhargava ended his talk by displaying a brand new chart, compiled from the work of his students and collaborators, showing that the average rank seems to dip below $0.9$, though it is still above the theoretical bound.
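For readers who want to see what ordering by naive height looks like in practice, here is a small sketch that computes $H(E_{A,B}) = \max\{4|A|^3, 27B^2\}$ and lists the integer pairs $(A, B)$ up to a given height. The function names are made up for this illustration. The sketch only discards singular cubics (those with $4A^3 + 27B^2 = 0$), makes no attempt to remove pairs that define isomorphic curves, and does not compute any ranks, which is the hard part.

```python
def naive_height(a, b):
    """Naive height of y^2 = x^3 + a*x + b, i.e. max(4|a|^3, 27 b^2)."""
    return max(4 * abs(a) ** 3, 27 * b * b)


def curves_up_to_height(bound):
    """Yield integer pairs (A, B) with nonzero discriminant and naive height <= bound."""
    # Largest |A| with 4|A|^3 <= bound, found by stepping up with exact integers.
    a_max = 0
    while 4 * (a_max + 1) ** 3 <= bound:
        a_max += 1
    # Largest |B| with 27 B^2 <= bound.
    b_max = 0
    while 27 * (b_max + 1) ** 2 <= bound:
        b_max += 1
    for a in range(-a_max, a_max + 1):
        for b in range(-b_max, b_max + 1):
            if 4 * a ** 3 + 27 * b ** 2 != 0:  # skip singular cubics
                yield a, b


if __name__ == "__main__":
    for bound in (10**2, 10**4, 10**6):
        count = sum(1 for _ in curves_up_to_height(bound))
        print(f"height <= {bound}: {count} pairs (A, B)")
```

The number of pairs grows like a constant times $X^{5/6}$ in the height bound $X$, since $|A|$ ranges up to about $X^{1/3}$ and $|B|$ up to about $X^{1/2}$.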