[Math] Revisiting the unreasonable effectiveness of mathematics

Tags: computer science, it.information-theory, mp.mathematical-physics, reference-request

Question:

On balance, given theoretical advances in algorithmic information theory and quantum computation, it appears that the remarkable effectiveness of mathematics in the natural sciences is quite reasonable. By effectiveness, I am referring to Wigner's observation that mathematical laws have remarkable generalisation power [1].

Might there be a modern review paper on the subject, written for mathematicians, in which the original question is re-evaluated in light of the modern mathematical sciences?

An information-theoretic perspective:

In order to motivate an information-theoretic analysis, it is worth observing that Occam's razor is an essential tool in the development of mathematical theories.

From an information-theoretic perspective, a Universe where Occam's razor is generally applicable is one where information is conserved. The conservation of information would in turn imply that fundamental physical laws are generally time-reversible. Moreover, since Occam's razor has a natural formulation within algorithmic information theory as the Minimum Description Length principle [3], this analysis implicitly presumes that the Universe itself may be simulated by a Universal Turing Machine.

David Deutsch and others have done significant work demonstrating the plausibility of the Physical Church-Turing thesis (which is consistent with the original Church-Turing thesis) [2], and this would explain why mathematical methods are so effective in the natural sciences.

This brief analysis emerged from informal discussions with a handful of algorithmic information theorists (Hector Zenil, Marcus Hutter, and others), and it makes me wonder whether complementary theories from mathematical physics might help mathematicians account for the remarkable effectiveness of mathematics in the natural sciences.

Clarification of particular terms:

Minimum Description Length principle:

Given data in the form of a binary string $x \in \{0,1\}^*$, the (idealised) Minimum Description Length of $x$ is given by the Kolmogorov complexity of $x$ [3,4]:

\begin{equation}
K_U(x) = \min_{p} \{|p|: U(p) = x\}
\end{equation}

where $U$ is a reference Universal Turing Machine and the minimum is taken over programs $p$ that, given the empty string $\epsilon$ as input, output $x$; the minimising $p$ is the shortest such program.
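Since $K_U(x)$ is uncomputable in general, any practical use of this definition relies on computable upper bounds. Below is a minimal sketch of that idea in Python; the function name and the choice of zlib as a stand-in for the reference machine $U$ are illustrative assumptions, not part of the formal definition.

```python
import os
import zlib

def description_length_upper_bound(x: bytes) -> int:
    """Crude, computable upper bound on K(x): the length in bits of a
    zlib-compressed encoding of x (ignoring the fixed cost of the decompressor)."""
    return 8 * len(zlib.compress(x, 9))

# Highly regular data compresses far better than random noise, mirroring the
# intuition that simple objects have low Kolmogorov complexity.
regular = b"01" * 5000        # simple, periodic string of 10,000 bytes
noise = os.urandom(10000)     # incompressible with overwhelming probability

print(description_length_upper_bound(regular))  # small
print(description_length_upper_bound(noise))    # close to 8 * 10000 bits
```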

The Law of Conservation of Information:

The Law of Conservation of Information, which dates back to von Neumann, essentially states that the von Neumann entropy is invariant under unitary transformations. This is meaningful within the framework of Everettian quantum mechanics, as a density matrix may be assigned to the state of the Universe. In this way, information is conserved whether we run a simulation of the Universe forwards or backwards in time.
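For concreteness, here is a minimal numerical sketch of this invariance on a toy finite-dimensional system; the particular density matrix and the QR-based random unitary are assumptions made purely for illustration.

```python
import numpy as np

def von_neumann_entropy(rho: np.ndarray) -> float:
    """S(rho) = -Tr(rho log rho), computed from the eigenvalues of rho."""
    eigvals = np.linalg.eigvalsh(rho)
    eigvals = eigvals[eigvals > 1e-12]   # convention: 0 log 0 = 0
    return float(-np.sum(eigvals * np.log(eigvals)))

def random_unitary(n: int, rng: np.random.Generator) -> np.ndarray:
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))   # fix the column phases

rng = np.random.default_rng(0)

# A mixed state of a toy 4-dimensional "universe".
rho = np.diag([0.5, 0.3, 0.15, 0.05]).astype(complex)
U = random_unitary(4, rng)
rho_evolved = U @ rho @ U.conj().T    # unitary (hence reversible) evolution

print(von_neumann_entropy(rho))          # ~1.14 nats
print(von_neumann_entropy(rho_evolved))  # the same value: S(U rho U*) = S(rho)
```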

The Physical Church-Turing thesis:

The Law of Conservation of Information is consistent with the observation that all fundamental physical laws are time-reversible and computable. The research of David Deutsch (and others) on the Physical Church-Turing thesis explains how a universal quantum computer may simulate these laws [2]. Michael Nielsen wrote a good introductory blog post on the subject [7].
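As a toy illustration of what "time-reversible and computable" means here, the sketch below classically simulates the unitary evolution of a single qubit and then undoes it exactly with the adjoint evolution. The Hamiltonian is an arbitrary illustrative choice, and a classical simulation of this kind scales exponentially with system size; Deutsch's thesis concerns efficient simulation on a universal quantum computer.

```python
import numpy as np
from scipy.linalg import expm

# A single qubit evolving under an arbitrarily chosen Hamiltonian H.
sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)
H = 0.7 * sigma_x + 0.3 * sigma_z

t = 1.5
U = expm(-1j * H * t)        # forward time evolution U = exp(-iHt)
U_inverse = U.conj().T       # reversing time is just applying the adjoint

psi_0 = np.array([1, 0], dtype=complex)   # initial state |0>
psi_t = U @ psi_0                          # evolve forwards
psi_recovered = U_inverse @ psi_t          # evolve backwards

print(np.allclose(psi_recovered, psi_0))   # True: no information has been lost
```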

The Physical Church-Turing thesis is a key point in this discussion as it provides us with a credible explanation for the remarkable effectiveness of mathematics in the natural sciences.

A remark on effectiveness:

What I have retained from my discussions with physicists and other natural scientists is that the same mathematical laws with remarkable generalisation power are also constrained by Occam's razor. In fact, from an information-theoretic perspective, the remarkable effectiveness of mathematics is a direct consequence of the effectiveness of Occam's razor. This may be partly understood from a historical perspective if one surveys the evolution of ideas in physics [10].

Given two theories compatible with observation, Einstein generally argued that one should choose the simpler theory, provided it yields negligible experimental error. To be precise, he stated [9]:

It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience. (Einstein, 1933)

As the application of Occam's razor generally requires a space of computable models, and algorithmic information theory carefully explains why simpler theories generalise better [8], it is fair to say that the notion of effectiveness may be made precise. However, the theory of algorithmic information was only developed in the mid-1960s by Kolmogorov, Chaitin and Solomonoff [4,5,6], after Wigner wrote his article in 1960 [1].
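To make the role of the simplicity prior concrete: under a Solomonoff-style prior, a hypothesis with a description length of $\ell$ bits receives prior weight proportional to $2^{-\ell}$, so among hypotheses that fit the data equally well the shortest one dominates the posterior. The sketch below is a toy illustration; the named hypotheses and their description lengths are invented for the example.

```python
# Toy Solomonoff-style model selection: hypotheses that explain the data
# equally well are weighted by 2^(-description length in bits).
hypotheses = {
    # name: (description length in bits, likelihood of the observed data)
    "simple law":         (20, 1.0),
    "law plus epicycle":  (35, 1.0),
    "giant lookup table": (200, 1.0),
}

unnormalised = {
    name: (2.0 ** -length) * likelihood
    for name, (length, likelihood) in hypotheses.items()
}
total = sum(unnormalised.values())
posterior = {name: weight / total for name, weight in unnormalised.items()}

for name, p in sorted(posterior.items(), key=lambda item: -item[1]):
    print(f"{name:20s} posterior = {p:.6f}")
# The shortest hypothesis absorbs essentially all of the posterior mass:
# every unnecessary bit of description length costs a factor of two.
```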

What is remarkable:

If we view the scientific method as an algorithmic search procedure, then there is no reason, a priori, to suspect that any particular inductive bias should be especially powerful. This much was established by David Wolpert and William Macready via their No Free Lunch theorems [11].

On the other hand, the history of natural science indicates that Occam's razor is remarkably effective. The effectiveness of this inductive bias has more recently been explored within the context of deep learning [12].
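The No Free Lunch claim can even be checked exhaustively on a toy search problem: averaged over all objective functions on a small domain, any two fixed search orderings need the same expected number of evaluations to locate a maximum. In the sketch below, the domain size, the binary codomain and the two orderings are arbitrary choices made for illustration.

```python
from itertools import product

DOMAIN = range(4)   # search space X = {0, 1, 2, 3}
ALL_FUNCTIONS = list(product([0, 1], repeat=len(DOMAIN)))   # every f: X -> {0, 1}

def evaluations_to_find_max(f, ordering):
    """Number of evaluations a fixed, non-repeating search order needs
    before it first samples a point achieving max(f)."""
    best = max(f)
    for i, x in enumerate(ordering, start=1):
        if f[x] == best:
            return i

order_a = [0, 1, 2, 3]   # a left-to-right sweep
order_b = [3, 1, 0, 2]   # some other fixed ordering

average_a = sum(evaluations_to_find_max(f, order_a) for f in ALL_FUNCTIONS) / len(ALL_FUNCTIONS)
average_b = sum(evaluations_to_find_max(f, order_b) for f in ALL_FUNCTIONS) / len(ALL_FUNCTIONS)

print(average_a, average_b)   # identical averages, exactly as NFL predicts
```

The equality holds because, for any fixed non-repeating ordering, the map from objective functions to observed value sequences is a bijection, so any performance measure that depends only on the observed values has the same average over all functions.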

References:

  1. Eugene Wigner. The Unreasonable Effectiveness of Mathematics in the Natural Sciences. 1960.

  2. David Deutsch. Quantum theory, the Church–Turing principle and the universal quantum computer. 1985.

  3. Peter D. Grünwald. The Minimum Description Length Principle. MIT Press, 2007.

  4. A. N. Kolmogorov. Three approaches to the quantitative definition of information. Problems of Information Transmission, 1(1):1–7, 1965.

  5. G. J. Chaitin. On the length of programs for computing finite binary sequences: Statistical considerations. Journal of the ACM, 16(1):145–159, 1969.

  6. R. J. Solomonoff. A formal theory of inductive inference: Parts 1 and 2. Information and Control, 7:1–22 and 224–254, 1964.

  7. Michael Nielsen. Interesting problems: The Church-Turing-Deutsch Principle. 2004. https://michaelnielsen.org/blog/interesting-problems-the-church-turing-deutsch-principle/

  8. Marcus Hutter et al. Algorithmic probability. Scholarpedia, 2(8):2572, 2007.

  9. Andrew Robinson. Did Einstein really say that? Nature, 2018.

  10. Albert Einstein and Leopold Infeld. The Evolution of Physics. Cambridge University Press, 1938. (Edited by C. P. Snow.)

  11. D. H. Wolpert and W. G. Macready. No Free Lunch Theorems for Optimization. IEEE Transactions on Evolutionary Computation, 1:67, 1997.

  12. Guillermo Valle Pérez, Chico Camargo, and Ard Louis. Deep learning generalizes because the parameter-function map is biased towards simple functions. 2019.

Best Answer

A 2013 issue of Interdisciplinary Science Reviews was entirely devoted to this topic. One viewpoint, by Jesper Lützen, struck me:

When Wigner claimed that the effectiveness of mathematics in the natural sciences was unreasonable it was due to a dogmatic formalist view of mathematics according to which higher mathematics is developed solely with a view to formal beauty. I shall argue that this philosophy is not in agreement with the actual practice of mathematics. Indeed, I shall briefly illustrate how physics has influenced the development of mathematics from antiquity up to the twentieth century. If this influence is taken into account, the effectiveness of mathematics is far more reasonable.

(The articles in this issue are behind a paywall; perhaps there is another way to access them...)