Maybe Markus Khuri's RTG notes would help?
There's also a set of lecture notes by Rick Schoen for his 2009 course at Stanford on General Relativity, which has a nice discussion of the fundamental ideas involved in the proof of the positive mass theorem (PMT). I don't know whether it is publicly available on the internet, though...
The fields you're talking about are typically concerned with two different geometric spaces:
- The space of input data to a neural network (geometric deep learning)
- The parameter space of all neural networks with a given architecture (information geometry)
Many natural applications of neural networks involve input data with a discrete Euclidean-type structure: 1D for time series, 2D for images or audio, 3D for video. That "Geometric Deep Learning" paper discusses applying neural networks to input data with other types of geometry, such as graphs and networks. A central problem is figuring out the right architecture to handle a particular type of data.
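To make the graph case concrete, here is a minimal sketch of one message-passing (graph-convolution) layer in the spirit of the GCN architecture; the function name, shapes, and toy data are my own illustrative assumptions, not any particular library's API.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph-convolution step: aggregate each node's neighborhood
    features (the graph supplies the "geometry"), then apply a learned
    linear map and a nonlinearity.

    A : (n, n) adjacency matrix of the graph
    X : (n, d) node feature matrix
    W : (d, k) weight matrix
    """
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    deg = A_hat.sum(axis=1)                 # node degrees
    D_inv_sqrt = np.diag(deg ** -0.5)       # symmetric normalization
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt
    return np.maximum(A_norm @ X @ W, 0.0)  # ReLU

# Toy graph: a path on three nodes.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
X = np.eye(3)                # one-hot node features
W = np.ones((3, 2))          # dummy weights
H = gcn_layer(A, X, W)
print(H.shape)               # (3, 2): two output features per node
```

The point of the normalization step is that the layer depends only on the graph structure, not on any embedding of the nodes in a Euclidean space; choosing such aggregation rules is exactly the "right architecture for the data's geometry" problem mentioned above.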
On the other hand, suppose you're studying the question of training a particular neural network. That is, you have a specific architecture in mind, say with $n$ weight (and maybe bias) parameters, so that any given set of parameters may be viewed as a point in $\mathbb{R}^n$. When you study the dynamics of training, it can be useful to think about different metrics on this space. For example, some common regularization methods rely on $L_1$ or $L_2$ norms. The "information geometry" line of work looks at other metrics, with the goal of capturing more sophisticated concepts of network capacity, invariances to certain transformations, etc. A paper with a relatively brief, self-contained exposition is "Fisher-Rao Metric, Geometry, and Complexity of Neural Networks".
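As a toy illustration of "different metrics on the same parameter space" (my own example, not taken from the cited paper): the same small displacement of a logistic model's parameters has different lengths under the Euclidean metric and under a diagonal empirical-Fisher metric.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))            # toy inputs
w_true = np.array([1.0, -2.0, 0.5])
p_true = 1 / (1 + np.exp(-X @ w_true))
y = (rng.uniform(size=200) < p_true).astype(float)  # toy labels

def empirical_fisher_diag(w):
    """Diagonal of the empirical Fisher information at parameters w."""
    p = 1 / (1 + np.exp(-X @ w))
    grads = (p - y)[:, None] * X         # per-example gradients of the NLL
    return (grads ** 2).mean(axis=0)

w = np.zeros(3)
dw = np.array([0.1, 0.1, 0.1])           # a small parameter displacement

euclid = np.sqrt(dw @ dw)                # length in the Euclidean (L2) metric
F = empirical_fisher_diag(w)
fisher = np.sqrt(dw @ (F * dw))          # length in the diagonal Fisher metric

print(euclid, fisher)                    # same displacement, different lengths
```

The Fisher length weights each coordinate by how much the model's predictions actually change in that direction, which is the basic idea behind using information-geometric metrics rather than the raw Euclidean one.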
To sum up: Geometric deep learning is concerned with problems where the domain, or input data, is far from being modeled by a standard Euclidean space. Information geometry is traditionally used to analyze dynamics on a neural network parameter space ($\mathbb{R}^n$, but with a non-Euclidean metric). So in that sense, they are conceptually distinct. However, they both use similar mathematics, and certainly both could arise in studying a particular neural network.
Best Answer
1
Penrose's singularity theorem is a bit of a misnomer.
Penrose never showed that there is a singularity in the spacetime. What he proved is that the spacetime cannot be timelike or null geodesically complete. As is now well understood, this does not necessarily mean that there is a singularity (in the sense of a region of extreme curvature).
A much better name for the theorem is incompleteness theorem.
2
Beyond basic differential-geometric concepts covered in common undergraduate courses, such as cut and conjugate points, the key tool is the Raychaudhuri equation for null geodesics. This is a specific form of the Jacobi equation for Jacobi fields along geodesics, specialized to a family of null (or, in the case of the original Raychaudhuri-Landau equations, timelike) geodesics.
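For concreteness, in four spacetime dimensions and in standard notation (as in Wald or Hawking-Ellis; stated here as a reference point, not part of the original answer), the null Raychaudhuri equation reads

$$\frac{d\theta}{d\lambda} \;=\; -\frac{1}{2}\theta^{2} \;-\; \sigma_{ab}\sigma^{ab} \;+\; \omega_{ab}\omega^{ab} \;-\; R_{ab}k^{a}k^{b},$$

where $\lambda$ is an affine parameter along the null geodesics with tangent $k^a$, $\theta$ is the expansion of the congruence, $\sigma_{ab}$ its shear, and $\omega_{ab}$ its twist.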
Those of us familiar with the Jacobi equation understand that it says the rate of acceleration of the separation of nearby geodesics is governed by a curvature quantity. And here is where the theorem stops being purely geometric: the curvature quantity involved can be related via Einstein's equation to the matter content of the spacetime, and under "reasonable assumptions" this curvature quantity can be assumed to have a sign (or vanish).
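Spelling out that step (standard textbook material, sketched here for the reader): contracting the Einstein equation with a null vector kills the trace term, since $g_{ab}k^{a}k^{b}=0$, so the null energy condition $T_{ab}k^{a}k^{b}\ge 0$ directly gives the curvature term a sign:

$$R_{ab} - \tfrac{1}{2}R\,g_{ab} = 8\pi T_{ab} \quad\Longrightarrow\quad R_{ab}k^{a}k^{b} = 8\pi\, T_{ab}k^{a}k^{b} \;\ge\; 0.$$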
So this means that the presence of reasonable matter will cause nearby null geodesics to focus toward each other, similar to how geodesics tend to behave on positively curved Riemannian manifolds. From this we see that conjugate or cut points must arise from the focusing.
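The focusing can be made quantitative in one line (a standard computation, added here as a sketch): for a twist-free congruence with $R_{ab}k^{a}k^{b}\ge 0$, the Raychaudhuri equation gives $d\theta/d\lambda \le -\theta^{2}/2$, hence

$$\frac{d}{d\lambda}\!\left(\frac{1}{\theta}\right) \;\ge\; \frac{1}{2} \quad\Longrightarrow\quad \frac{1}{\theta(\lambda)} \;\ge\; \frac{1}{\theta_{0}} + \frac{\lambda-\lambda_{0}}{2},$$

so if the expansion is negative somewhere, $\theta(\lambda_{0}) = \theta_{0} < 0$ (as on a trapped surface), then $\theta \to -\infty$ within affine parameter $2/|\theta_{0}|$, and a conjugate (focal) point must form.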
In terms of lasting mathematical impact, this step is probably the most significant for the modern mathematical GR community. What Penrose demonstrated is that one can extract monotonicity properties from the evolution equations in a useful way, even though the equations of motion are manifestly time-symmetric. It cemented the importance of thinking about the Raychaudhuri equations (as well as the geometry of null hypersurfaces), and it also lends a somewhat different philosophy to what is and isn't doable in mathematical GR (this last point is a bit harder to describe).
3
The other main ingredient is a careful understanding of the causal structure of spacetime. By the arguments in the previous step, Penrose showed that the boundary of a certain space-time set is necessarily compact, due to the presence of cut and conjugate points.
A detailed examination of the causal structure of the spacetime gives a different characterization of the same boundary: assuming that the spacetime is geodesically complete, one can prove from general principles that this boundary must be a non-compact set.
The contradiction is what leads to a proof of incompleteness.
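A sketch of how the contradiction is usually organized (following the standard textbook presentation, e.g. Wald; $T$ denotes the trapped surface and $\Sigma$ a non-compact Cauchy surface):

$$\partial I^{+}(T)\ \text{is compact (from focusing)}, \qquad \partial I^{+}(T)\ \hookrightarrow\ \Sigma\ \text{via the flow of a timelike vector field},$$

where the flow map is a homeomorphism onto a subset of $\Sigma$ that is open (since $\partial I^{+}(T)$ is an achronal $C^{0}$ hypersurface) and closed (since the image is compact). By connectedness the image would be all of $\Sigma$, contradicting the non-compactness of $\Sigma$.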
For someone trained in classical differential geometry, this last ingredient, the understanding of the causal geometry (which is only present in Lorentzian and not Riemannian geometry), is probably the least familiar.