Update
The Coursera course I recommended long ago has now gone offline, although you can find links to the slides and videos on Hinton's home page. In any case, the field has continued to advance dramatically and there are new results and more up-to-date expository work; see any of the more recent answers.
For what it's worth, in the six years since I wrote this answer, the most fruitful point of view in my own work has been to focus on the high-dimensional geometry of neural networks. There are a lot of interesting sights to see in the wilds of a world with thousands or millions of dimensions.
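To make that remark concrete, here is a small illustrative sketch of my own (not part of the original answer): one of the first surprises in high-dimensional geometry is that independent random directions are almost always nearly orthogonal, with typical cosine similarity on the order of $1/\sqrt{d}$. Assuming NumPy is available:

```python
import numpy as np

# Concentration of measure: in high dimensions, two independent
# random unit vectors are almost always nearly orthogonal.
rng = np.random.default_rng(0)
d = 10_000  # dimension; pick something "neural-network sized"

def random_unit_vector(d, rng):
    """Uniform random direction: normalize a standard Gaussian sample."""
    v = rng.standard_normal(d)
    return v / np.linalg.norm(v)

u = random_unit_vector(d, rng)
v = random_unit_vector(d, rng)

# Typical magnitude is about 1/sqrt(d) = 0.01 here, i.e. nearly orthogonal.
cos = float(u @ v)
print(f"cosine similarity in {d} dimensions: {cos:.4f}")
```

In two or three dimensions nothing like this happens; it is exactly this kind of counterintuitive behavior that makes the geometry of million-dimensional parameter spaces worth studying on its own terms.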
Old answer
If you have time, I highly recommend this Coursera course.
The videos are available for free and are truly excellent. The teacher is Geoffrey Hinton, who is one of the main players in the area, and he does an excellent job of providing both clear definitions and useful intuition.
In general, I wouldn't expect to see perfect theorem-lemma-proof exposition of deep learning anywhere, simply because the math hasn't caught up to real-world practice. More typical is a clean analysis of an idealized system, which is then related to a real system by a heuristic argument. In other words, this is an area that could use attention from mathematicians!
I'm not sure I would say I'm an expert in information geometry. However, I worked for several years on the subject as a postdoc. As a disclaimer, this is entirely my own opinion and others may disagree.
Since you asked this question, the research situation in the field has improved. Firstly, two separate books ([1], [2]) have been published, both of which are good references for the material. In particular, the second gives a rigorous mathematical treatment of the basic theory. Secondly, a new journal, Information Geometry, has been launched. Several issues have been published thus far, and they contain some interesting papers.
However, information geometry is definitely a relatively niche mathematical field. As to the reason for this, in my opinion IG is really an interdisciplinary field and not simply a branch of mathematics. Many of the people working in the field are not mathematicians by background. As a result, information geometry embodies a wide range of research. Some papers are mathematical, but many others are really statistics, computer science, or some hybrid thereof. Many of the publishing conventions differ from those of mathematics as well. For instance, it's common to publish short papers without proofs in conference proceedings and, generally speaking, the main theorems are not stated in the introduction.
While there is a lot of good work being done in the field, there is also too much research that is not really serious. Most of this stems not from bad faith but from a lack of experience and background in geometry. Furthermore, a lot of the work is published in a for-profit journal whose peer review process is minimal. Without giving examples, some papers boil down to slightly modifying known results and treating them as novel. Other papers try to use really big ideas without understanding the underlying theory or really proving anything. In addition, what is considered acceptable overlap between publications is far greater than in pure math. Needless to say, these issues create serious problems for the field and make it much less likely to be taken seriously.
Even with the good papers, they often seem to lack a good punchline. As was mentioned in the comments, the math in IG has built up a very general foundational theory, often without providing mathematical or statistical motivation for this theory. My impression is that quite a few of the researchers in the field were heavily influenced by the "structural point of view" pioneered by Nomizu and Kobayashi. I suppose the motivation for these structures might be self-evident to a statistician, but as a geometer oftentimes it's completely lost on me. In my experience, I only really started to understand what was going on when I worked through some important examples of statistical manifolds, instead of trying to learn the theory from the ground up.
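In case it helps a reader in the same position, here is a sketch of the example I found most clarifying (my own gloss, using standard facts rather than anything specific to the original answer). A statistical manifold is a parametrized family of densities $p(x;\theta)$, equipped with the Fisher information metric

```latex
g_{ij}(\theta) \;=\; \mathbb{E}_{\theta}\!\left[
  \frac{\partial \log p(x;\theta)}{\partial \theta^{i}}\,
  \frac{\partial \log p(x;\theta)}{\partial \theta^{j}}
\right].
```

For the family of univariate Gaussians with parameters $(\mu, \sigma)$, this works out to

```latex
ds^{2} \;=\; \frac{d\mu^{2} + 2\,d\sigma^{2}}{\sigma^{2}},
```

a metric of constant negative curvature, so the Gaussian family is (up to rescaling) the hyperbolic plane. Seeing familiar geometric objects arise from families of distributions did far more for my intuition than the general structural theory did.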
Related to the point above, it's difficult to find explicit conjectures in the field. There is nothing analogous to Yau's list of open problems in geometry to guide progress. As such, when I was learning the subject it was hard to tell which problems were considered important and to understand the motivations for the research.
As a result of all of these factors, information geometry has remained a specialized sub-field. I think this will remain the case unless it is used to solve a big problem or it evolves to be more in line with standard mathematical conventions. All that being said, I've learned a lot from information geometry, and there is definitely a fair amount of low-hanging fruit to be picked. Furthermore, the field seems to be making progress in recent years, so hopefully my critiques will soon be obsolete.
To end on a positive note, let me give an example of a paper that I think does things well [3]. This work studies necessary conditions for a Riemannian metric to be, locally, the Hessian of a convex potential. I really like this paper and have found it helpful for my intuition.
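For context, here is the standard definition involved (my own gloss, not taken from the paper): a Riemannian metric $g$ is called Hessian if, in suitable local affine coordinates $(x^{1},\dots,x^{n})$, it takes the form

```latex
g_{ij} \;=\; \frac{\partial^{2} \varphi}{\partial x^{i}\,\partial x^{j}}
```

for some convex potential $\varphi$. The connection to information geometry is that the Fisher metric of an exponential family is Hessian, with $\varphi$ the log-partition function, so characterizing which metrics arise this way is a natural question for the field.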
P.S. If anyone is interested, I was able to find a list of open problems from 1998, some of which have since been solved.
References
[1] Amari, S. I. (2016). Information geometry and its applications (Vol. 194). Tokyo: Springer.
[2] Ay, N., Jost, J., Vân Lê, H., & Schwachhöfer, L. (2017). Information geometry (Vol. 8). Berlin: Springer.
[3] Amari, S. I., & Armstrong, J. (2014). Curvature of Hessian manifolds. Differential Geometry and its Applications, 33, 1-12.
Best Answer
Machine learning is a huge area, and so draws from many different parts of math. Hence, you might get multiple answers emphasizing different things.
First, the linked thread Mathematics for machine learning is about what math someone should learn before diving into machine learning. That's different from what research might be most impactful. Still, that thread mentions optimization, probability/statistics, linear algebra, harmonic/Fourier analysis, approximation theory, topology, embedding theory, functional analysis, and control theory. Another recent thread asked about connections between higher categories and machine learning.
Another resource that might be of interest is Data Science for Mathematicians, edited by Nathan Carter. It assumes the audience is a mathematician (at, say, the graduate student level), then gives high-level treatments of the core topics in data science.
I should disclose that I wrote one of the chapters, but don't have any financial stake in the book. I recommend it because I think it's great, and will help mathematicians who want to embrace data science in their research, teaching, or as an alternative career.
In terms of "enhancing machine learning," there are several directions one could pursue.
It might help to poke around on arXiv and find papers doing the kind of thing you're interested in, then use Google Scholar to look up other papers by the same authors, or look up their webpages and research groups. There are also folks working on these kinds of questions from outside of a university setting, like the Topos Institute. Because the number of ways to do great research that enhances machine learning is vast, it's best to pick something concrete and get to work, instead of trying to understand every possible avenue before starting. That said, one very valuable thing academic mathematicians can bring to the world of machine learning is a "big picture" view, so even as you're working on concrete problems, stop every so often to ponder big questions and think about the major issues with machine learning today, and how math could help model, streamline, explain, validate, and make predictions related to those major issues.