Deep learning / Deep neural nets for mathematicians

applications, applied-mathematics, machine-learning, reference-request

I am interested in finding out the mathematical ideas behind the technologies that fall under the umbrella of "Deep Learning" or "Deep neural nets".

Most of the papers and books that are commonly cited as references, whether in papers or online, are not written in a very math-friendly manner. I am specifically referring to the fact that this field is highly interdisciplinary, and the language used (e.g. 'levels', 'stacking networks') is not standard mathematical terminology but rather highly specialized jargon.

So I am writing this post to find out if there exists a book or review article written for pure mathematicians about the core mathematical ideas of the whole deep-learning thing.

My hope is that there is a reference that follows (sort of) the theorem-lemma-proof format, or at least tries to wherever possible, or at least gives rigorous definitions so that I can make sense of the terminology.

Thank you.

Best Answer

Update

The Coursera course I recommended long ago has now gone offline, although you can find links to the slides and videos on Hinton's home page. In any case, the field has continued to advance dramatically and there are new results and more up-to-date expository work; see any of the more recent answers.

For what it's worth, in the six years since I wrote this answer, the most fruitful point of view in my own work has been to focus on the high-dimensional geometry of neural networks. There are a lot of interesting sights to see in the wilds of a world with thousands or millions of dimensions.

Old answer

If you have time, I highly recommend this Coursera course.

The videos are available for free and are truly excellent. The teacher is Geoffrey Hinton, one of the main players in the area, and he does a great job of providing both clear definitions and useful intuition.

In general, I wouldn't expect to see perfect theorem-lemma-proof exposition of deep learning anywhere, simply because the math hasn't caught up to real-world practice. More typical is a clean analysis of an idealized system, which is then related to a real system by a heuristic argument. In other words, this is an area that could use attention from mathematicians!