Solved – What are the best books to study Neural Networks from a purely mathematical perspective?

neural-networks, references

I am looking for a book that goes through the mathematical aspects of neural networks, from the simple forward pass of a multilayer perceptron in matrix form and the differentiation of activation functions, to backpropagation in CNNs or RNNs (to mention some of the topics).
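
For concreteness, this is the level of detail I have in mind, as a minimal NumPy sketch (the function and variable names are my own, purely for illustration): the forward pass in matrix form, plus a hand-derived activation derivative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # derivative of the sigmoid: sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1.0 - s)

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer MLP in matrix form:
    a1 = sigma(W1 x + b1),  y = W2 a1 + b2."""
    z1 = W1 @ x + b1
    a1 = sigmoid(z1)
    y = W2 @ a1 + b2
    return y
```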

Do you know of any book that goes in depth into this theory? I've had a look at a couple (such as Pattern Recognition and Machine Learning by Bishop, or Deep Learning by Goodfellow, Bengio and Courville) but still have not found a rigorous one (one with exercises would be a plus). Do you have any suggestions?

Best Answer

A very good reason why there are few very rigorous books on neural networks is that, apart from the Universal Approximation theorem (whose relevance to the learning problem is vastly overrated), there are very few mathematically rigorous results about NNs, and most of them are of a negative nature. It's thus understandably unlikely that someone would decide to write a math book which contains few proofs, most of which tell you what you can't do with your fancy model. As a matter of fact, Foundations of Machine Learning by Mehryar Mohri, Afshin Rostamizadeh and Ameet Talwalkar, a book which is second to none in terms of rigour, explicitly chooses not to cover neural networks because of the lack of rigorous results:

https://www.amazon.com/Foundations-Machine-Learning-Adaptive-Computation/dp/0262039400/
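
For context, since the Universal Approximation theorem is mentioned above, its classical statement (in the style of Cybenko and Hornik, paraphrased informally) is roughly the following:

```latex
% Universal Approximation (Cybenko 1989 / Hornik 1991, informal):
% if \sigma is a continuous sigmoidal (more generally, non-polynomial)
% activation, then for every continuous f on a compact K \subset \mathbb{R}^n
% and every \varepsilon > 0 there exist N, c_i, b_i \in \mathbb{R},
% w_i \in \mathbb{R}^n such that
\sup_{x \in K} \Bigl| f(x) - \sum_{i=1}^{N} c_i \,
    \sigma\bigl(w_i^\top x + b_i\bigr) \Bigr| < \varepsilon .
```

Note that this is an existence result: it says nothing about how to find the weights, which is part of why its relevance to the learning problem is overrated.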

Anyway, a few mathematical proofs (including the proof that the backpropagation algorithm computes the gradient of the loss function with respect to the weights) can be found in Understanding Machine Learning: From Theory to Algorithms, by Shai Shalev-Shwartz and Shai Ben-David:

https://www.amazon.com/Understanding-Machine-Learning-Theory-Algorithms-ebook/dp/B00J8LQU8I
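
To illustrate the specific claim proved there (that backpropagation computes the gradient of the loss with respect to the weights), here is a minimal NumPy sketch, with names of my own choosing, that checks the backprop gradient of a tiny sigmoid model against a finite-difference approximation:

```python
import numpy as np

def loss_and_grad(W, x, t):
    """Squared loss of a linear-sigmoid unit y = sigmoid(W x); returns the
    loss and the backprop gradient dL/dW derived via the chain rule."""
    z = W @ x
    y = 1.0 / (1.0 + np.exp(-z))
    L = 0.5 * np.sum((y - t) ** 2)
    delta = (y - t) * y * (1.0 - y)  # dL/dz, by the chain rule
    grad = np.outer(delta, x)        # dL/dW = delta x^T
    return L, grad

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 5))
x = rng.standard_normal(5)
t = rng.standard_normal(3)

L, grad = loss_and_grad(W, x, t)

# finite-difference check: perturb each weight by +/- h and compare
h = 1e-6
num = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += h
        Wm[i, j] -= h
        num[i, j] = (loss_and_grad(Wp, x, t)[0] - loss_and_grad(Wm, x, t)[0]) / (2 * h)

print(np.max(np.abs(grad - num)))  # ~1e-10: backprop matches the true gradient
```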

Neural Network Methods in Natural Language Processing by Yoav Goldberg is also quite rigorous, but probably not rigorous enough for your purposes:

https://www.amazon.com/Language-Processing-Synthesis-Lectures-Technologies/dp/1627052984

Finally, Linear Algebra and Learning from Data by Gilbert Strang covers one part of the math of deep learning, linear algebra, which, while not the whole story, is definitely a cornerstone:

https://www.amazon.com/-Algebra-Learning-Gilbert-Strang/dp/0692196382
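
As a taste of why that linear algebra is a cornerstone, here is a small NumPy sketch (my own example, not taken from the book) of a result Strang puts front and center: the truncated SVD gives the best rank-k approximation of a matrix (Eckart–Young).

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 6))

# full SVD: A = U diag(s) Vt, singular values in decreasing order
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # best rank-k approximation

# Eckart-Young: the spectral-norm error equals the (k+1)-th singular value
print(np.linalg.norm(A - A_k, 2), s[k])      # the two numbers agree
```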


EDIT: this has recently changed with the latest advances in Deep Learning Theory, e.g., NTK theory, new concentration-of-measure results, new results on Rademacher complexity and covering numbers, etc. Matus Telgarsky has written an excellent online book on the topic:

https://mjt.cs.illinois.edu/dlt/
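
For a flavour of the NTK material covered there, here is a minimal NumPy sketch (my own construction, purely for illustration) of the empirical NTK of a one-hidden-layer ReLU network, i.e., the Gram matrix of parameter gradients K(x, x') = <grad_theta f(x), grad_theta f(x')>; as the width m grows, it concentrates around the infinite-width kernel that NTK theory studies.

```python
import numpy as np

def empirical_ntk(X, m=10000, seed=0):
    """Empirical NTK of the one-hidden-layer ReLU net
    f(x) = v . relu(W x) / sqrt(m):
    K[a, b] = < grad_theta f(x_a), grad_theta f(x_b) >,
    with gradients taken over the parameters (W, v)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((m, X.shape[1]))  # hidden weights ~ N(0, 1)
    v = rng.standard_normal(m)                # output weights ~ N(0, 1)
    pre = X @ W.T                             # pre-activations, shape (n, m)
    act = np.maximum(pre, 0.0)                # relu(W x)
    dact = (pre > 0.0).astype(float)          # relu'(W x)
    K_v = act @ act.T / m                               # grad wrt v term
    K_W = (X @ X.T) * ((dact * v**2) @ dact.T) / m      # grad wrt W term
    return K_v + K_W

X = np.random.default_rng(2).standard_normal((4, 3))
print(empirical_ntk(X))  # 4x4 kernel matrix; stabilizes as m grows
```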