Solved – Mathematical Machine Learning Theory “from scratch” textbook

Tags: machine-learning, references

I am looking for a textbook (preferably a single one) covering the material in Elements of Statistical Learning, but with better scaffolding (ESL jumps around too much for me and serves more as a reference). It should include detailed derivations and proofs of the algorithms, along with detailed, step-by-step examples of how the algorithms are executed from scratch, particularly because I am studying for exams during which I will not have a computer. Texts with solutions available are strongly preferred.

Edit: Some of you may be baffled as to why I would need such a text. As an example, you can view a past practice exam for my upcoming course at this link.

I have a month to learn this material, and I don't have time to dig through the 10+ machine learning books I own trying to dissect what they are saying. I have a strong preference for textbooks written by mathematicians. Too many of the machine learning books I've seen discuss a concept for a few pages, skim over all of the computational details, or simply execute everything with XYZ package and assume the package's output is correct. I am very skeptical of this approach, and historically, my skepticism seems to have saved me from errors.

My background is equivalent to about half of a U.S. M.S. statistics program: calculus-based probability, Casella and Berger–level statistics, (general and generalized) linear models with matrices, and experimental design. I am not afraid of by-hand matrix computations, and I will probably need to perform these algorithms by hand.

The very closest I've seen to what I'm looking for are Andrew Ng's CS 229 notes (see here), and I'll probably be using these – but they aren't as useful as I'd like, given that I don't have solutions to the homework assignments.

I have already read the following textbooks and haven't found them sufficient for my purposes:

  • Mohri et al, Foundations of Machine Learning (close to what I want, but no solutions available)
  • Clarke et al, Principles and Theory for Data Mining and Machine Learning (no solutions available, seems to suppose a measure-theoretic background)
  • Murphy, Machine Learning (extremely dense)
  • James et al, Introduction to Statistical Learning (relies too little on theory, and too much on assuming that the R code will work – I've already spotted errors – for example here)
  • Izenman, Modern Multivariate Statistical Techniques (better than ESL, but skims over details and uses slightly nonstandard notation; see also the link to one of my questions above).

Are there any other books that I don't know of that would be useful for my purposes?

Best Answer

Per @Coffee's recommendation, I suggest the text Machine Learning: A Bayesian and Optimization Perspective by Sergios Theodoridis, along with Pattern Recognition by the same author.

These two texts combined total about 2,000 pages and cover everything from undergraduate-level probability to linear models and (as far as I can tell) everything covered by Elements of Statistical Learning, plus time series, probabilistic graphical models, deep learning, and Monte Carlo methods.

The author makes an excellent effort to keep all notation clear and consistent (thank you for bolding all of your vectors!) and seems to have chosen the exercises carefully.

A background in probability, as well as statistics at the level of Casella and Berger, would be extremely helpful before pursuing these texts. There is some discussion of UMVUEs in them as well.