My opinion is that it depends on which subarea of machine learning interests you. Unfortunately, at this point much of the relevant literature, especially for theory, exists only in research papers rather than books. But this question is just about where to start, I suppose.
The more popular, practically oriented books targeting undergraduates, such as Hastie et al.'s The Elements of Statistical Learning or Bishop's Pattern Recognition and Machine Learning, are essentially non-mathematical. Books that take the probabilistic-modelling point of view, such as Murphy's Machine Learning: A Probabilistic Perspective and Koller and Friedman's Probabilistic Graphical Models: Principles and Techniques, have somewhat more mathematical content, mostly in Bayesian modelling and applied probability (e.g., MCMC, variational inference). I think books in these categories are great introductions to ML, but perhaps not to its mathematics.
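To make the "applied probability" flavor concrete, here is a minimal random-walk Metropolis sampler, the simplest instance of the MCMC methods these books cover. This is my own sketch (the target density and step size are arbitrary choices for illustration), not code from any of the books:

```python
import numpy as np

def metropolis(log_density, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis: sample from an unnormalized log-density."""
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        proposal = x + step * rng.standard_normal()
        # Accept with probability min(1, p(proposal) / p(x)),
        # done in log space for numerical stability.
        if np.log(rng.uniform()) < log_density(proposal) - log_density(x):
            x = proposal
        samples[i] = x
    return samples

# Illustrative target: a standard normal, via its log-density up to a constant.
samples = metropolis(lambda x: -0.5 * x**2, x0=0.0, n_samples=50_000)
print("mean:", samples.mean(), "std:", samples.std())
```

The empirical mean and standard deviation should come out close to 0 and 1, the moments of the target. Because the proposal is symmetric, the acceptance ratio needs no correction term; that simplification is what distinguishes Metropolis from the general Metropolis–Hastings scheme.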
The most popular book as of writing, Goodfellow et al.'s Deep Learning, is also non-rigorous and generally mathematically light. However, it covers more advanced subjects toward the end, and it is such a comprehensive introduction to the subject that I still recommend it as a starting point.
Classical ML theory is, to a decent extent, concerned with the Probably Approximately Correct (PAC) framework. Two lovely, mathematically oriented books that focus on this basic theory are Shalev-Shwartz and Ben-David's Understanding Machine Learning and Mohri et al.'s Foundations of Machine Learning. These are probably good starting points for people interested in ML theory, with plenty of theorems on error bounds, sample complexities, and so on.
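To give a flavor of the results in these books, here is the classical sample-complexity bound for a finite hypothesis class $\mathcal{H}$ in the realizable PAC setting (standard material in both texts): if the number of i.i.d. training examples satisfies

```latex
m \;\ge\; \frac{1}{\epsilon}\left(\ln|\mathcal{H}| + \ln\frac{1}{\delta}\right),
```

then with probability at least $1-\delta$ over the sample, every hypothesis in $\mathcal{H}$ that is consistent with the training data has true error at most $\epsilon$. The logarithmic dependence on $|\mathcal{H}|$ and $1/\delta$ is typical of the bounds these books develop, and the infinite-class analogues replace $\ln|\mathcal{H}|$ with combinatorial measures such as the VC dimension.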
Specialized books on particular ML topics can be mathematically demanding as well. Schölkopf and Smola's Learning with Kernels and Rasmussen and Williams's Gaussian Processes for Machine Learning are, in my opinion, examples of this. There is also MacKay's Information Theory, Inference and Learning Algorithms, which covers neural networks from an information-theoretic and compression point of view, and Wainwright and Jordan's Graphical Models, Exponential Families, and Variational Inference.
One shortcoming of the ML literature, as of writing, is the lack of introductory books that help people access the more mathematically demanding advanced literature (e.g., the game theory, information theory, and optimal transport used to analyze deep generative models; differential geometry and spectral methods in manifold learning; Riemannian optimization for deep learning). Hopefully one day there will be more expository material to introduce us to these more mathematically intensive areas.
Incidentally, in my answer to this question I link to other questions on the same topic.
Best Answer
Try searching for topics concerning "Learning Dynamical Systems" and "Predictive State Representation." Here is a possible reference.
I'm not sure ergodicity in full generality is useful here. If you're dealing with a Markov chain that has a stationary distribution, then ergodicity enters through the long-run fraction of time the chain spends in each state. There is also a notion of "ergodic time series" which might interest you.
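As a small numerical illustration of this point (my own sketch, with an arbitrary three-state chain, not taken from any reference above): for an irreducible, aperiodic Markov chain, the empirical fraction of time spent in each state along a single long trajectory converges to the stationary distribution.

```python
import numpy as np

# Transition matrix of a small ergodic (irreducible, aperiodic) Markov chain.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])

# Stationary distribution: the left eigenvector of P with eigenvalue 1,
# normalized to sum to one (so that pi @ P == pi).
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
pi /= pi.sum()

# Simulate one long trajectory and count visits to each state.
rng = np.random.default_rng(0)
n_steps = 200_000
state = 0
visits = np.zeros(3)
for _ in range(n_steps):
    visits[state] += 1
    state = rng.choice(3, p=P[state])

print("stationary:", np.round(pi, 3))
print("empirical :", np.round(visits / n_steps, 3))
```

The two printed vectors should agree to a couple of decimal places; this time-average-equals-space-average behavior is exactly the ergodic property that matters for Markov chains with stationary distributions, without invoking ergodic theory in full generality.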