I think this is a great question, and not an easy one to answer. I think of machine learning as encompassing a lot of multivariate statistics, because many of the common techniques in multivariate analysis (ordination and clustering, for instance) use unsupervised learning algorithms. For people like me who aren't that concerned with the computer side of things, a lot of this machinery stays "under the hood", and I'm usually focused more on how ordination works as an extension of regression. But it can't be ignored that the computer is doing some pretty advanced pattern searching that I am not responsible for.
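As a concrete illustration of ordination as unsupervised learning, here is a minimal PCA sketch in Python; the data matrix is made up purely for illustration, and `numpy` does the actual work:

```python
import numpy as np

# Minimal sketch of ordination via principal component analysis (PCA),
# computed with the singular value decomposition. The data are invented.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))          # 50 samples, 4 variables
X = X - X.mean(axis=0)                # centre each variable

# SVD of the centred data: rows of Vt are the principal axes
U, s, Vt = np.linalg.svd(X, full_matrices=False)
scores = X @ Vt.T                     # sample coordinates on the new axes

# Variance explained by each component, in decreasing order
explained = s**2 / (len(X) - 1)
print(explained)
```

No class labels are involved anywhere: the algorithm finds the axes of greatest variation on its own, which is what makes it "unsupervised".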
Then there are supervised learning techniques in machine learning that fall outside the realm of regular multivariate analysis. For instance, if you want to predict which category a new object belongs to based on the values of some of its variables, you can train the algorithm on a bunch of objects whose classification you already know and then set it loose on classifying the new object. This is clearly not a multivariate statistics technique, and it's what I tend to think of when I think of machine learning, because it involves that process of communicating the success or failure of a search back to the system. This is also where machine learning starts to overlap with AI, and things quickly get completely out of my depth...
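That training-then-classifying process can be sketched with a deliberately simple classifier, a 1-nearest-neighbour rule; the objects, labels, and distance function below are invented purely for illustration:

```python
# Minimal sketch of supervised classification: a 1-nearest-neighbour
# classifier trained on objects of known class, then applied to a new
# object. Data and labels are made up.
def classify(new_obj, training_data, labels):
    """Return the label of the training object closest to new_obj."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(range(len(training_data)),
                  key=lambda i: dist(new_obj, training_data[i]))
    return labels[nearest]

# "Train" on objects whose classification we already know...
known = [(1.0, 1.2), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9)]
labels = ["small", "small", "large", "large"]

# ...then set the algorithm on a new object.
print(classify((1.1, 0.9), known, labels))   # → small
```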
In the end, I do agree with the second answer on this thread that machine learning emphasizes prediction, whereas statistics in general is concerned with inference - but again, this is broad-strokes stuff and not always going to be true.
In my understanding (though I wouldn't be surprised to be challenged on this), machine learning and statistics tackle partially overlapping problems, but machine learning focuses on the specific problem of prediction, and machine learning methods are most often not based on a model of the data-generating process. Statistics, in turn, focuses on data-generating processes that involve randomness, while "chaos theory", a.k.a. nonlinear dynamics, focuses on deterministic processes. In that sense, machine learning is two steps removed from nonlinear dynamics and chaos.
There is a weak relationship between machine learning and the phenomenon of chaos, since both are about prediction - or rather, in the second case, predictability. However, chaos is about the limits of predictability due to insufficient knowledge of initial conditions, even when there is a perfect model of the process, while machine learning is about the practical problem of actually predicting without caring or knowing much about the underlying process.
There is also a link between chaos and statistics insofar as it can be shown that specific chaotic systems can be mapped onto random processes. The basic idea is that chaotic dynamics amplifies differences between states, which means that over time more and more details of the initial conditions come to matter. If the not-infinitely-precise knowledge of the initial conditions is conceptualized as random, the large-scale output of a chaotic system can be considered random too. For more details see e.g. here. However, nonlinear dynamics tends to focus on low-dimensional dynamical systems and their even lower-dimensional attractors, while the randomness in many real-world situations handled by statistics and machine learning has to do not with not knowing the 100th digit of a few initial conditions, but with not knowing anything about the state of very high-dimensional influences.
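That amplification of initial-condition differences can be seen in a few lines with the logistic map, a standard textbook chaotic system; the starting values and step count below are arbitrary:

```python
# Minimal sketch of sensitive dependence on initial conditions, using
# the logistic map x -> r*x*(1-x) with r = 4 (a standard chaotic regime).
# Two trajectories starting 1e-10 apart diverge to macroscopic
# separation, even though the model is perfectly known and deterministic.
def logistic_trajectory(x0, r=4.0, steps=60):
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = logistic_trajectory(0.3)
b = logistic_trajectory(0.3 + 1e-10)

# After 60 steps the two trajectories have fully decorrelated
print(abs(a[-1] - b[-1]))
```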
I hope this helps clarify matters.
Best Answer
Computational learning theory, more concretely the probably approximately correct (PAC) framework, answers questions like: how many training examples are needed for a learner to learn, with high probability, a good hypothesis? How much computational effort do I need to learn such a hypothesis with high probability? It does not deal with the concrete classifier you are working with; it is about what you can and cannot learn from the samples at hand.
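As a concrete instance of the sample-complexity question, a standard textbook PAC bound for a finite hypothesis class and a consistent learner says that m ≥ (1/ε)(ln|H| + ln(1/δ)) examples suffice for error at most ε with probability at least 1 − δ; the numbers plugged in below are arbitrary:

```python
import math

# Textbook PAC bound for a finite hypothesis class and a consistent
# learner: m >= (1/eps) * (ln|H| + ln(1/delta)) training examples
# suffice to guarantee, with probability at least 1 - delta, a
# hypothesis with true error at most eps.
def pac_sample_bound(hypothesis_count, eps, delta):
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / eps)

# e.g. |H| = 2**20 boolean hypotheses, 5% error, 1% failure probability
print(pac_sample_bound(2**20, eps=0.05, delta=0.01))   # → 370
```

Note how the bound depends only on the size of the hypothesis class and the tolerances, not on which particular classifier realizes the hypotheses.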
In statistical learning theory, you instead answer questions of the sort: how many training samples will the classifier misclassify before it has converged to a good hypothesis? In other words, how hard is it to train a classifier, and what guarantees do I have on its performance?
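The mistake-counting question has a classic concrete instance: the perceptron, whose mistake bound guarantees at most (R/γ)² updates on linearly separable data, where R bounds the input norms and γ is the margin. A minimal sketch on invented data:

```python
# Minimal sketch of the mistake-counting view: the classic perceptron
# on a linearly separable toy problem. The mistake-bound theorem says
# the number of updates is at most (R / gamma)**2. Data is invented.
def perceptron_mistakes(data, labels, epochs=20):
    w = [0.0] * len(data[0])
    mistakes = 0
    for _ in range(epochs):
        clean = True
        for x, y in zip(data, labels):           # y is +1 or -1
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
                clean = False
        if clean:                                # converged: a full clean pass
            break
    return mistakes, w

data = [(1.0, 2.0), (2.0, 1.5), (-1.0, -2.0), (-2.0, -1.0)]
labels = [1, 1, -1, -1]
m, w = perceptron_mistakes(data, labels)
print(m)   # → 1
```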
Regretfully, I do not know of a source where these two areas are described and compared in a unified manner. Still, I hope that helps.