Machine Learning – Is Data-Driven Modelling the Same as Machine Learning?

machine learning, modeling, terminology

I read a lot of publications about data-driven modeling and machine learning, and most of them use the two terms interchangeably. So, are data-driven modelling and machine learning actually the same thing? If not, what are examples of data-driven models that are not considered machine learning? Or, what are examples of machine learning models that are not data-driven?

Thanks!

Best Answer

The term "machine learning" is something of a term of art, but it generally refers to the construction of algorithms that "learn through experience". Learning through experience requires data, so machine learning is necessarily "data-driven"; after all, if not from data, what else would it learn from?

When we refer to a "model" in statistics or machine learning, we really just mean a set of assumptions that describe the presumed probabilistic process generating the data, together with the logical consequences of those assumptions (e.g., the resulting distributions of statistics, estimators, etc.). Even very broad non-parametric models are considered "models", so the term covers a great deal. It is difficult to conceive of how you could build a machine learning algorithm without some assumptions about the generative process for the data, and consequently one can reasonably use the term "modelling" for any machine learning process. One might quibble with this, since some machine learning algorithms are broad non-parametric methods, but even these we usually call "models", so I think it is reasonable to say that machine learning methods are built on "models". Even a method as simple as least-squares estimation is built on an underlying statistical model.
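To make the least-squares point concrete, here is a minimal Python sketch, assuming the simple model y = a + bx + ε with independent Gaussian noise of fixed variance. Under that assumed model, the least-squares coefficients are exactly the ones that minimise the Gaussian negative log-likelihood, so the "algorithm" and the statistical model coincide. The data are synthetic, and the function names (`least_squares`, `gaussian_nll`) are illustrative only.

```python
import math
import random

def least_squares(xs, ys):
    """Closed-form least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def gaussian_nll(a, b, xs, ys, sigma=1.0):
    """Negative log-likelihood under the ASSUMED model y = a + b*x + eps, eps ~ N(0, sigma^2)."""
    n = len(xs)
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 0.5 * n * math.log(2 * math.pi * sigma ** 2) + rss / (2 * sigma ** 2)

# Synthetic data generated from the assumed model (true a = 1, b = 2).
random.seed(0)
xs = [i / 10 for i in range(50)]
ys = [1.0 + 2.0 * x + random.gauss(0, 0.5) for x in xs]

a_hat, b_hat = least_squares(xs, ys)
best = gaussian_nll(a_hat, b_hat, xs, ys)

# Moving either coefficient away from the least-squares solution raises the NLL,
# i.e. least squares is the maximum-likelihood fit under the Gaussian model.
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    assert gaussian_nll(a_hat + da, b_hat + db, xs, ys) > best

print(a_hat, b_hat)
```

The equivalence holds because, for fixed σ, the log-likelihood depends on the coefficients only through the residual sum of squares; change the noise assumption (e.g. to Laplace errors) and the corresponding estimator changes too.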

There may certainly be situations in machine learning where an algorithm is built, and even deployed, without regard to any underlying probabilistic assumptions. This can happen if the algorithm is sufficiently adaptive (in the sense that most non-parametric models are), and in this case one could argue that the algorithm is "model-free" insofar as it was created without regard to any model. Even then, and even if the algorithm works well in a wide class of situations, one will still tend to find cases where it works well and cases where it works badly. Consequently, subsequent analysts will usually be able to work out the assumptions required for the algorithm to work well when deployed in a given situation. In this way, the "modelling" gradually catches up with the initially "model-free" algorithm as we learn more about the situations where it works well or badly. So you could call some machine learning algorithms "model-free" in one sense, but modelling catches up with them in the end.
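As an illustration (my example, not one given in the answer), k-nearest-neighbour regression is often used without writing down any probabilistic model. The sketch below, assuming a noise-free linear target and hypothetical helper names (`knn_predict`, `true_f`), shows it working well inside the range of the training data and badly outside it.

```python
def knn_predict(x_new, xs, ys, k=3):
    """Predict by averaging the targets of the k training points closest to x_new."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k

def true_f(x):
    """Hypothetical target function used to generate the training data."""
    return 2.0 * x

xs = [i / 10 for i in range(100)]   # training inputs cover [0, 9.9]
ys = [true_f(x) for x in xs]

# Interpolation: the query point has close neighbours on both sides.
inside_err = abs(knn_predict(5.03, xs, ys) - true_f(5.03))
# Extrapolation: the nearest neighbours all sit at the edge of the data.
outside_err = abs(knn_predict(20.0, xs, ys) - true_f(20.0))

print(f"error inside training range:  {inside_err:.3f}")
print(f"error outside training range: {outside_err:.3f}")
```

Explaining the gap (the method implicitly assumes the target is locally smooth and can only return averages of values it has already seen nearby) is exactly the kind of modelling that catches up with a "model-free" algorithm.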

In view of these considerations, I think it is reasonable to say that all machine learning involves data-driven modelling. Of course, it is possible to do data-driven modelling without using a computer algorithm at all (e.g., calculation with pen and paper), and in these cases we would not usually call it "machine learning".