Solved – the difference between Econometrics and Machine Learning

econometrics, machine learning

In my understanding, econometrics estimates partial (ceteris paribus) correlations, primarily with the aim of estimating causal relationships. For that, it normally uses the whole available dataset. Econometrics can be parametric or non-parametric.

Meanwhile, machine learning is not interested in causality but in "fit", primarily with the aim of producing predictions. For that, it normally splits the dataset into a training set and a test set. Machine learning can also be parametric or non-parametric.


This is what I can make of the core of these two disciplines, but I am sure there is plenty more to it. I am primarily interested in their differences. Can anyone provide a good guide on this please?

Best Answer

First things first. Everything that I say is my understanding only. Hence, as usual, I can be wrong.

Henry is partially right. But econometrics is also a family of methods. Which econometric method applies depends on the research question at hand as well as on the data available (cross-section vs. panel data, and so on).

Machine learning, in my understanding, is a collection of methods that enable machines to learn patterns from past observations (oftentimes in a black-box manner). Regression is a standard tool in econometrics as well as in machine learning, as it allows one to learn relationships between variables and to extrapolate these relationships into the future.
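To make the shared role of regression concrete, here is a minimal sketch, assuming simulated data and plain NumPy (the data-generating process, the coefficient values, and all variable names are my own assumptions for illustration). The same least-squares fit is read in two ways: as a coefficient estimate with a standard error, and as a rule for predicting new observations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (an assumption for illustration): y = 1.0 + 0.5 * x + noise
n = 500
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3, size=n)

# Design matrix with an intercept column
X = np.column_stack([np.ones(n), x])

# Ordinary least squares fit
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# "Econometric" reading: the slope and its classical standard error
residuals = y - X @ beta
sigma2 = residuals @ residuals / (n - X.shape[1])  # residual variance estimate
cov_beta = sigma2 * np.linalg.inv(X.T @ X)         # homoskedastic covariance
se_slope = float(np.sqrt(cov_beta[1, 1]))

# "Machine learning" reading: predictions for inputs not seen during fitting
x_new = np.array([0.0, 1.0, 2.0])
y_pred = np.column_stack([np.ones(x_new.size), x_new]) @ beta

print("slope estimate:", beta[1], "standard error:", se_slope)
print("predictions for new x:", y_pred)
```

Nothing about the fitting step differs between the two readings; only the question asked of the fitted model changes.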

Not all econometricians are interested in a causal interpretation of parameter estimates. With observational (non-experimental) data they can rarely claim one anyway. Oftentimes, as in the case of time series data, econometricians care only about predictive performance.

Essentially, both are much the same thing, developed in different fields (machine learning being rooted in computer science). Both are collections of methods. Econometricians also increasingly use machine learning methods such as decision trees and neural networks.

You already touched on a very interesting point: causality. Essentially, both fields would like to know the true underlying relationships, but as you already mentioned, predictive performance is oftentimes the main KPI in machine learning tasks. That is, a low generalization error is the main goal. Of course, if you knew the true causal relationships, a model built on them should have the lowest generalization error of all possible formulations. But reality is very complex and there is no free lunch. Most of the time we have only partial knowledge of the underlying system, and sometimes we cannot even measure the most important influences. We can, however, use proxy variables that correlate with the true underlying variables we would like to measure.
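The point about proxy variables can be sketched with a small simulation. This is only an illustration under assumptions I made up: a single true driver `z`, a proxy equal to `z` plus independent noise, and a linear outcome. It compares the generalization error (mean squared error on fresh data) of a model using the true driver, a model using the proxy, and a baseline that always predicts the training mean.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(n):
    """Draw a hypothetical true driver z, a noisy proxy for it, and an outcome y."""
    z = rng.normal(size=n)               # true driver (often unobservable)
    proxy = z + rng.normal(size=n)       # measurable proxy, correlated with z
    y = 2.0 * z + rng.normal(size=n)     # outcome depends on z only
    return z, proxy, y

def fit_line(x, y):
    """Least-squares intercept and slope of y on x."""
    X = np.column_stack([np.ones(x.size), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def mse(beta, x, y):
    """Mean squared prediction error of the fitted line on (x, y)."""
    pred = beta[0] + beta[1] * x
    return float(np.mean((y - pred) ** 2))

# Separate draws for fitting and for evaluation
z_tr, p_tr, y_tr = simulate(2000)
z_te, p_te, y_te = simulate(2000)

beta_true = fit_line(z_tr, y_tr)    # model using the true driver
beta_proxy = fit_line(p_tr, y_tr)   # model using the proxy instead

mse_true = mse(beta_true, z_te, y_te)
mse_proxy = mse(beta_proxy, p_te, y_te)
mse_baseline = float(np.mean((y_te - y_tr.mean()) ** 2))

print("slope on true driver:", beta_true[1], "slope on proxy:", beta_proxy[1])
print("test MSE (true, proxy, baseline):", mse_true, mse_proxy, mse_baseline)
```

In this setup the proxy model predicts better than the baseline but worse than the model with the true driver, and its slope is pulled toward zero. So the proxy is useful for prediction, while its coefficient should not be read as the causal effect of `z`.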

Long story short, and very superficially: the two fields are related. Econometricians are mostly interested in finding the true causal relationships (that is, in testing some hypothesis), whereas machine learning is rooted in computer science and is mostly interested in building systems with low generalization error.

PS: Relying only on a fit to the whole dataset should generally be avoided in econometrics too. Econometricians are becoming more aware that relationships found in-sample do not necessarily generalize to new data. Hence, replication of econometric studies is, and always has been, very important.
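A rough sketch of why in-sample fit can mislead, again with simulated data and assumptions of my own: 30 observations, 25 candidate regressors of which only the first matters, and a large fresh sample standing in for "new data". The large regression is compared with a small one on both in-sample and out-of-sample error.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate(n, k=25):
    """k candidate regressors; only the first one actually affects y."""
    X = rng.normal(size=(n, k))
    y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)
    return X, y

def add_const(X):
    """Prepend an intercept column."""
    return np.column_stack([np.ones(X.shape[0]), X])

def fit(X, y):
    """Least-squares coefficients, intercept included."""
    beta, *_ = np.linalg.lstsq(add_const(X), y, rcond=None)
    return beta

def mse(beta, X, y):
    """Mean squared prediction error on (X, y)."""
    return float(np.mean((y - add_const(X) @ beta) ** 2))

X_train, y_train = simulate(30)     # small estimation sample
X_test, y_test = simulate(1000)     # fresh data from the same process

beta_big = fit(X_train, y_train)            # all 25 regressors
beta_small = fit(X_train[:, :1], y_train)   # only the relevant regressor

train_big = mse(beta_big, X_train, y_train)
test_big = mse(beta_big, X_test, y_test)
train_small = mse(beta_small, X_train[:, :1], y_train)
test_small = mse(beta_small, X_test[:, :1], y_test)

print("big model   in-sample:", train_big, "out-of-sample:", test_big)
print("small model in-sample:", train_small, "out-of-sample:", test_small)
```

The large regression always fits the estimation sample at least as well as the small one, since the small model is nested in it, yet it does worse on the fresh sample. Holding data back, or replicating a study on new data, is what exposes the difference.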

Hope this helps in any way.