Question 1: local prediction & cross validation
Looking for nearby cases and upweighting them for prediction is referred to as local models or local prediction.
For the proper way to do cross validation, remember that within each fold you use only the training cases to build the model, and then do with the test cases exactly what you would do to predict a new unknown case.
I'd recommend treating the calculation of $X_1$ as part of the prediction, e.g. in a two-level model consisting of an $n$-nearest-neighbours step plus a second-level model:
- For each training case $i$, find its $n$ nearest neighbours and calculate the derived predictor $X_{1,i}$.
- Fit the "2nd level" model based on $X_{1,1}, \ldots, X_{1,m}$ for the $m$ training cases.
So for the prediction of a new case $X_{new}$, you
- find its $n$ nearest neighbours among the training cases and calculate $X_{1,new}$, and
- then calculate the prediction of the 2nd-level model from $X_{1,new}$.
You use exactly this prediction procedure to predict the test cases in the cross validation.
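To make this concrete, here is a minimal sketch in Python/scikit-learn. The kNN-derived feature, the ridge second level, and the toy data are all illustrative stand-ins, not something from the question; the point is only that both the neighbour search and the calculation of $X_1$ happen inside each fold, using training cases only.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neighbors import NearestNeighbors
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))          # toy data standing in for the real predictors
y = X[:, 0] + rng.normal(size=200)

def knn_feature(fit_X, fit_y, query_X, n_neighbors=5):
    """First level: X_1 = mean response of the n nearest *training* neighbours.
    (When the query is the training set itself, each case is among its own
    neighbours; a leave-one-out variant would exclude it.)"""
    nn = NearestNeighbors(n_neighbors=n_neighbors).fit(fit_X)
    _, idx = nn.kneighbors(query_X)
    return fit_y[idx].mean(axis=1, keepdims=True)

fold_errors = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    X_tr, y_tr = X[train_idx], y[train_idx]
    X_te, y_te = X[test_idx], y[test_idx]

    # First level on the training side: X_1 for every training case,
    # computed from training cases only.
    X1_tr = knn_feature(X_tr, y_tr, X_tr)
    # Second level: model built on the derived feature plus the raw predictors.
    second_level = Ridge().fit(np.hstack([X_tr, X1_tr]), y_tr)

    # Predicting a test case = exactly the procedure for a new unknown case:
    # its neighbours are searched among the training cases only.
    X1_te = knn_feature(X_tr, y_tr, X_te)
    pred = second_level.predict(np.hstack([X_te, X1_te]))
    fold_errors.append(mean_squared_error(y_te, pred))

print(np.mean(fold_errors))
```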
Question 2: combining predictions
> random forest tends to overfit on training data set

Usually, random forest will overfit only in situations where you have a hierarchical/clustered data structure that creates dependence between (some) rows of your data.
Boosting is more prone to overfitting because of the iteratively weighted average (as opposed to the simple average of the random forest).
I did not yet completely understand your question (see comment).
But here's my guess:
I assume you want to find the optimal weights to give the random forest and the boosted predictions, i.e. a linear combination of those two models.
(I don't see how you could use the individual trees within those ensemble models, because the trees change completely between splits.) This again amounts to a 2-level model (or 3 levels if combined with the approach of question 1).
The general answer here is that whenever you do data-driven model or hyperparameter optimization (e.g. optimizing the weights for the random forest and gradient boosted predictions based on test/cross-validation results), you need an independent validation to assess the real performance of the resulting model. Thus you need either yet another independent test set, or a so-called nested or double cross validation (see the sketch after the list below).
- So the 1st approach would not work unless you derive the weights from the training data.
- As you point out for the 2nd approach, having more and more levels of cross validation needs huge sample sizes to start with.
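Here is a minimal sketch of nested cross validation in Python/scikit-learn, under the assumption that the tuned quantity is the weight $w$ in $w \cdot \hat{y}_{RF} + (1-w) \cdot \hat{y}_{GBM}$; the data and model settings are illustrative. The inner loop chooses the weight, the outer loop measures how well the whole weight-choosing procedure performs.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))                         # toy data
y = X[:, 0] - X[:, 1] ** 2 + rng.normal(size=300)

weights = np.linspace(0.0, 1.0, 11)   # candidate RF weights; the GBM gets 1 - w
outer_scores = []

for tr, te in KFold(5, shuffle=True, random_state=0).split(X):
    # Inner CV on the outer-training data only: choose the weight.
    inner_err = np.zeros_like(weights)
    for itr, ite in KFold(5, shuffle=True, random_state=1).split(X[tr]):
        rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[tr][itr], y[tr][itr])
        gb = GradientBoostingRegressor(random_state=0).fit(X[tr][itr], y[tr][itr])
        p_rf, p_gb = rf.predict(X[tr][ite]), gb.predict(X[tr][ite])
        for j, w in enumerate(weights):               # models fit once per fold,
            inner_err[j] += mean_squared_error(       # reused for every candidate w
                y[tr][ite], w * p_rf + (1 - w) * p_gb)
    best_w = weights[np.argmin(inner_err)]

    # Refit on the full outer-training set; the outer test fold then scores
    # the *entire procedure*, including the data-driven choice of best_w.
    rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[tr], y[tr])
    gb = GradientBoostingRegressor(random_state=0).fit(X[tr], y[tr])
    pred = best_w * rf.predict(X[te]) + (1 - best_w) * gb.predict(X[te])
    outer_scores.append(mean_squared_error(y[te], pred))

print(np.mean(outer_scores))   # honest estimate of the tuned ensemble's performance
```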
I'd recommend a different approach here: cut down the number of splits you need by doing as few data-driven hyperparameter calculations or optimizations as possible. There is no way around validating the final model. But you may be able to show that no inner splitting is needed, namely by showing that the models you want to stack are not overfit. That would also remove the need to stack at all:
Ensemble models only help if the underlying individual models suffer from variance, i.e. are unstable. (Or if they are biased in opposing directions, so the ensemble would roughly cancel the individual biases. I suspect that this is not the case here, assuming that your GBM uses trees like the RF does.)
As for the instability, you can measure it easily by repeated (aka iterated) cross validation (see e.g. this answer). If this does not point to substantial variance in the predictions of the same case by models built on slightly varying training data (i.e. if your RF and GBM are stable), producing an ensemble of the ensemble models is not going to help.
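A minimal sketch of that measurement in Python/scikit-learn (the random forest and the toy data are illustrative): run repeated cross validation and look at the spread of the out-of-fold predictions each case receives across the repetitions.

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))                  # toy data
y = X[:, 0] + rng.normal(size=200)

# Collect every out-of-fold prediction each case receives across repetitions.
preds = [[] for _ in range(len(y))]
for tr, te in RepeatedKFold(n_splits=5, n_repeats=20, random_state=0).split(X):
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[tr], y[tr])
    for i, p in zip(te, model.predict(X[te])):
        preds[i].append(p)

# Spread of the ~20 predictions per case: large values mean the model changes
# noticeably when built on slightly different training data, i.e. instability.
per_case_sd = np.array([np.std(p) for p in preds])
print(per_case_sd.mean())
```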
As @aginensky mentioned in the comments thread, it's impossible to get inside the author's head, but BRT is most likely simply a clearer description of `gbm`'s modeling process, which is, forgive me for stating the obvious, boosted classification and regression trees. And since you've asked about boosting, gradients, and regression trees, here are my plain-English explanations of the terms. FYI, CV is not a boosting method but rather a method to help identify optimal model parameters through repeated sampling. See here for some excellent explanations of the process.
Boosting is a type of ensemble method. Ensemble methods refer to a collection of techniques by which final predictions are made by aggregating the predictions of a number of individual models. Boosting, bagging, and stacking are some widely implemented ensemble methods. Stacking involves fitting a number of different models individually (of any structure of your own choosing) and then combining them in a single linear model. This is done by fitting the individual models' predictions against the dependent variable; LOOCV SSE is normally used to determine the regression coefficients, and each model is treated as a basis function (to my mind, this is very, very similar to GAM). Similarly, bagging involves fitting a number of similarly structured models to bootstrapped samples. At the risk of once again stating the obvious, stacking and bagging are parallel ensemble methods.
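As a sketch of the stacking recipe just described, here it is in Python/scikit-learn, with 10-fold out-of-fold predictions as a cheaper stand-in for strict LOOCV (swapping in `LeaveOneOut` as `cv` would give the exact version); the base models and data are placeholders.

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 8))                              # toy data
y = np.sin(X[:, 0]) + X[:, 1] + rng.normal(scale=0.5, size=300)

base_models = [RandomForestRegressor(n_estimators=100, random_state=0),
               GradientBoostingRegressor(random_state=0)]

# Each base model becomes one "basis function": its cross-validated predictions.
Z = np.column_stack([cross_val_predict(m, X, y, cv=10) for m in base_models])

# The combiner: regress y on the base-model predictions.
combiner = LinearRegression().fit(Z, y)
print(combiner.coef_)   # the stacking weights

# For new cases: refit the base models on all data, then apply the combiner
# (the first 5 rows of X stand in for new cases here).
Z_new = np.column_stack([m.fit(X, y).predict(X[:5]) for m in base_models])
print(combiner.predict(Z_new))
```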
Boosting, however, is a sequential method. Friedman and Ridgeway both describe the algorithmic process in their papers, so I won't insert it here just this second, but the plain-English (and somewhat simplified) version is that you fit one model after the other, with each subsequent model seeking to minimize the residuals weighted by the previous model's errors (the shrinkage parameter is the weight allocated to each prediction's residual error from the previous iteration, and the smaller you can afford to make it, the better). In an abstract sense, you can think of boosting as a very human-like learning process where we apply past experiences to new iterations of the tasks we have to perform.
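That plain-English description translates almost line for line into code. Below is a bare-bones squared-error boosting loop in Python, a toy version of the idea rather than Friedman's full algorithm:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(4)
X = rng.uniform(-3, 3, size=(300, 1))                    # toy data
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)

shrinkage = 0.1          # the weight given to each new model's contribution
n_iterations = 200
trees = []

pred = np.full_like(y, y.mean())          # start from a constant model
for _ in range(n_iterations):
    residuals = y - pred                  # what the current ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    trees.append(tree)
    pred += shrinkage * tree.predict(X)   # each model corrects its predecessors

def boosted_predict(X_new):
    out = np.full(len(X_new), y.mean())
    for tree in trees:
        out += shrinkage * tree.predict(X_new)
    return out
```

For squared-error loss the residuals are exactly the negative gradient of the loss, which connects to the gradient discussion below.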
Now, strictly speaking, the gradient part of the name refers to how each new model is fit: to the negative gradient of the loss function evaluated at the current ensemble's predictions (for squared-error loss, simply the current residuals). Closely related in practice is the method used to determine the optimal number of models (referred to as iterations in the `gbm` documentation) to be used for prediction in order to avoid overfitting.

[Figure: cross-validated error as a function of the number of boosting iterations; the optimal iteration count is marked by a blue dashed line.]

As you can see from the visual above (this was a classification application, but the same holds true for regression), the CV error drops quite steeply at first, as the algorithm selects the models that lead to the greatest drop in CV error, before flattening out and climbing back up again as the ensemble begins to overfit. The optimal iteration number is the one corresponding to the minimum of the CV error function (where its gradient equals 0), which is conveniently illustrated by the blue dashed line.
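That iteration-picking step can be sketched like this in Python, with scikit-learn's staged predictions standing in for `gbm`'s CV curve (a single validation split here for brevity; data and settings are illustrative):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss

rng = np.random.default_rng(5)
X = rng.normal(size=(1000, 10))                                   # toy data
y = (X[:, 0] + X[:, 1] ** 2 + rng.normal(size=1000) > 1).astype(int)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(n_estimators=500, learning_rate=0.05,
                                   random_state=0).fit(X_tr, y_tr)

# Validation error after 1, 2, ..., 500 iterations: the curve drops steeply,
# flattens, then creeps back up as the ensemble begins to overfit.
val_error = [log_loss(y_val, p) for p in model.staged_predict_proba(X_val)]
best_n = int(np.argmin(val_error)) + 1   # iteration at the minimum of the curve
print(best_n)
```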
Ridgeway's `gbm` implementation uses classification and regression trees, and while I can't claim to read his mind, I would imagine that the speed and ease (to say nothing of their robustness to data shenanigans) with which trees can be fit had a pretty significant effect on his choice of modeling technique. That being said, while I might be wrong, I can't imagine a strictly theoretical reason why virtually any other modeling technique couldn't have been implemented. Again, I cannot claim to know Ridgeway's mind, but I imagine the generalized part of `gbm`'s name refers to the multitude of potential applications. The package can be used to perform regression (linear, Poisson, and quantile), binomial (using a number of different loss functions) and multinomial classification, and survival analysis (or at least hazard function calculation, if the coxph distribution is any indication).
Elith's paper seems vaguely familiar (I think I ran into it last summer while looking into gbm-friendly visualization methods) and, if memory serves right, it featured an extension of the `gbm` library, focusing on automated model tuning for regression (as in Gaussian distribution, not binomial) applications and improved plot generation. I imagine the BRT nomenclature is there to help clarify the nature of the modeling technique, whereas GBM is more general.
Hope this helps clear a few things up.
Best Answer
It has been well known, at least since the late 1960s, that if you take several forecasts† and average them, the resulting aggregate forecast will in many cases outperform the individual forecasts. Bagging, boosting and stacking are all based on exactly this idea. So yes, if your aim is purely prediction, then in most cases this is the best you can do. What is problematic about this approach is that it is a black box: it returns the result but does not help you to understand and interpret it. Obviously, it is also more computationally intensive than any single method, since you have to compute several forecasts instead of one.
† This holds for predictions in general, but the idea is most often described in the forecasting literature.
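The effect is easy to reproduce in a toy Python simulation, assuming (for illustration) several unbiased forecasters with independent errors:

```python
import numpy as np

rng = np.random.default_rng(6)
truth = rng.normal(size=10_000)                              # the quantity being forecast
forecasts = truth + rng.normal(scale=1.0, size=(5, 10_000))  # 5 noisy, unbiased forecasters

mse_individual = ((forecasts - truth) ** 2).mean(axis=1)
mse_average = ((forecasts.mean(axis=0) - truth) ** 2).mean()

print(mse_individual)   # each close to 1.0
print(mse_average)      # close to 1/5: independent errors average out
```

With $k$ independent, equal-variance error terms the averaged forecast's error variance drops by a factor of $k$; correlated errors shrink that gain, which is why the empirical results in the references below matter.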
Winkler, R.L. and Makridakis, S. (1983). The combination of forecasts. Journal of the Royal Statistical Society, Series A, 146(2), 150-157.
Makridakis, S. and Winkler, R.L. (1983). Averages of forecasts: Some empirical results. Management Science, 29(9), 987-996.
Clemen, R.T. (1989). Combining forecasts: A review and annotated bibliography. International Journal of Forecasting, 5, 559-583.
Bates, J.M. and Granger, C.W.J. (1969). The combination of forecasts. Operational Research Quarterly, 20(4), 451-468.
Makridakis, S. and Hibon, M. (2000). The M3-Competition: results, conclusions and implications. International Journal of Forecasting, 16(4), 451-476.
Reid, D.J. (1968). Combining three estimates of gross domestic product. Economica, 431-444.
Makridakis, S., Spiliotis, E., and Assimakopoulos, V. (2018). The M4 Competition: Results, findings, conclusion and way forward. International Journal of Forecasting.