Regression – Understanding Terminology: Regression Forest, Random Forest, Decision Tree, and Regression Tree

machine-learning, random-forest, regression, supervised-learning, terminology

I am confused about the terminology of "regression forest", "random forest regression", "random forest", "decision tree" and "regression tree".

As far as I understood, random forest is a general term that can be used for both binary and continuous outcome variables.
For example, in this paper the term "random forest" is used for the prediction of a continuous variable: https://link.springer.com/article/10.1007/s11136-020-02667-3

However, there are also the terms "regression forest" and "random forest regression", which, as I understand them, are more specific terms for random forests applied to a continuous outcome variable. For example, in the grf package, Athey et al. call the function that predicts a continuous outcome a "regression forest". https://projecteuclid.org/journals/annals-of-statistics/volume-47/issue-2/Generalized-random-forests/10.1214/18-AOS1709.full

I suppose that the same logic holds for decision trees and regression trees (i.e. decision trees being a more general term and regression trees being a subset of decision trees).

So,

random forest = general term for an ensemble method that combines multiple decision trees

regression forest = general term for an ensemble method that combines multiple regression trees

Is this correct?

Best Answer

Yes, you are correct.

Random forest is the more generic term and covers regression as well as classification tasks (it can also be applied to other problems, such as survival analysis or ranking). "Regression forest" is a shorter way of saying "random forest for regression"; Breiman himself uses that term in his original 2001 paper, "Random Forests", so it is, by definition, correct.
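As a side note, this terminology is mirrored in scikit-learn's class names (a sketch, not something the papers above use): `RandomForestRegressor` is a random forest for regression, i.e. a regression forest, and each of its base learners is a regression tree (`DecisionTreeRegressor`). The data below is synthetic, purely for illustration:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic continuous outcome variable (hypothetical data)
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

# A "regression forest": an ensemble of regression trees whose
# predictions are averaged to predict the continuous outcome
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)

# Each base learner in the ensemble is a regression tree
print(type(forest.estimators_[0]).__name__)  # DecisionTreeRegressor
print(forest.predict(X[:3]))
```

The classification counterpart, `RandomForestClassifier`, is built from classification trees instead, which is exactly the general-vs-specific distinction asked about above.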

The earliest use of the term "regression forest" outside Breiman's work seems to be in "Using protein expressions to predict survival in clear cell renal carcinoma" (2004) by Kim et al.