First question

When you consider the prediction error through bias-variance decomposition you face the well-know compromise between how well you fit on average (bias) and how stable is you prediction model (variance).

One can improve the error by decreasing the variance or by increasing the bias. Either way you choose, you might loose at the other hand. However, one can hope that overall you improve your prediction error.

Random Forests use trees which usually overfit. The trees used in RF are fully-grown (no pruning). The bias is small since they capture locally the whole details. However fully-grown trees have few prediction power, due to high variance. Variance is high because the trees depends too much on any piece of data. One observation added/removed can give a totally different tree.

Bagging (many averaged trees on bootstrap samples) have the nice result that it creates smoother surfaces (the RF is still a piece-wise model, but the regions produced by averaging are much smaller). The surfaces resulted from averaging are smoother, more stable and thus have less variance. One can link this averaging with Central Limit Theorem where one can note how variance decrease.

A final note on RF is that the averaging works well when the trees resulted are independent. And Breiman et all. realized that by choosing randomly some of the input features will increase the independence between the trees, and have the nice effect that the variance decrease more.

Boosting works the other way, regarding bias-variance decomposition. Boosting starts with a weak-model which have low variance and high bias. The same trees used in boosting are not grown to many nodes, only to a small limit. Sometimes works fine even with decision stumps, which are the simplest form of a tree.

So, boosting starts with a low-variance tree with high bias (learners are weak because they do not capture well the structure of the data) and works gradually to improve bias. MART (I usually prefer the modern name of Gradient Boosting Trees) improve bias by "walking" on the error surface in the direction given by gradients. They usually loose in variance, on this way, but, again, one can hope that the loose in variance is well-compensated by gain in bias term.

As a conclusion these two families (bagging and boosting) are not at all the same. They start their search for the best compromise from opposite ends and works in opposite directions.

Second question

Random Forests uses CART trees, if you want to do it by the book. However were reported very small differences if one uses different trees (like C4.5) with various splitting criteria. As a matter of fact the Weka's implementation of RF uses C4.5 (the name is J4.8 in Weka library). The reason is probably the computation time, caused by smaller trees produced.

## Best Answer

Yes, you are correct.

Random Forest is more generic and involves regression as well as classification tasks. (It can also be applied to other work like survival analysis or ranking) Regression forest is a shorter way of saying "

random forests for regression", Breiman himself does use that term in his 2001 original paper on "Random Forests" so it is, by definition, correct.The earliest use of the term "

regression forest" outside Breiman's work seems to be in "Using protein expressions to predict survival in clear cell renal carcinoma" (2004) by Kim et al.