Solved – How does random forest work for regression

machine learning, random forest, regression

I am an absolute beginner in the field of machine learning. I started doing the Titanic assignment on Kaggle and found (read somewhere) that Random Forest is the best fit. I started reading about random forests and found the explanation by Edwin Chen in this question intuitive. This made me "understand" how I can solve the Titanic assignment, which predicts whether someone survives or not (classification). But I cannot understand how random forest works for regression, where the target is continuous.

Please don't hesitate to point out any mistakes in my assumptions or in the way I started. Any advice would be helpful; this looks very vast and I don't even know where to begin.

Best Answer

Basically there are two differences:

  • when building the model/tree, a different criterion is used to split the data; for example, for a binary split the goal is to choose the split variable and split value so that the sum of variances of the target/output variable over the two resulting subsets is minimal
  • when predicting, you use the mean value of the target/output variable over all data points in the leaf node (a minimal sketch of both ideas follows this list)

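To make those two differences concrete, here is a minimal sketch of a single regression split on one feature. It is not any library's actual implementation; the names `best_split` and `leaf_prediction` are made up for illustration, and a real tree would also search over all features (and a random forest over a random subset of them). The split minimizes the sum of the two children's target variances, and a leaf simply predicts the mean of its training targets:

```python
import numpy as np

def best_split(x, y):
    """Find the threshold on a single feature x that minimizes the sum of
    variances of the target y in the two resulting subsets (illustrative only)."""
    best_value, best_score = None, np.inf
    for value in np.unique(x)[:-1]:            # candidate thresholds
        left, right = y[x <= value], y[x > value]
        score = left.var() + right.var()       # sum of child variances
        if score < best_score:
            best_value, best_score = value, score
    return best_value

def leaf_prediction(y_in_leaf):
    """For regression, a leaf predicts the mean of its training targets."""
    return y_in_leaf.mean()

# toy example: two clearly separated groups of target values
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([1.1, 0.9, 1.0, 5.2, 4.8, 5.0])
split = best_split(x, y)                       # lands between 3 and 10
print("split at", split)
print("left leaf predicts", leaf_prediction(y[x <= split]))
print("right leaf predicts", leaf_prediction(y[x > split]))
```
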
Some variants I saw:

  • for splitting you might want to minimize the sum of standard deviations, the weighted sum of variances, etc.
  • for prediction you can also use a trimmed mean, the median, or even another model (like a linear model fitted on the instances in the node)
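
If you end up trying this in practice, a short usage sketch with scikit-learn's RandomForestRegressor (assuming that is the library you use; the toy data here is made up) shows the regression behaviour directly: each tree predicts its leaf mean, and the forest averages the trees' predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# toy continuous target: y is a noisy function of one feature
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)

X_new = np.array([[2.5], [7.0]])
print(forest.predict(X_new))                   # forest prediction

# the forest prediction is the mean of the individual trees' predictions
per_tree = np.stack([tree.predict(X_new) for tree in forest.estimators_])
print(per_tree.mean(axis=0))                   # matches forest.predict(X_new)
```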