Solved – Calculate MSE for random forest in R using package ‘randomForest’

cross-validation, mse, r, random forest

I'm using randomForest to fit a model with a continuous response variable. I was reading An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics). Chapter 8, Section 8.3.3, Bagging and Random Forests, uses the following example:
[Image: randomForest code from the textbook]

I followed the exact same command and got the following result:
[Image: printed output of the fitted model]

The MSE reported in the model summary is 14.5. The textbook then uses the following formula to calculate the MSE on the test set:
[Image: textbook code for the test-set MSE]

Instead of the test set, I used this formula to calculate the MSE for the training set (the set I used to obtain the model), and here's my code:
[Image: code computing the training-set MSE, with its output]

However, as you can see, the outcome is very different from the value in the model summary. I'm not sure why this happens or which one is the correct MSE. If I want to use MSE to compare this model with other models, which MSE should I use?

Thank you in advance.

Best Answer

The textbook is comparing the random forest predicted values against the real values of the test data. This makes sense as a way to measure how well the model predicts: compare the prediction results to data that the model hasn't seen.
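A minimal sketch of that comparison, not the textbook's exact code: it assumes the Boston housing data from the MASS package with `medv` as the response, and the names `train_idx`, `rf_fit`, and the `*_mse` variables are illustrative. It also pulls out the value the model summary prints, which randomForest computes from out-of-bag predictions, and the in-sample value you get by predicting on the training rows, so the three numbers can be looked at side by side.

```r
library(randomForest)

# Assumption: Boston housing data from MASS, with medv as the continuous response.
data(Boston, package = "MASS")

set.seed(1)
train_idx <- sample(nrow(Boston), nrow(Boston) %/% 2)  # half the rows for training
train_set <- Boston[train_idx, ]
test_set  <- Boston[-train_idx, ]

# Fit the forest on the training rows only.
rf_fit <- randomForest(medv ~ ., data = train_set, importance = TRUE)

# Test MSE: predictions for rows the forest has not seen,
# compared with the observed response of those same rows.
test_pred <- predict(rf_fit, newdata = test_set)
test_mse  <- mean((test_pred - test_set$medv)^2)

# Value shown as "Mean of squared residuals" when the model is printed:
# the out-of-bag MSE after the last tree.
oob_mse <- tail(rf_fit$mse, 1)

# In-sample MSE: predictions for the training rows compared with the
# training response. Each row was seen by many of the trees, so this
# tends to be much smaller than the other two.
train_pred <- predict(rf_fit, newdata = train_set)
train_mse  <- mean((train_pred - train_set$medv)^2)

print(c(test = test_mse, oob = oob_mse, train = train_mse))
```

When comparing against other models, the test-set value (or a cross-validated equivalent) is the one to report, since every model can be scored on the same held-out rows.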

You're comparing the random forest predictions to a specific column of the training data. I don't understand why this would be helpful: you're comparing predictions for observations in the training set to some value of observations in the test set. That's like taking the average height of yourself and your neighbor. Sure, you can compute it, but how is that helpful?
