Solved – the “root MSE” in Stata


I have a question that has been confusing me ever since I took econometrics last year. What does the "Root MSE" mean in Stata output when you fit an OLS regression?

I know that it translates into "root mean squared error", but which variable's mean squared error is it after all, and how is it calculated? Can anybody provide a precise definition and formula, and explain why it is helpful to have that value?

Best Answer

  1. Calculate the residuals: the differences between the observed and predicted values of the dependent variable
  2. Square them
  3. Add them up; this gives the residual (error) sum of squares, the SS entry in the Residual row of the Stata output
  4. Divide it by the residual degrees of freedom (number of observations minus number of estimated coefficients, including the constant); this gives the mean squared error, the MS entry in the Residual row
  5. Take the square root; this is the Root MSE
  6. Done
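
Written as a single formula, the steps above amount to the following. The symbols are my own notation, not Stata's: n is the number of observations, k the number of estimated coefficients including the constant, y_i the observed value and \hat{y}_i the predicted value.

```latex
\text{Root MSE}
  = \sqrt{\frac{SS_{\text{Residual}}}{n - k}}
  = \sqrt{\frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{n - k}}
```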

If you look at the Stata output:

. sysuse auto, clear
(1978 Automobile Data)

. reg mpg weight

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =  134.62
       Model |   1591.9902     1   1591.9902           Prob > F      =  0.0000
    Residual |  851.469256    72  11.8259619           R-squared     =  0.6515
-------------+------------------------------           Adj R-squared =  0.6467
       Total |  2443.45946    73  33.4720474           Root MSE      =  3.4389

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |  -.0060087   .0005179   -11.60   0.000    -.0070411   -.0049763
       _cons |   39.44028   1.614003    24.44   0.000     36.22283    42.65774
------------------------------------------------------------------------------

Dividing the residual sum of squares (851.469) by its degrees of freedom (72) yields 11.826. That is the residual mean square (the MS column). Taking its square root gives the Root MSE (3.4389 in the output).
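
If you want to check the arithmetic outside Stata, here is a minimal sketch in Python. The two numbers are copied from the Residual row of the output above; the variable names are just illustrative.

```python
import math

# Values copied from the Residual row of the Stata output above.
ss_residual = 851.469256  # Residual SS
df_residual = 72          # 74 observations minus 2 estimated coefficients

# Step 4: residual mean square (the MS column).
ms_residual = ss_residual / df_residual

# Step 5: Root MSE.
root_mse = math.sqrt(ms_residual)

print(round(ms_residual, 4))
print(round(root_mse, 4))
```

Both printed values should agree with the MS and Root MSE entries in the table up to rounding.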

Basically, it's a measure of accuracy, expressed in the units of the dependent variable (mpg here). A more accurate model has smaller residuals, hence a smaller residual sum of squares, a smaller MS, and a smaller Root MSE. However, this comparison is only valid between models of the same dependent variable, because MS and Root MSE are not standardized: they depend on the unit of measurement, so Root MSE can vary greatly from one dependent variable to another.
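
To illustrate the unit dependence, here is a rough sketch in Python. The data are made up for the example (they are not the auto dataset), and `fit_ols` and `root_mse` are hypothetical helper names. Expressing the same dependent variable in km per litre instead of miles per gallon rescales the Root MSE by the conversion factor, even though the fit is no better or worse.

```python
import math

def fit_ols(x, y):
    """Return (intercept, slope) of a least-squares fit with one regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def root_mse(x, y):
    """Root MSE of the fit: sqrt(residual SS / (n - 2))."""
    intercept, slope = fit_ols(x, y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(ss_res / (len(x) - 2))  # 2 coefficients: constant and slope

# Made-up illustrative data, NOT taken from the auto dataset.
weight = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
mpg = [30.0, 27.0, 22.0, 21.0, 17.0, 15.0]

# Same dependent variable, different unit (1 US mpg is about 0.425144 km/L).
MPG_TO_KML = 0.425144
kml = [v * MPG_TO_KML for v in mpg]

rmse_mpg = root_mse(weight, mpg)
rmse_kml = root_mse(weight, kml)

# The ratio equals the conversion factor: only the unit changed, not the fit.
print(rmse_mpg, rmse_kml, rmse_kml / rmse_mpg)
```

This is why a Root MSE of 3.4 is only "small" or "large" relative to the scale of the dependent variable.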