I might be telling you something you already know, but keep in mind that really
$\hat{f}(x)=\hat{f}(x,\{X_k\})$,
where $\{X_k\}$ is the set of sample points over which you build your estimate. For most non-parametric estimators, the $X_k$ are assumed independent, and the method is additive, so you can just look at the $MSE$ of $\hat{f}(x,X_k)$ and then take an average.
Then your formula is interpreted as
$MSE(\hat f(x)) = E[(\hat f(x)-f(x))^{2}]=\int_\Omega(\hat{f}(x,z)-f(x))^2f(z)dz$,
which, yes, is the MSE at the point $x$.
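For intuition, note that this pointwise MSE obeys the standard bias–variance decomposition (a well-known identity, stated here for reference):
$$\operatorname{MSE}(\hat f(x)) = \operatorname{Var}(\hat f(x)) + \left(\operatorname{E}[\hat f(x)] - f(x)\right)^2.$$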
As for the $MSEP$, I'm not entirely sure what your question is, but there are surely various ways to predict this. If you want to know the error expected at a fixed $x^*$, then I guess it probably does line up with the $MSE$. However, you might instead want, for example, the prediction error for a random $X^*$; if you assume $X^*$ is drawn from the density $f$, then you would want the $MISE$.
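To make that last point concrete: if $X^* \sim f$ is drawn independently of the sample, then conditioning on $X^*$ shows that the expected squared error is the $f$-weighted integral of the pointwise MSE (a standard identity; the usual $MISE$ differs only in integrating against $dx$ rather than $f(x)\,dx$):
$$\operatorname{E}\big[(\hat f(X^*) - f(X^*))^2\big] = \int_\Omega \operatorname{MSE}(\hat f(x))\, f(x)\,dx.$$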
Hope that helps clarify something.
The essence of this I do take to be statistical, rather than a pure programming problem.
As you don't use weights in your manual calculation, the lack of agreement is at first sight not surprising, but there are other problems too, some but not all trivial:

1. Random number generation. Although you carefully `set seed`, note also that Stata's random number generator changed in Stata 14 (http://www.stata.com/help.cgi?whatsnew13to14, #15). Below I am using Stata 10.1 on an old computer. This just affects which numbers are generated and should be separate from getting the same answers out of `regress` and a hand calculation. You should be able to reproduce what I am doing with version control.

2. Using `double` precision consistently. `regress` uses double precision, so hand calculations should follow along.

3. Using weights that make sense. Your weights by construction have expectation zero, but Stata ignores negative weights, typically about half of them. We could adjust for that by comparing like with like, but it's cleaner to generate positive weights directly.

4. Using the same formula. This is what interests you.

The code here doesn't follow yours; in some cases I have used slightly more direct routes. Note that `scalar`s hold more precision than `local`s. The number of decimal places shown is ridiculous for every purpose except the present one of reproducing a calculation.
. clear
. set seed 12345
. drawnorm x e, n(1000)
(obs 1000)
. gen y = 5+(2*x)+e
. gen double w = runiform()
. reg y x
Source | SS df MS Number of obs = 1000
-------------+------------------------------ F( 1, 998) = 3713.72
Model | 3846.76283 1 3846.76283 Prob > F = 0.0000
Residual | 1033.75252 998 1.03582417 R-squared = 0.7882
-------------+------------------------------ Adj R-squared = 0.7880
Total | 4880.51536 999 4.88540076 Root MSE = 1.0178
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | 1.963036 .0322124 60.94 0.000 1.899824 2.026248
_cons | 5.015532 .0322162 155.68 0.000 4.952313 5.078752
------------------------------------------------------------------------------
. di %23.18f e(rmse)
1.017754475398205700
. predict double err, res
. gen double sqerr = err^2
. su sqerr, meanonly
. di _n "rmse " %23.18f sqrt(r(sum)/e(df_r))
rmse 1.017754475398205700
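In other words, the hand calculation reproduces `e(rmse)` via the usual residual standard error formula, here with $n = 1000$ observations and $n - 2 = 998$ residual degrees of freedom:
$$\text{Root MSE} = \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n - 2}}.$$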
. reg y x [aw=w]
(sum of wgt is 4.9129e+02)
Source | SS df MS Number of obs = 1000
-------------+------------------------------ F( 1, 998) = 3666.47
Model | 3913.22661 1 3913.22661 Prob > F = 0.0000
Residual | 1065.16666 998 1.06730126 R-squared = 0.7860
-------------+------------------------------ Adj R-squared = 0.7858
Total | 4978.39327 999 4.98337665 Root MSE = 1.0331
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | 1.984769 .0327783 60.55 0.000 1.920447 2.049092
_cons | 5.005039 .0326721 153.19 0.000 4.940925 5.069153
------------------------------------------------------------------------------
. di %23.18f e(rmse)
1.033102736537657100
. predict double werr, res
. su w, meanonly
. scalar wsum = r(sum)
. gen double sqwerr = (w/wsum) * e(N) * werr^2
. su sqwerr, meanonly
. di _n "rmse " %23.18f sqrt(r(sum)/e(df_r))
rmse 1.033102736537657300
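The weighted version is the same formula after rescaling the weights to sum to the number of observations, which is exactly what the `sqwerr` line above does:
$$\text{Root MSE}_w = \sqrt{\frac{\sum_i \tilde w_i\, e_i^2}{n - 2}}, \qquad \tilde w_i = \frac{n\, w_i}{\sum_j w_j}.$$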
I regard Stata code as self-evident, much as R users presumably feel about R. What may help is that `r(sum)`, `e(N)` and `e(df_r)` are saved scalar results accessible after running commands. `res` is the option that instructs `predict` to produce plain residuals, not predicted values.
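If you want to see everything available, `return list` and `ereturn list` display the saved r-class and e-class results after the relevant commands (a minimal illustration, using the variables from above):

. quietly summarize w
. return list
. quietly regress y x
. ereturn list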
The very minor discrepancy in the final digits in the second case I trust to be trivial.
If compatibility with R or anything else continues to be a worry, I'd recommend a test suite that does not depend on either program's random number generator.
Best Answer
If you look at the Stata output: dividing the residual sum of squares (851.469) by its degrees of freedom (72) yields 11.826. That is the residual mean square (the MS column). Its square root, $\sqrt{11.826} \approx 3.4389$, is the Root MSE reported in the output.
Basically, it's a measure of accuracy: a more accurate model has smaller errors, hence a smaller residual sum of squares, a smaller MS, and a smaller Root MSE. However, this comparison is only valid across models with the same dependent variable, because MS and Root MSE are not standardized; depending on the units of measurement, Root MSE can vary greatly.
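As a quick check after any `regress`, the saved results reproduce this arithmetic directly (a minimal sketch; substitute your own outcome and predictors for `y` and `x`):

. quietly regress y x
. display e(rss)/e(df_r)
. display sqrt(e(rss)/e(df_r))
. display e(rmse)

Here `e(rss)` is the residual sum of squares, `e(df_r)` its degrees of freedom, and `e(rmse)` the Root MSE as `regress` saves it; the last two `display`s should agree.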