The essence of this I do take to be statistical, rather than a pure programming problem.
As you don't use weights in your manual calculation, the lack of agreement is at first sight not surprising, but there are other problems too, some but not all trivial.
Random number generation. Although you carefully set seed, note also that Stata's random number generator changed in Stata 14 (http://www.stata.com/help.cgi?whatsnew13to14 #15). Below I am using Stata 10.1 on an old computer. This just affects which numbers are generated and should be separate from getting the same answers out of regress and a hand calculation. You should be able to reproduce what I am doing with version control.
Using double precision consistently. regress uses double precision, so hand calculations should follow along.
Using weights that make sense. Your weights by construction have expectation zero, but Stata ignores negative weights, typically about half of them. We could adjust for that by comparing like with like, but it's cleaner to generate positive weights directly.
Using the same formula. This is what interests you.
The code here doesn't follow yours. In some cases I have used slightly more direct routes. Note that scalars hold more precision than locals. The number of decimal places here is ridiculous for every purpose except the present purpose of reproducing a calculation.
. clear
. set seed 12345
. drawnorm x e, n(1000)
(obs 1000)
. gen y = 5+(2*x)+e
. gen double w = runiform()
. reg y x
Source | SS df MS Number of obs = 1000
-------------+------------------------------ F( 1, 998) = 3713.72
Model | 3846.76283 1 3846.76283 Prob > F = 0.0000
Residual | 1033.75252 998 1.03582417 R-squared = 0.7882
-------------+------------------------------ Adj R-squared = 0.7880
Total | 4880.51536 999 4.88540076 Root MSE = 1.0178
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | 1.963036 .0322124 60.94 0.000 1.899824 2.026248
_cons | 5.015532 .0322162 155.68 0.000 4.952313 5.078752
------------------------------------------------------------------------------
. di %23.18f e(rmse)
1.017754475398205700
. predict double err, res
. gen double sqerr = err^2
. su sqerr, meanonly
. di _n "rmse " %23.18f sqrt(r(sum)/e(df_r))
rmse 1.017754475398205700
. reg y x [aw=w]
(sum of wgt is 4.9129e+02)
Source | SS df MS Number of obs = 1000
-------------+------------------------------ F( 1, 998) = 3666.47
Model | 3913.22661 1 3913.22661 Prob > F = 0.0000
Residual | 1065.16666 998 1.06730126 R-squared = 0.7860
-------------+------------------------------ Adj R-squared = 0.7858
Total | 4978.39327 999 4.98337665 Root MSE = 1.0331
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | 1.984769 .0327783 60.55 0.000 1.920447 2.049092
_cons | 5.005039 .0326721 153.19 0.000 4.940925 5.069153
------------------------------------------------------------------------------
. di %23.18f e(rmse)
1.033102736537657100
. predict double werr, res
. su w, meanonly
. scalar wsum = r(sum)
. gen double sqwerr = (w/wsum) * e(N) * werr^2
. su sqwerr, meanonly
. di _n "rmse " %23.18f sqrt(r(sum)/e(df_r))
rmse 1.033102736537657300
I regard Stata code as self-evident, much as presumably R users feel about R. What may help is that r(sum), e(N) and e(df_r) are saved scalar results accessible after running commands, and that res is the option that instructs predict to produce plain residuals, not predicted values.
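For anyone who wants to check the aweight arithmetic outside Stata, here is a minimal Python sketch of the same formula (the residuals and weights below are made up for illustration, not the dataset above): rescale the weights to sum to the number of observations, divide the weighted sum of squared residuals by the residual degrees of freedom, and take the square root.

```python
import math

def weighted_rmse(residuals, weights, n_params=2):
    # Stata-style analytic weights: rescale weights to sum to N,
    # then divide the weighted residual SS by the residual df.
    n = len(residuals)
    wsum = sum(weights)
    df_r = n - n_params
    ss = sum((w / wsum) * n * e * e for e, w in zip(residuals, weights))
    return math.sqrt(ss / df_r)

# Tiny made-up example (illustrative numbers only):
res = [0.5, -1.2, 0.3, 0.9, -0.4]
w = [0.2, 0.7, 0.4, 0.9, 0.1]
print(weighted_rmse(res, w))
```

With equal weights this collapses to the unweighted root MSE, which is a quick sanity check on the rescaling.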
The very minor discrepancy in the second case I trust to be trivial.
If compatibility with R or anything else continues to worry you, I'd recommend a test suite that does not depend on either program's random number generator.
The credible interval gets larger the further you extrapolate from the data. In this case, the data density is approximately the same throughout, so that broadening is not observed, and the credible interval is dominated by the estimate of the assumed additive noise contaminating the observations ($\sigma_\mathrm{nois}$). If you extended the plot, say from -100 to +100, I suspect you would see the broadening.
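To make the mechanism explicit (this is the standard decomposition for GP regression, not anything specific to your implementation): the predictive variance at a test input $x_*$ splits into two parts,

$$\operatorname{Var}[y_*] = \sigma_f^2(x_*) + \sigma_\mathrm{nois}^2,$$

where $\sigma_f^2(x_*)$ is the posterior uncertainty about the latent function and $\sigma_\mathrm{nois}^2$ is the estimated observation noise. Near dense data $\sigma_f^2(x_*)$ is small, so the band width is roughly constant at about $2 \times 1.96\,\sigma_\mathrm{nois}$; far from the data $\sigma_f^2(x_*)$ grows back toward the prior variance, which is the broadening you would see.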
The width of the band looks about right for a 95% credible interval: it has about as many data points lying outside it as would reasonably be expected. If you expect the noise to depend on the attribute/feature, you need a heteroscedastic Gaussian process model.
By confidence interval, you probably mean a prediction interval. The two terms are often confused, but yes, there is a difference.
Point predictions and prediction intervals offer different kinds of information.
Point predictions predict a single number. You can assess their quality using the rmse, or many other accuracy KPIs, such as the mse, the mae or even the mape. Which KPI is most meaningful really depends on your loss function: the mse penalizes large errors much more than the mae does.
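For concreteness, here is a small pure-Python sketch of these KPIs (the data are made up; in practice you would use a library implementation):

```python
import math

def mse(actual, pred):
    # mean squared error
    return sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual)

def rmse(actual, pred):
    # root mean squared error
    return math.sqrt(mse(actual, pred))

def mae(actual, pred):
    # mean absolute error
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def mape(actual, pred):
    # mean absolute percentage error; undefined if any actual is zero
    return sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)

actual = [10.0, 12.0, 8.0, 11.0]
pred = [9.0, 14.0, 8.5, 10.0]
print(mse(actual, pred), rmse(actual, pred), mae(actual, pred), mape(actual, pred))
```

Note how the single error of 2 dominates the mse far more than it dominates the mae, which is the loss-function point above.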
If your Bayesian method gives you an entire posterior distribution, you can extract a point forecast from it in various ways, e.g., by taking the mean, or the median, or the mode. Fun fact: if you use the rmse or the mse, take the mean - and if you use the mae, take the median. There won't be much of a difference if the posterior distribution is symmetric, but if it's asymmetric, this can indeed make a difference.
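That fun fact can be checked numerically. The sketch below uses a made-up right-skewed sample standing in for a posterior, and scans a grid of candidate point forecasts under each loss:

```python
import statistics

# Made-up right-skewed "posterior sample" (illustrative only).
posterior = [1, 1, 2, 2, 3, 3, 4, 6, 14]

def exp_sq_loss(point):
    # average squared error if we forecast `point`
    return sum((x - point) ** 2 for x in posterior) / len(posterior)

def exp_abs_loss(point):
    # average absolute error if we forecast `point`
    return sum(abs(x - point) for x in posterior) / len(posterior)

mean = statistics.mean(posterior)      # 4
median = statistics.median(posterior)  # 3

# Scan candidate forecasts on a grid from 0 to 20 in steps of 0.01.
grid = [i / 100 for i in range(2001)]
best_sq = min(grid, key=exp_sq_loss)   # grid minimizer of squared loss
best_abs = min(grid, key=exp_abs_loss) # grid minimizer of absolute loss

print(best_sq, mean)    # squared loss is minimized at the mean
print(best_abs, median) # absolute loss is minimized at the median
```

Because the sample is skewed, the two optimal point forecasts differ (4 versus 3 here), which is exactly the asymmetric case described above.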
Whether you actually want a point prediction or a prediction interval depends on what you want to do with your predictions. Point predictions are easier for non-statisticians to grasp. PIs allow you to do scenario analyses; they tell you how sure you are about a prediction (if they are well calibrated).