I'm trying to deepen my understanding of linear regression, and to that end I'm working through a linear regression exercise by hand.
Using some dummy data:
x <- c(17,13,12,15,16,14,16,16,18,19)
y <- c(94,73,59,80,93,85,66,79,77,91)
model.test <- lm(y ~ x)
summary(model.test)
The output gives me:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 30.104 23.824 1.264 0.242
x 3.179 1.514 2.100 0.069
Residual standard error: 9.859 on 8 degrees of freedom
Multiple R-squared: 0.3553, Adjusted R-squared: 0.2747
F-statistic: 4.409 on 1 and 8 DF, p-value: 0.06895
I can:
- manually calculate the estimates of the intercept and x
- calculate the t value (i.e. Estimate / Std. Error)
- perform the hypothesis test to obtain the p-value for the Pr(>|t|) column
My question is: how can I calculate the two values in the Std. Error column?
For good measure I had a look at the lm.R code and I can see (reordered here into definition order):
rdf <- df[2L]                 # residual degrees of freedom (n - 2 here)
rss <- sum(w * r^2)           # (weighted) residual sum of squares
resvar <- rss/rdf             # residual variance
se <- sqrt(diag(R) * resvar)
It looks like the formula is contained within se; however, I can't quite figure out what is happening with the terms resvar (residual variance?) or the matrix R, whose diagonal gets square-rooted.
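For simple regression the matrix involved can be written out by hand, so here is a Python sketch (my own translation, not R source) of what `se <- sqrt(diag(R) * resvar)` appears to compute, on the assumption that R in that code is (X'X)^-1 for the design matrix [1, x]:

```python
# Sketch: standard errors as sqrt(diag((X'X)^-1) * resvar),
# assuming summary.lm's R is (X'X)^-1 for the design matrix [1, x].
x = [17, 13, 12, 15, 16, 14, 16, 16, 18, 19]
y = [94, 73, 59, 80, 93, 85, 66, 79, 77, 91]
n = len(x)

# X'X = [[n, Sx], [Sx, Sxx_raw]]; invert the 2x2 in closed form
Sx = sum(x)
Sxx_raw = sum(xi * xi for xi in x)
det = n * Sxx_raw - Sx * Sx
diag_R = [Sxx_raw / det, n / det]  # diagonal of (X'X)^-1

# Residual variance: resvar = rss / rdf, with rdf = n - 2
xbar, ybar = Sx / n, sum(y) / n
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
    sum((xi - xbar) ** 2 for xi in x)
a = ybar - b * xbar
rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
resvar = rss / (n - 2)

se = [(d * resvar) ** 0.5 for d in diag_R]
print([round(v, 3) for v in se])  # -> [23.824, 1.514]
```

The two values match the Std. Error column of the R output above, which supports the reading of R as (X'X)^-1.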
Thanks in advance
Jonathan
PS I found a related post; however, it does not contain an answer -> how to manually calculate SE of coeficient from regress data outputs
PPS Manual workings below
observation X Y (x-x_mean) (y-y_mean) (x-x_mean)*(y-y_mean) (x-x_mean)^2 (y-y_mean)^2 y_hat (y_hat - y) (y_hat - y)^2
1 17 94 1.4 14.3 20.02 1.96 204.49 84.0024336 -9.997566399 99.9513339
2 13 73 -2.6 -6.7 17.42 6.76 44.89 71.32039595 -1.679604051 2.821069768
3 12 59 -3.6 -20.7 74.52 12.96 428.49 68.14988654 9.149886536 83.72042362
4 15 80 -0.6 0.3 -0.18 0.36 0.09 77.66141478 -2.338585225 5.468980855
5 16 93 0.4 13.3 5.32 0.16 176.89 80.83192419 -12.16807581 148.062069
6 14 85 -1.6 5.3 -8.48 2.56 28.09 74.49090536 -10.50909464 110.4410701
7 16 66 0.4 -13.7 -5.48 0.16 187.69 80.83192419 14.83192419 219.9859751
8 16 79 0.4 -0.7 -0.28 0.16 0.49 80.83192419 1.831924188 3.355946231
9 18 77 2.4 -2.7 -6.48 5.76 7.29 87.17294301 10.17294301 103.4887696
10 19 91 3.4 11.3 38.42 11.56 127.69 90.34345243 -0.656547573 0.431054716
Total 156 797 3.55271E-15 -2.84217E-14 134.8 42.4 1206.1 795.6372042 -1.362795772 777.7266929
Mean 15.6 79.7 3.55271E-16 -2.84217E-15 13.48 4.24 120.61 79.56372042 -0.136279577 77.77266929
Std Dev 2.170509413 11.57631682 2.170509413 11.57631682 26.01030565 4.845341405 135.6597115 6.881620524 9.294807222 74.37605266
Variance 4.711111111 134.0111111 4.711111111 134.0111111 676.536 23.47733333 18403.55733 47.35670104 86.39344129 5531.797209
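The key totals in the table (Sxy = 134.8, Sxx = 42.4) and the two estimates can be reproduced with a short script; the variable names are my own shorthand, not from the post:

```python
# Reproducing the table's totals and the coefficient estimates.
x = [17, 13, 12, 15, 16, 14, 16, 16, 18, 19]
y = [94, 73, 59, 80, 93, 85, 66, 79, 77, 91]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))  # sum of (x-x_mean)*(y-y_mean)
Sxx = sum((xi - xbar) ** 2 for xi in x)                       # sum of (x-x_mean)^2

b = Sxy / Sxx          # slope estimate
a = ybar - b * xbar    # intercept estimate
print(round(Sxy, 1), round(Sxx, 1), round(b, 3), round(a, 3))
# -> 134.8 42.4 3.179 30.104
```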
Best Answer
From doing some additional digging I found the standard error of the slope b can be obtained using the following formula:
SE(b) = s / sqrt( sum of (x_i - x_mean)^2 ),  where s = sqrt( sum of (y_hat_i - y_i)^2 / (n - 2) ) is the residual standard error.
The standard error of the intercept a can be found using the following formula:
SE(a) = s * sqrt( 1/n + x_mean^2 / sum of (x_i - x_mean)^2 )
The Std error for b computed manually is 1.514, which matches the R regression output. The Std error for a computed manually is 23.827, which is out by 0.003; I can only put this down to rounding error on my behalf.
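Plugging the table's totals into the two textbook standard-error formulas, SE(b) = s / sqrt(Sxx) and SE(a) = s * sqrt(1/n + x_mean^2 / Sxx), confirms both numbers (Sxx and rss are just my names for the relevant column totals):

```python
import math

n = 10
xbar = 15.6
Sxx = 42.4          # total of the (x - x_mean)^2 column
rss = 777.7266929   # total of the (y_hat - y)^2 column

s = math.sqrt(rss / (n - 2))           # residual standard error, ~9.860
se_b = s / math.sqrt(Sxx)              # standard error of the slope
se_a = s * math.sqrt(1 / n + xbar ** 2 / Sxx)  # standard error of the intercept
print(round(se_b, 3), round(se_a, 3))  # -> 1.514 23.827
```

The 0.003 gap versus R's 23.824 comes from carrying the rounded table total for rss rather than the exact residual sum of squares.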
References:
[1] http://stattrek.com/regression/slope-confidence-interval.aspx?Tutorial=AP
[2] http://courses.ncssm.edu/math/Talks/PDFS/Standard%20Errors%20for%20Regression%20Equations.pdf