Let's say we have data that looks like this:
set.seed(1)
b0 <- 0 # intercept
b1 <- 1 # slope
x <- c(1:100) # predictor variable
y <- b0 + b1*x + rnorm(n = 100, mean = 0, sd = 200) # response variable
We fit a simple linear model:
mod.1 <- lm(y~x)
summary(mod.1)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 26.3331 36.3795 0.724 0.471
# x 0.9098 0.6254 1.455 0.149
b0.est <- summary(mod.1)$coefficients[1,1]
b1.est <- summary(mod.1)$coefficients[2,1]
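As an aside, the standard errors can also be read off directly as the square roots of the diagonal of the coefficient variance-covariance matrix; a quick self-contained check (reproducing the simulation above):

```r
set.seed(1)
x <- 1:100
y <- 0 + 1*x + rnorm(n = 100, mean = 0, sd = 200)
mod.1 <- lm(y ~ x)

# Standard errors = sqrt of the diagonal of the
# coefficient variance-covariance matrix
se <- sqrt(diag(vcov(mod.1)))
round(se, 4)  # (Intercept) 36.3795, x 0.6254
```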
And a second model where we (1) subtract the intercept estimated in the first model from the response and (2) suppress the intercept term (in other words, force the fit through zero):
mod.2 <- lm(y - b0.est ~ 0 + x)
summary(mod.2)
# Estimate Std. Error t value Pr(>|t|)
# x 0.9098 0.3088 2.946 0.00401 **
b1.est.2 <- summary(mod.2)$coefficients[1,1]
As expected, the slope estimate stays the same (0.9098).
However, while the slope was not significant in the first model, it is in the second: the standard error of the estimate is much lower in the second model than in the first (0.3088 vs. 0.6254).
The data have the same shape in both models, and the same slope parameter is estimated by both. How can the second model be so much more "certain" of the slope estimate?
Or to put it another way, how are these standard errors calculated?
Using the equation for standard error I found here, I calculated the standard errors for model 1 and 2 this way:
# Model 1
DF <- length(x)-2
y.est <- b0.est + b1.est*x
numerator <- sqrt(sum((y - y.est)^2)/DF)
denominator <- sqrt(sum((x - mean(x))^2))
numerator/denominator
# SE = 0.6254
This matches the R output.
# Model 2
DF <- length(x)-1
y.est <- b1.est.2*x
numerator <- sqrt(sum((y - (y.est+b0.est))^2)/DF)
denominator <- sqrt(sum((x - mean(x))^2))
numerator/denominator
# SE = 0.6223
This doesn't match the R output which has the SE = 0.3088.
What am I missing?
Best Answer
The formulas are the same as always, so let's focus on understanding what's going on.
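For reference, writing $s$ for the residual standard deviation, the usual OLS slope standard errors in the two cases are

$$\operatorname{se}(\hat\beta_1) = \frac{s}{\sqrt{\sum_i (x_i - \bar x)^2}} \quad \text{(intercept fitted)}, \qquad \operatorname{se}(\hat\beta_1) = \frac{s}{\sqrt{\sum_i x_i^2}} \quad \text{(through the origin)}.$$

Since $\sum_i x_i^2 = \sum_i (x_i - \bar x)^2 + n\bar x^2$, data sitting far from the origin (large $\bar x$) make the constrained denominator much larger, and hence the standard error much smaller. That algebra is exactly the geometry described next.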
Here is a small cloud of points. Its slope is uncertain. (Indeed, the coordinates of these points were drawn independently from a standard Normal distribution and then moved a little to the side, as shown in subsequent plots.)
Here is the OLS fit. The intercept is near $3$. That's kind of an accident: the OLS line must pass through the center of mass of the point cloud and where the intercept is depends on how far I moved the point cloud away from the origin. Due to the uncertain slope and the relatively large distance the points were moved to the right, the intercept could be almost anywhere. To illustrate, the slopes of the dashed lines differ from the fitted line by up to $\pm 1/2$. All of them fit the data pretty well.
After lowering the cloud by the height of the intercept, the OLS line (solid gray) goes through the origin, as expected.
The OLS line remains just as uncertain as it was before. The standard error of its slope is high. But if you were to constrain it to pass through the origin, the only wiggle room left is to vary the other end up and down through the point cloud. The dotted lines show the same range of slopes as before: but now the extreme ones don't go anywhere near the cloud. Constraining the fit has greatly increased the certainty in the slope.
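To connect this back to the question's hand calculation: for a through-the-origin fit, the denominator is $\sqrt{\sum_i x_i^2}$, not $\sqrt{\sum_i (x_i - \bar x)^2}$, because the slope now pivots about the origin rather than about the centroid of the data. A self-contained sketch reproducing the question's simulation:

```r
set.seed(1)
x <- 1:100
y <- 0 + 1*x + rnorm(n = 100, mean = 0, sd = 200)
mod.1 <- lm(y ~ x)
b0.est <- coef(mod.1)[1]
mod.2 <- lm(y - b0.est ~ 0 + x)
b1.est.2 <- coef(mod.2)[1]

DF <- length(x) - 1                     # only one parameter estimated
resid2 <- (y - b0.est) - b1.est.2 * x   # residuals of the no-intercept fit
numerator <- sqrt(sum(resid2^2) / DF)
denominator <- sqrt(sum(x^2))           # NOT sum((x - mean(x))^2)
numerator / denominator                 # 0.3088, matching summary(mod.2)
```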