Weighted Least Squares – Understanding Standard Error Calculations in R

Tags: r, standard-error, weighted-regression

I am considering using a weighted least squares method and am checking what this approach actually computes. In particular, I am checking how the computation in lm(., weights = w) matches the theoretical equations. The estimate can be calculated as follows:

$$ \hat{\beta} = (X'WX)^{-1}X'WY$$

I can reproduce this value using the code below, but I could not match the value of the standard error. Specifically, writing $\hat\beta - \beta = (X'WX)^{-1}X'We$, I use

\begin{equation}
\begin{split}
Var[\hat\beta|X] &= E[(X'WX)^{-1}X'We\cdot ((X'WX)^{-1}X'We)']\\
&= E[(X'WX)^{-1}X'We\cdot e' W' X (X'WX)^{-1}]\\
&= (X'WX)^{-1}X'W\,E[e\cdot e'|X]\, W' X (X'WX)^{-1}
\end{split}
\end{equation}

If $E[e\cdot e'|X] = \sigma^2 I$, I get Case 1 in the code below; if I use $E[e\cdot e'|X] = \operatorname{diag}(\sigma_i^2)$, I get Case 2. Neither case matches the result from the R function. How do I match this value?

x0 = c(1,1,1,1,1,1,1,1,1,1)
x1 = c(1,1,1,1,1,0,0,0,0,0)
x = cbind(x0, x1)
y = c(10,9,8,7,6,5,4,3,2,1)
w = c(0.5, 0.5, 0.5, 0.5, 0.5, 1, 1, 1, 1, 1)

summary(lm(y ~ x1, weights = w)) # x1 Estimate: 5.0000     Std.Error: 1.0607

## By hand
n = length(x1)
k = ncol(x)
W = diag(w)                         # n x n diagonal weight matrix
beta = solve(t(x) %*% W %*% x) %*% t(x) %*% W %*% y
res = y - x %*% beta                # residuals from the WLS fit
mse = sum(res^2)/(n - k)            # unweighted mean squared error
e = diag(c(res^2))                  # plug-in for E[ee'|X] = diag(sigma_i^2)

## Case 1: E[ee'|X] = sigma^2 * I
sqrt(mse * (solve(t(x) %*% W %*% x) %*% t(x) %*% W %*% t(W) %*% x %*% solve(t(x) %*% W %*% x))[2,2])
## Std.Error = 1

## Case 2: E[ee'|X] = diag(sigma_i^2)
sqrt((solve(t(x) %*% W %*% x) %*% t(x) %*% W %*% e %*% t(W) %*% x %*% solve(t(x) %*% W %*% x))[2,2])
## Std.Error = 0.89

Best Answer

The first step is applying the fact $\text{var}(AX) = A\,\text{var}(X)A^T$, for $A$ a constant matrix and $X$ a vector of random variables. Applying this with $A = (X^TWX)^{-1}X^TW$ and the response $Y$ as the random vector, you get:

$$\text{var}(\hat{\beta}) = (X^TWX)^{-1}X^TW\,\text{var}(Y)\,WX(X^TWX)^{-1}$$

And, if $W$ is selected so that $\text{var}(Y) = W^{-1} \sigma^2$, then the sandwich collapses, since $X^TW\,W^{-1}\sigma^2\,WX = \sigma^2 X^TWX$:

$$\text{var}(\hat{\beta}) = \sigma^2(X^TWX)^{-1}(X^TWX)(X^TWX)^{-1} = \sigma^2(X^TWX)^{-1}.$$

One can then show that $\hat{\beta}$ is the best linear unbiased estimator (BLUE), per the Gauss–Markov theorem.
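If helpful, here is a quick numerical check of that collapse, as a sketch reusing x and w from the question (taking $\sigma^2 = 1$; the identity holds for any data):

Wm <- diag(w)
xtwx <- t(x) %*% Wm %*% x
## sandwich (X'WX)^{-1} X'W var(Y) W X (X'WX)^{-1} with var(Y) = W^{-1}
lhs <- solve(xtwx) %*% t(x) %*% Wm %*% solve(Wm) %*% Wm %*% x %*% solve(xtwx)
all.equal(lhs, solve(xtwx))  # TRUE: it reduces to (X'WX)^{-1}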

You can replicate the output from lm() by modifying your code as below:

fit <- lm(y ~ x1, weights = w)
summary(fit) # x1 Estimate: 5.0000     Std.Error: 1.0607
## lm() estimates sigma^2 from the *weighted* residual sum of squares
sigma2 <- sum(w * fit$residuals^2) / (length(x0) - 2)
xtwx <- t(x) %*% diag(w) %*% x
diag(solve(xtwx) * sigma2)^0.5  # matches the Std. Errors reported by summary(fit)
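As a quick cross-check (a minimal sketch using the fit above): for a weighted lm fit, stats::vcov() returns $\hat\sigma^2(X^TWX)^{-1}$, so its diagonal reproduces the same standard errors.

se_by_hand <- diag(solve(xtwx) * sigma2)^0.5
se_from_lm <- sqrt(diag(vcov(fit)))  # vcov(fit) = sigma2 * (X'WX)^{-1}
all.equal(unname(se_by_hand), unname(se_from_lm))  # TRUE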