I am thinking of using a weighted least squares method and am checking what this approach means. In particular, I am checking how the method in lm(., weights = w)
matches the theoretical equations. The estimate can be calculated as follows:
$$ \hat{\beta} = (X'WX)^{-1}X'WY$$
I can get this value using the following code. However, I could not get a matching value for the standard error. In particular, using $\hat\beta - \beta = (X'WX)^{-1}X'We$, I write
\begin{equation}
\begin{split}
Var[\hat\beta|X] &= E[(X'WX)^{-1}X'We\cdot ((X'WX)^{-1}X'We)'|X]\\
&= E[(X'WX)^{-1}X'We\cdot e' W' X (X'WX)^{-1}|X]\\
&= (X'WX)^{-1}X'W\,E[e\cdot e'|X]\, W' X (X'WX)^{-1}
\end{split}
\end{equation}
If $E[e\cdot e'|X] = \sigma^2 I$, I get Case 1 in the following code. If instead $E[e\cdot e'|X] = \mathrm{diag}(\sigma_i^2)$, I get Case 2. Neither case matches the result from the R function. How do I match this value?
x0 = c(1,1,1,1,1,1,1,1,1,1)
x1 = c(1,1,1,1,1,0,0,0,0,0)
x = cbind(x0, x1)
y = c(10,9,8,7,6,5,4,3,2,1)
w = c(0.5, 0.5, 0.5, 0.5, 0.5, 1, 1, 1, 1, 1)
summary(lm(y ~ x1, weights = w)) # x1 Estimate: 5.0000 Std.Error: 1.0607
## By hand
n = length(x1)
k = ncol(x)
W = diag(w)
beta = solve(t(x) %*% W %*% x) %*% t(x) %*% W %*% y
mse = sum((y - x %*% beta)^2)/(n-k)
mse2 = c((y - x %*% beta)^2)
e = diag(mse2)
sqrt(mse * (solve(t(x) %*% W %*% x) %*% t(x) %*% W %*% t(W) %*% x %*% solve(t(x)%*%W %*% x))[2,2])
## Case1: Std.Error = 1
sqrt((solve(t(x) %*% W %*% x) %*% t(x) %*% W %*% e %*% t(W) %*% x %*% solve(t(x)%*%W %*% x))[2,2])
## Case2: Std.Error = 0.89
Best Answer
The first step is applying the fact $\text{var}(AX) = A\,\text{var}(X)A^T$ for $A$ a constant matrix and $X$ a vector of random variables. Using this you get:
$$ \text{var}(\hat{\beta}) = (X^TWX)^{-1}X^TW \,\text{var}(Y)\, WX(X^TWX)^{-1} $$
And, if $W$ is selected so that $\text{var}(Y) = W^{-1} \sigma^2$, the sandwich collapses:
$$ \text{var}(\hat{\beta}) = (X^TWX)^{-1}X^TW\,W^{-1}\sigma^2\,WX(X^TWX)^{-1} = \sigma^2(X^TWX)^{-1}, $$
which one can show to be the best linear unbiased estimator (BLUE), as per the Gauss–Markov theorem.
You can replicate the output from lm by modifying your code as below:
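The answer's original code is not included in this excerpt; the following is a minimal sketch of the idea, reusing the question's variables. The key change from the question's code is that lm estimates $\sigma^2$ from the *weighted* residual sum of squares, and the variance is $\hat\sigma^2(X^TWX)^{-1}$ with no sandwich term:

```r
# Sketch: replicate the std. error from lm(y ~ x1, weights = w) by hand.
x1 = c(1,1,1,1,1,0,0,0,0,0)
x  = cbind(1, x1)                      # design matrix with intercept
y  = c(10,9,8,7,6,5,4,3,2,1)
w  = c(0.5,0.5,0.5,0.5,0.5,1,1,1,1,1)
W  = diag(w)
n  = length(y); k = ncol(x)

XtWX_inv = solve(t(x) %*% W %*% x)
beta = XtWX_inv %*% t(x) %*% W %*% y   # WLS estimate: (3, 5)

# Weighted MSE -- note the w inside the sum, unlike the question's mse:
mse_w = sum(w * (y - x %*% beta)^2) / (n - k)

# Variance is sigma^2 * (X'WX)^{-1}, no sandwich:
se = sqrt(mse_w * XtWX_inv[2, 2])
se                                     # 1.060660, matching lm's 1.0607
```

The sandwich in the question's Case 1 and Case 2 is unnecessary here because, under the working assumption $\text{var}(Y) = W^{-1}\sigma^2$, the outer and inner $W$ factors cancel exactly.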