Weighted Regression – Implementing Weighted Least Squares for a Linear Model

least squares, linear model, regression, weighted-regression

Background

I have a 2-dimensional dataset $\{y_i, x_i\}_{i=1}^N$ in the coordinates $y,x$. I'm trying to fit the dataset with the trivial model
$$\tag{*}y=mx$$
where $m$ is a (scalar) parameter that has to be learned from the dataset. A simple solution is given by the least squares estimator, which chooses $m$ as
$$\hat{m}_1\triangleq \arg \min_m \sum_{i=1}^N (y_i-m x_i)^2=\frac{X'Y}{X'X}
$$

with $X\triangleq[x_1,\dots, x_N]'$, $Y\triangleq [y_1, \dots, y_N]'$, and $'$ denoting the transpose operator. This solution works fine and is extremely fast to compute (it is just the ratio of two scalar products!). However, to improve the accuracy of the estimate, I want to focus only on a subset of relevant points, because I know that the farther a point $(y_i, x_i)$ falls from the origin, the less reliable it becomes. Thus, I thought to simply replace the previous estimation strategy with a weighted one, i.e. to choose $m$ as
$$
\hat{m}_2 \triangleq \arg \min_m \sum_{i=1}^N w_i (y_i-m x_i)^2
$$

where the weights are, for example, given as $w_i\triangleq 1/\sqrt{x_i^2 + y_i^2}$.
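For concreteness, here is a minimal R sketch of the two closed-form estimators; the toy data below are hypothetical, purely for illustration:

set.seed(1)
x <- rnorm(100)
y <- 2*x + rnorm(100)            # hypothetical data generated around y = 2x
w <- 1/sqrt(x^2 + y^2)           # the example weights from above

m1 <- sum(x*y)/sum(x^2)          # unweighted slope: X'Y / X'X
m2 <- sum(w*x*y)/sum(w*x^2)      # weighted slope:   X'WY / X'WX
c(m1, m2)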

Problem

Given the specific form of the fitting model $(*)$, we have
$$
\hat{m}_2 \triangleq \arg \min_m \sum_{i=1}^N (\tilde{y}_i-m \tilde{x}_i)^2=\frac{\tilde{X}'\tilde{Y}}{\tilde{X}'\tilde{X}}=\frac{X' W Y}{X'W X}
$$

where $\tilde{X}\triangleq [\sqrt{w_1}x_1,\dots,\sqrt{w_N}x_N]'$, $\tilde{Y}\triangleq [\sqrt{w_1}y_1,\dots,\sqrt{w_N}y_N]'$ and $W\triangleq \textrm{diag}(w_1,\dots,w_N)$. The problem is that, numerically, I find $\hat{m}_2=\hat{m}_1$, thus the weights have no effect in the estimation process.
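As a sanity check on the algebra only (not on the numerical claim), here is a small sketch with hypothetical data confirming that the scaled-variable form and the $W$-matrix form coincide:

set.seed(2)
x <- rnorm(50); y <- rnorm(50)
w <- 1/sqrt(x^2 + y^2)
xt <- sqrt(w)*x; yt <- sqrt(w)*y          # the tilde variables
sum(xt*yt)/sum(xt^2)                      # tilde-X' tilde-Y / tilde-X' tilde-X
sum(w*x*y)/sum(w*x^2)                     # X'WY / X'WX, same value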

I'm not sure whether I have made some mistake in my calculation or in my code implementation, but I make sense of this phenomenon as follows. If $w_i\neq 0$, then the point $(y_i, x_i)$ is replaced with the new one $(\tilde{y}_i, \tilde{x}_i)$, which is proportional to $(y_i, x_i)$. Thus, $(\tilde{y}_i, \tilde{x}_i)$ is aligned with $(y_i, x_i)$ and, consequently, the information carried by $(\tilde{y}_i, \tilde{x}_i)$ is the same as that carried by $(y_i, x_i)$. Hence, the estimation process is independent of the values of the weights.

On the other hand, I don't understand two things:

  1. assuming $w_i\neq 0$ for all $i$, I don't understand why
    $$\tag{**}\frac{X'W Y}{X' W X}=\frac{X' Y}{X'X}$$
    It is as if $W$ cancels out in the division, but I cannot see why this should be true (if it is true at all).
  2. let's assume that $(**)$ is true. Then, how can I estimate $m$ while taking into account that some dataset points should have "low influence" on the final value of the estimate?

Best Answer

I don't get this cancellation:

> x<-rnorm(100)
> y<-rnorm(100)
> w<-1/sqrt(x^2+y^2)
> lm(y~x)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x  
    0.14790      0.03694  

> lm(y~x,weights=w)

Call:
lm(formula = y ~ x, weights = w)

Coefficients:
(Intercept)            x  
   0.118386    -0.005007  

and it's having the effect you want in this example

> y<-x^3
> lm(y~x)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x  
   -0.04909      2.86949  

> lm(y~x,weights=w)

Call:
lm(formula = y ~ x, weights = w)

Coefficients:
(Intercept)            x  
    0.02279      2.11306  

With the weights, points further from the origin are downweighted and the slope is lower.
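Since the model $(*)$ in the question has no intercept, the same comparison can be done through the origin; a minimal sketch, continuing with the x, y and w from the session above:

lm(y ~ x - 1)                    # unweighted fit through the origin
lm(y ~ x - 1, weights = w)       # weighted fit through the origin
sum(w*x*y)/sum(w*x^2)            # closed-form X'WY / X'WX, matches the weighted slope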

You will still have inference problems, because a standard assumption for weighted least squares is that the weights are independent of $Y$ conditional on $X$.
