Background
I have a 2-dimensional dataset $\{y_i, x_i\}_{i=1}^N$ in the coordinates $y,x$. I'm trying to fit the dataset with the trivial model
$$\tag{*}y=mx$$
where $m$ is a (scalar) parameter that has to be learned from the dataset. A simple solution is given by the least squares estimator, which chooses $m$ as
$$\hat{m}_1\triangleq \arg \min_m \sum_{i=1}^N (y_i-m x_i)^2=\frac{X'Y}{X'X}
$$
with $X\triangleq[x_1,\dots, x_N]'$, $Y\triangleq [y_1, \dots, y_N]'$, and $'$ denoting the transpose operator. This solution works fine and is extremely fast to compute (it is just the ratio of two scalar products!). However, to improve the accuracy of the estimate, I want to focus only on a subset of relevant points, because I know that the farther a point $(y_i, x_i)$ falls from the origin, the less reliable it becomes. I therefore thought to replace the previous estimation strategy with a weighted one, i.e. to choose $m$ as
$$
\hat{m}_2 \triangleq \arg \min_m \sum_{i=1}^N w_i (y_i-m x_i)^2
$$
where the weights are, for example, given as $w_i\triangleq 1/\sqrt{x_i^2 + y_i^2}$.
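Both estimators are one-liners in practice. A minimal NumPy sketch (the simulated data and variable names are my own illustration, not from the question), assuming the true slope is $2$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=50)
y = 2.0 * x + rng.normal(0, 1, size=50)  # true slope m = 2, additive noise

# Unweighted least squares: m1 = X'Y / X'X
m1 = (x @ y) / (x @ x)

# Distance-based weights: w_i = 1 / sqrt(x_i^2 + y_i^2)
w = 1.0 / np.sqrt(x**2 + y**2)

# Weighted least squares: m2 = X'WY / X'WX
m2 = ((x * w) @ y) / ((x * w) @ x)

print(m1, m2)
```

With noisy data both estimates land near the true slope, and comparing the two printed values directly is an easy way to check whether the weights are actually entering the computation.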
Problem
Given the specific form of the fitting model $(*)$, we have
$$
\hat{m}_2 = \arg \min_m \sum_{i=1}^N (\tilde{y}_i-m \tilde{x}_i)^2=\frac{\tilde{X}'\tilde{Y}}{\tilde{X}'\tilde{X}}=\frac{X' W Y}{X'W X}
$$
where $\tilde{X}\triangleq [\sqrt{w_1}x_1,\dots,\sqrt{w_N}x_N]'$, $\tilde{Y}\triangleq [\sqrt{w_1}y_1,\dots,\sqrt{w_N}y_N]'$ and $W\triangleq \textrm{diag}(w_1,\dots,w_N)$. The problem is that, numerically, I find $\hat{m}_2=\hat{m}_1$, so the weights have no effect on the estimation process.
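The chain of equalities above can be checked numerically. A short sketch (data and names are my own), verifying that least squares on the rescaled points $(\tilde{x}_i, \tilde{y}_i)$ coincides with the $X'WY/X'WX$ form:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, size=20)
y = 2.0 * x + rng.normal(0, 1, size=20)
w = 1.0 / np.sqrt(x**2 + y**2)

# Rescaled data: x~_i = sqrt(w_i) x_i, y~_i = sqrt(w_i) y_i
xt = np.sqrt(w) * x
yt = np.sqrt(w) * y

m_tilde = (xt @ yt) / (xt @ xt)      # least squares on the rescaled data
m_w = ((x * w) @ y) / ((x * w) @ x)  # X'WY / X'WX

print(m_tilde, m_w)  # identical up to floating-point rounding
```

Since $\sqrt{w_i}\sqrt{w_i}=w_i$, the two expressions are the same quantity algebraically, so they agree to machine precision.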
I'm not sure whether I have made some mistake in my calculation or in my code, but I make sense of this phenomenon as follows. If $w_i\neq 0$, then the point $(y_i, x_i)$ is replaced by the new one $(\tilde{y}_i, \tilde{x}_i)$, which is proportional to $(y_i, x_i)$. Thus $(\tilde{y}_i, \tilde{x}_i)$ is aligned with $(y_i, x_i)$ and, consequently, carries the same information as $(y_i, x_i)$. Hence the estimation process would be independent of the values of the weights.
On the other hand, I don't understand two things:
- Assuming $w_i\neq 0$ for all $i$, I don't understand why
$$\tag{**}\frac{X'W Y}{X' W X}=\frac{X' Y}{X'X}$$
It is as if $W$ cancels out in the division, but I cannot see why this should be true (if it is true).
- Let's assume that $(**)$ is true. Then, how can I estimate $m$ while taking into account that some dataset points should have low influence on the final value of the estimate?
Best Answer
I don't get this cancellation: $(**)$ does not hold in general, and the weighting is having the effect you want in this example. With the weights, points further from the origin are downweighted and the slope is lower.
You will still have inference problems, because a standard assumption for weighted least squares is that the weights are independent of $Y$ conditional on $X$; here the weights $w_i = 1/\sqrt{x_i^2+y_i^2}$ depend on $y_i$.
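To see the downweighting at work, here is a small simulation (my own construction, not from the original answer) in which points far from the origin are unreliable and drift to a steeper slope, while points near the origin follow the true slope $2$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Reliable points near the origin follow y = 2x ...
x_near = rng.uniform(0.5, 2.0, size=30)
y_near = 2.0 * x_near + rng.normal(0, 0.1, size=30)

# ... while far-away points are contaminated and follow a steeper trend
x_far = rng.uniform(8.0, 12.0, size=10)
y_far = 3.0 * x_far + rng.normal(0, 1.0, size=10)

x = np.concatenate([x_near, x_far])
y = np.concatenate([y_near, y_far])
w = 1.0 / np.sqrt(x**2 + y**2)

m1 = (x @ y) / (x @ x)                # unweighted: dominated by the far points
m2 = ((x * w) @ y) / ((x * w) @ x)    # weighted: far points' influence is shrunk

print(m1, m2)  # m2 < m1: the weights pull the slope toward the near-origin trend
```

The far points have large $x_i^2$ and therefore dominate the unweighted sums, whereas the weights $w_i$ shrink their contribution, so the weighted slope sits closer to the reliable near-origin data, exactly the behavior the question was after.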