Solved – How to find the weights for weighted least squares regression analysis

heteroscedasticity, regression, weights

As the title says, I am having trouble finding the weights for weighted least squares estimation.

I found that some people use weights like

wts <- 1/fitted(lm(abs(residuals(regmodel.1)) ~ x))^2

or

wts <- 1/fitted(lm(y~x))

or

wts <- 1/fitted(lm(y~x))^2

Where do these three choices of wts come from?

How can I find the weights?

Best Answer

Those weights ($w_i$, based on the predicted values $\hat{y}_i$) relate to quasi-likelihood estimation for generalized linear models (GLMs). In the quasi-likelihood setting you take the freedom (at the cost of an exact likelihood computation) to specify only the relation between the mean and the variance, rather than fully specifying the error distribution.

E.g. for Poisson regression the mean and variance are implicitly equal, $V = \mu$, and for binomial regression the variance is likewise fixed by the mean. This is too strong a restriction when the errors are not exactly Poisson or binomial (for instance, they may follow an over-dispersed version of the Poisson or binomial distribution). With quasi models you do not 'care' about the exact distribution and just specify $V = c\,\mu$ (the multiplicative factor $c$ makes it less restrictive), then proceed as if maximizing a real likelihood function for an exactly known distribution.
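A minimal sketch of this idea in R (the data here are simulated for illustration): `family = quasipoisson` specifies only $V = \phi\,\mu$ and estimates the dispersion $\phi$ from the data, rather than fixing $\phi = 1$ as plain Poisson does.

```r
# Hypothetical illustration: quasi-Poisson specifies only V = phi * mu,
# not a full likelihood; the dispersion phi is estimated from the data.
set.seed(3)
x <- runif(200, 0, 2)
mu <- exp(1 + x)
y <- rnbinom(200, mu = mu, size = 5)   # overdispersed counts: Var = mu + mu^2/5

fit.quasi <- glm(y ~ x, family = quasipoisson)
summary(fit.quasi)$dispersion          # estimated phi, > 1 under overdispersion
```

With truly Poisson data the estimated dispersion would be close to 1; here it comes out well above 1 because the simulated counts are overdispersed.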

By adjusting the weights according to some function of the predicted outcome, $w_i = 1/f(\hat{y}_i)$, you are correcting for heteroscedasticity as if using a relation $V \propto f(\mu)$ between the mean and variance, but without knowing the exact distribution.

The second case, $w_i = 1/\hat{y}_i$, you might use if you expect/assume (overdispersed) Poisson, binomial, chi-squared (and possibly other) error distributions, which have a linear relation $V = c\,\mu$ between mean and variance.

The third case, $w_i = 1/\hat{y}_i^2$, you might use for (overdispersed) exponentially distributed errors, which have a quadratic relation $V = c\,\mu^2$ between mean and variance.
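The second and third recipes can be sketched side by side in R. The data and variable names below are hypothetical, just to show the mechanics: fit a first-pass OLS to get $\hat{\mu}_i$, then refit with weights chosen from the assumed mean-variance relation.

```r
# Hypothetical sketch: deriving the weights from an assumed mean-variance relation.
set.seed(1)
x <- runif(100, 1, 10)
y <- 2 + 3 * x + rnorm(100, sd = sqrt(2 + 3 * x))  # variance grows with the mean

fit0 <- lm(y ~ x)            # first-pass OLS, just to get mu-hat
mu.hat <- fitted(fit0)

wts.lin  <- 1 / mu.hat       # second case: V proportional to mu   (Poisson-like)
wts.quad <- 1 / mu.hat^2     # third case:  V proportional to mu^2 (exponential-like)

fit.lin  <- lm(y ~ x, weights = wts.lin)
fit.quad <- lm(y ~ x, weights = wts.quad)
```

Note that lm expects positive weights, so this only works when the fitted values $\hat{\mu}_i$ are all positive, as they are in this simulated example.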

The first case, which uses a linear function of the absolute residuals, allows a much more flexible approximation and is (I guess) used on an ad-hoc basis for less standard distributions.
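That first recipe amounts to a two-step procedure: the absolute residuals of a first-pass fit serve as a rough estimate of the error standard deviation, so their fitted values, squared, estimate the variance. A hedged sketch on simulated data (the name regmodel.1 matches the snippet in the question):

```r
# Hypothetical sketch of the first recipe: estimate the error scale by
# regressing |residuals| on x, then weight by 1 / (fitted scale)^2.
set.seed(2)
x <- runif(100, 1, 10)
y <- 2 + 3 * x + rnorm(100, sd = 0.5 * x)    # sd grows linearly in x

regmodel.1 <- lm(y ~ x)                      # first-pass OLS
sd.fit <- lm(abs(residuals(regmodel.1)) ~ x) # |residual| approximates the sd
wts <- 1 / fitted(sd.fit)^2                  # weight = 1 / estimated variance
fit.wls <- lm(y ~ x, weights = wts)
```

Again this assumes the fitted values of the auxiliary regression stay positive over the range of x; if they dip negative the weights become meaningless and a different variance model is needed.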

You could see it as approximating the error distribution by a Gaussian, $\epsilon_i \sim N(0, f(\hat{y}_i))$.