Weighted least squares to correct for heteroscedasticity

Tags: heteroscedasticity, regression, spatial

I would like to use a weighted least squares (WLS) regression to perform tests on heteroscedastic spatial data.

Each data point represents the mean of some variable over an area, and the sample sizes vary between areas. Intuitively, then, the means I'm measuring are more error-prone in areas with a small sample size.

The variance of a mean is inversely proportional to the sample size, so presumably I should weight the regression by the inverse of this variance, i.e. weight each point by the sample size it was derived from.

But which sample size should I use for the weights: the one for the dependent variable, the one for the independent variable, or both?
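The weighting described in the question can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions: each area has a single sample size `n_i` (the sketch does not settle which variable it should come from), the data are simulated and hypothetical, and `wls_fit` is a helper written for this example, not a library function. WLS is computed by scaling each row of the design matrix and response by `sqrt(n_i)` and then running ordinary least squares on the scaled data.

```python
import numpy as np

def wls_fit(X, y, weights):
    """Hypothetical helper: minimise sum_i w_i * (y_i - x_i' beta)^2.

    Scales each row of X and y by sqrt(w_i), then solves the resulting
    ordinary least-squares problem.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    sw = np.sqrt(np.asarray(weights, dtype=float))
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Hypothetical simulated data: each area's value is a mean of n_i draws.
rng = np.random.default_rng(0)
n_areas = 200
n = rng.integers(5, 200, size=n_areas)   # assumed per-area sample sizes
x = rng.normal(size=n_areas)             # area-level regressor
sigma = 2.0                              # assumed within-area SD

# The error of a mean of n_i draws has variance sigma^2 / n_i.
u = rng.normal(scale=sigma / np.sqrt(n))
y = 1.0 + 0.5 * x + u

X = np.column_stack([np.ones(n_areas), x])            # intercept + slope
beta_wls = wls_fit(X, y, weights=n)                   # weight = sample size
beta_ols = wls_fit(X, y, weights=np.ones(n_areas))    # unit weights = OLS
print("WLS:", beta_wls)
print("OLS:", beta_ols)
```

Only the relative weights matter, so using `n_i` rather than `n_i / sigma^2` gives the same point estimates. If statsmodels is available, `sm.WLS(y, X, weights=n).fit().params` should reproduce `beta_wls`, since its `weights` argument is likewise proportional to the inverse variance.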

Best Answer

It appears you already know the technique, so I will deal only with the specific question. The answer is: use the sample sizes that relate to the "explanatory" variables, i.e. the regressors (I guess this is what you mean by "independent").

This follows from how the error term is specified. In a regression setting, we make assumptions about the error term conditional on the regressors. In the model (in matrix notation)

$$ \mathbf y = \mathbf X\beta + \mathbf u$$

we specify $E(\mathbf u \mid \mathbf X) = \mathbf 0$ and $E(\mathbf u \mathbf u'\mid \mathbf X) = \sigma^2\mathbf I$. In a heteroskedastic setting we are really dealing with conditional heteroskedasticity: we assume (or suspect) that

$$E(\mathbf u \mathbf u'\mid \mathbf X) = \sigma^2\mathbf \Omega .$$

Since this expectation is conditional on $\mathbf X$, it is a function of $\mathbf X$ and not of $\mathbf y$ (whose random part enters the model through the error term itself), i.e. $\mathbf \Omega = g(\mathbf X)$. So, for logical consistency, you must model the heteroskedasticity using characteristics of the regressors, and then apply weighted least squares.
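To make the last step concrete for the setting in the question, suppose (an assumption for illustration, not something specified above) that $\mathbf\Omega$ is diagonal with entries proportional to $1/n_i$, where $n_i$ is the sample size attached to the regressors of area $i$, $N$ is the number of areas, and $\mathbf x_i'$ is the $i$-th row of $\mathbf X$. Weighted least squares is then OLS on the model premultiplied by $\mathbf\Omega^{-1/2}$:

$$\mathbf\Omega = \operatorname{diag}\left(\tfrac{1}{n_1},\dots,\tfrac{1}{n_N}\right), \qquad \mathbf\Omega^{-1/2}\mathbf y = \mathbf\Omega^{-1/2}\mathbf X\beta + \mathbf\Omega^{-1/2}\mathbf u$$

Because $\mathbf\Omega = g(\mathbf X)$ is fixed once we condition on $\mathbf X$, the transformed error is homoskedastic:

$$E(\mathbf\Omega^{-1/2}\mathbf u\mathbf u'\mathbf\Omega^{-1/2}\mid\mathbf X) = \mathbf\Omega^{-1/2}\,\sigma^2\mathbf\Omega\,\mathbf\Omega^{-1/2} = \sigma^2\mathbf I$$

and OLS on the transformed model gives

$$\hat\beta_{WLS} = (\mathbf X'\mathbf\Omega^{-1}\mathbf X)^{-1}\mathbf X'\mathbf\Omega^{-1}\mathbf y = \left(\sum_{i=1}^N n_i\,\mathbf x_i\mathbf x_i'\right)^{-1}\sum_{i=1}^N n_i\,\mathbf x_i y_i$$

so each area is weighted by its sample size, as the question anticipated.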

Related Question