Solved – Effective sample size of weighted regression

degrees-of-freedom, regression, robust, weighted-regression

I am fitting a basic linear regression with one predictor and observation weights in R, e.g.:

lm(response~explanatory, weights=w, data=mydata)

The weights are measures of the precision of the response variable. Because the weights are highly skewed, I get very small p-values for the effect of my explanatory variable, but the result is really driven by just a few points. Many points in the model contribute little (i.e., they have a small weight), yet they still count toward the degrees of freedom. Is there an alternative way to calculate the denominator degrees of freedom in weighted regression?

Best Answer

Strictly speaking, weighted linear regression only produces valid results if the weights are known without error. That may not sound very surprising, but it means, among other things, that the weights are known a priori, i.e., not estimated from your data (or really from any data). This is almost never the case in practice. However, unless you have very little data and the weights are very inaccurate, WLS tends to perform well.

I gather you are worried that your results are driven by only a few data points. This is a reasonable concern. There are several options for exploring this possibility. They fall into two categories:

  1. You could run alternative analyses and see if you get results that are sufficiently similar. Two alternative analyses that stand out to me are to use robust regression, and to use unweighted least squares with sandwich standard errors. But which alternative analyses you use must be chosen based on the reason you needed the WLS approach in this case, so those may not be the right options in this instance. I outline a large number of possible analyses in my answer here: Alternatives to one-way ANOVA for heteroscedastic data.

  2. You could simulate the situations that you are concerned about and see how the candidate analyses perform. You would probably want to simulate from several possible worlds, including possibilities where the weights were correct and incorrect, and where the null hypothesis was true or false.
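To make the two options concrete, here is a minimal sketch in base R that combines them. It simulates data where the null hypothesis is true, once with correct precision weights and once with deliberately scrambled weights, and compares how often WLS and unweighted least squares with sandwich standard errors reject at the 5% level. Everything specific here is an assumption for illustration: the sample size, the log-normal noise scale, the number of replicates, and the helper names `sandwich_se`, `one_rep`, and `reject_rate`. The HC0 sandwich estimator is computed by hand to keep the example self-contained; a dedicated package would normally be used instead.

```r
# Simulation sketch: rejection rate of the slope test when the null is true.
# All settings below (n, noise spread, reps) are illustrative assumptions.
set.seed(1)

# HC0 sandwich standard error of the slope for an unweighted lm fit,
# computed by hand so that only base R is needed.
sandwich_se <- function(fit) {
  X <- model.matrix(fit)
  e <- residuals(fit)
  bread <- solve(crossprod(X))   # (X'X)^-1
  meat <- crossprod(X * e)       # X' diag(e^2) X
  sqrt(diag(bread %*% meat %*% bread))[2]
}

# One simulated data set; returns the slope p-value from each analysis.
one_rep <- function(n = 50, weights_correct = TRUE) {
  x <- rnorm(n)
  sd_true <- exp(rnorm(n, sd = 1))        # highly skewed noise scale
  y <- rnorm(n, mean = 0, sd = sd_true)   # null is true: slope is 0
  # "Incorrect" weights: the true precisions assigned to the wrong points.
  sd_used <- if (weights_correct) sd_true else sample(sd_true)
  w <- 1 / sd_used^2

  wls <- lm(y ~ x, weights = w)
  p_wls <- summary(wls)$coefficients["x", "Pr(>|t|)"]

  ols <- lm(y ~ x)
  t_ols <- coef(ols)["x"] / sandwich_se(ols)
  p_ols <- 2 * pt(-abs(t_ols), df = df.residual(ols))

  c(wls = p_wls, ols_sandwich = unname(p_ols))
}

# Proportion of replicates in which each analysis rejects at level alpha.
reject_rate <- function(weights_correct, reps = 500, alpha = 0.05) {
  p <- replicate(reps, one_rep(weights_correct = weights_correct))
  rowMeans(p < alpha)
}

rates <- rbind(
  weights_correct   = reject_rate(TRUE),
  weights_incorrect = reject_rate(FALSE)
)
print(rates)
```

Rejection rates near 0.05 indicate that an analysis holds its nominal type I error rate in that simulated world. Repeating the exercise with a nonzero slope would give the corresponding power comparison.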

For the record, to answer your explicit question directly, I do not know of alternative ways to calculate the degrees of freedom in WLS regression. However, I wonder if there can really be multiple differing ways to calculate the degrees of freedom that are all valid; I would suspect not as this seems like a reductio ad absurdum.
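That said, it can still be useful to quantify how concentrated the weights are, purely as a descriptive diagnostic. The sketch below shows that the residual degrees of freedom reported by `lm` do not depend on the weights, and computes Kish's approximate effective sample size, $(\sum w)^2 / \sum w^2$, alongside the share of total weight held by the heaviest points. Kish's formula comes from survey sampling, so applying it to precision weights is an analogy, not a replacement for the denominator degrees of freedom. The data and the variable names `n_eff` and `top5_share` are made up for illustration.

```r
# Illustrative data; 'w' stands in for the precision weights in the question.
set.seed(2)
n <- 100
explanatory <- rnorm(n)
response <- rnorm(n)
w <- exp(rnorm(n, sd = 2))   # highly skewed positive weights (an assumption)

fit <- lm(response ~ explanatory, weights = w)

# Residual df depends only on n and the number of coefficients, not on w.
df.residual(fit)             # n - 2 = 98

# Kish's approximate effective sample size: equals n when all weights are
# equal and approaches 1 when a single point carries nearly all the weight.
n_eff <- sum(w)^2 / sum(w^2)

# Share of the total weight carried by the five heaviest points.
top5_share <- sum(sort(w, decreasing = TRUE)[1:5]) / sum(w)

print(c(n = n, n_eff = n_eff, top5_share = top5_share))
```

A value of `n_eff` far below `n` is a signal that the fit rests on a handful of observations, which is a reason to run the sensitivity checks described above.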
