Solved – How to handle very different weights in a least squares fit

least-squares, robust, weighted-regression

I'm performing a weighted linear least squares fit, where the weights correspond to the number of counts of a specific observation. Due to the nature of the data, it is possible that a small handful of observations get weighted much more than the rest, so that the regression is dominated by, say, two data points only, which obviously leads to bad results.

I've run a few tests and noticed that, e.g., capping the weights improves the results, though this feels rather hack-ish to me. Are there better ways to prevent a small number of data points from outweighing the rest of the data? Or, if capping the weights is a good approach, is there a non-ad hoc way to determine the optimal value of the cap?

EDIT

Here's what some sample data and fits look like with (a) no weights, and (b) weights. The data were generated from fairly realistic simulations, so I know the ground truth (red line).

[Figure (a): sample data and fit with no weights]
[Figure (b): sample data and fit with weights]

My problem is that the weight of some data points (#2 at x=10 for example) can be sufficiently large to dominate the fit. However, I also don't want the very-low-count data to weigh in too much, otherwise I get a really crappy fit as well.

Best Answer

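You can try applying a function to the weights before fitting, so that high-count observations still count for more but no longer swamp the rest. Reasonable choices would be a log or a sigmoid.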
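
To make the suggestion concrete, here is a minimal sketch in Python with NumPy, not a definitive recipe. The helper names (`weighted_line_fit`, `log_weights`, `sigmoid_weights`, `max_weight_share`), the use of `log1p` instead of a plain log (so that a count of 0 maps to weight 0), the sigmoid's `midpoint` and `scale` parameters, and the sample counts are all assumptions made for illustration; none of them come from the original post.

```python
import numpy as np


def weighted_line_fit(x, y, w):
    """Fit y = slope * x + intercept, minimizing sum(w * residual**2)."""
    # np.polyfit applies its weights to the unsquared residuals,
    # so pass sqrt(w) to get the usual weighted least squares objective.
    slope, intercept = np.polyfit(x, y, deg=1, w=np.sqrt(w))
    return slope, intercept


def log_weights(counts):
    """Compress counts logarithmically (log1p keeps weight 0 for count 0)."""
    return np.log1p(np.asarray(counts, dtype=float))


def sigmoid_weights(counts, midpoint, scale):
    """Squash counts into (0, 1]; midpoint and scale are tuning choices."""
    counts = np.asarray(counts, dtype=float)
    return 1.0 / (1.0 + np.exp(-(counts - midpoint) / scale))


def max_weight_share(w):
    """Fraction of the total weight carried by the heaviest observation."""
    w = np.asarray(w, dtype=float)
    return w.max() / w.sum()


# Hypothetical counts: two observations dwarf all the others.
counts = np.array([3, 5, 8, 2, 4000, 6, 7, 2500, 4, 9], dtype=float)
x = np.arange(1.0, counts.size + 1)
y = 2.0 * x + 1.0  # noise-free line, only to show the call pattern

w_log = log_weights(counts)
w_sig = sigmoid_weights(counts, midpoint=np.median(counts), scale=3.0)

for name, w in [("raw", counts), ("log", w_log), ("sigmoid", w_sig)]:
    slope, intercept = weighted_line_fit(x, y, w)
    print(f"{name}: max weight share = {max_weight_share(w):.2f}, "
          f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

Both transforms preserve the ordering of the weights while shrinking the share of the total weight held by the largest counts. Note that a sigmoid also saturates, so all sufficiently large counts end up with essentially the same weight, which behaves much like a smooth version of the cap mentioned in the question. How aggressively to compress is still a judgment call that depends on the data.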