Given two sets of data:
- Total usage of a few hundred products across the population as a whole
- Per-consumer usage of those products for a sample of consumers
I'd like to choose weights for each sample member such that the sum product of their weights and product consumption matches the national consumption, and such that the variance among the weights is minimized.
For example, my sample might include three consumers that used products at these rates:
- Consumer A: 1.0 pound of butter; 2.0 pounds of flour
- Consumer B: 3.0 pounds of butter; 3.0 pounds of flour
- Consumer C: 3.0 pounds of butter; 1.0 pound of flour
And I might know that in my population as a whole there were 40 pounds of butter and 50 pounds of flour consumed.
This can be thought of as an underdetermined system of linear equations, so there are infinitely many sets of weights that could be chosen (e.g., A=10, B=10, C=0; or A=22, B=0, C=6), but I'm specifically interested in sets of weights with low variance.
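To make the setup concrete, here is a minimal sketch in Python with NumPy (my choice of tooling for illustration only). It writes the butter/flour example as the constraint `X.T @ w = t` and finds the weights with the smallest variance by solving the KKT equations of the equality-constrained quadratic program. The function name `min_variance_weights` is hypothetical, and nothing here restricts the weights to be non-negative.

```python
import numpy as np

# Rows: consumers A, B, C; columns: butter, flour (pounds used per consumer).
X = np.array([[1.0, 2.0],
              [3.0, 3.0],
              [3.0, 1.0]])
t = np.array([40.0, 50.0])  # population totals: butter, flour


def min_variance_weights(X, t):
    """Weights w minimizing sum((w - mean(w))**2) subject to X.T @ w = t."""
    n, p = X.shape
    # Centering matrix: w @ C @ w equals the sum of squared deviations of w.
    C = np.eye(n) - np.ones((n, n)) / n
    # KKT system: 2*C*w + X*lam = 0 and X.T*w = t.
    K = np.block([[2.0 * C, X],
                  [X.T, np.zeros((p, p))]])
    rhs = np.concatenate([np.zeros(n), t])
    # lstsq instead of solve so a singular K does not raise an error.
    sol, *_ = np.linalg.lstsq(K, rhs, rcond=None)
    return sol[:n]


# One of the many exact solutions: A=10, B=10, C=0.
w_candidate = np.array([10.0, 10.0, 0.0])
print(X.T @ w_candidate)             # reproduces the totals 40 and 50

w = min_variance_weights(X, t)
print(w)                             # minimum-variance weights
print(X.T @ w)                       # also reproduces the totals
print(np.var(w_candidate), np.var(w))
```

For this three-consumer example the minimum-variance weights come out at roughly A = 11.55, B = 8.71, C = 0.77, which has a slightly lower variance than A=10, B=10, C=0. With a few hundred products and a large sample, a dedicated quadratic-programming or calibration routine would be a more practical choice than a dense KKT solve.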
If the variables were binary I might use RIM weighting or Iterative Proportional Fitting to calculate the weights, but it's not obvious how to extend those methods to data sets with continuous variables.
Is there a weight selection method that works well with continuous variables?
Best Answer
Update: substituted downloadable reference
This can be done by calibration (Särndal 2007). Tom Lumley's survey package in R has a `calibrate` function, and John D'Souza has contributed a calibration module to Stata (`ssc install calibrate`). Here's your example analyzed by John's `calibrate` command. Setting the method to `linear` and the entry weight `wt1` to 1 seemed to produce the most similar weights. If you have a sample weight variable then use that as the entry weight.

Reference: Särndal, C.E. 2007. The calibration approach in survey theory and practice. Survey Methodology 33, no. 2: 99-119, available at http://www.statcan.gc.ca/pub/12-001-x/2007002/article/10488-eng.pdf.
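For readers without R or Stata, the linear method with entry weights of 1 can be sketched in a few lines of Python with NumPy. This is my own illustration of the textbook chi-square-distance calibration formula, not the code of either package; the function name `linear_calibrate` is hypothetical. It chooses the weights `w` closest to the entry weights `d`, in the sense of minimizing `sum((w - d)**2 / d)`, subject to the weighted sample totals matching the population totals.

```python
import numpy as np


def linear_calibrate(X, totals, d):
    """Linear calibration: w = d * (1 + X @ lam), with lam chosen so X.T @ w = totals."""
    X = np.asarray(X, dtype=float)
    d = np.asarray(d, dtype=float)
    totals = np.asarray(totals, dtype=float)
    # T = sum_i d_i * x_i x_i^T  (p x p matrix)
    T = X.T @ (d[:, None] * X)
    # Lagrange multipliers from the calibration equations.
    lam = np.linalg.solve(T, totals - X.T @ d)
    return d * (1.0 + X @ lam)


# The question's example: consumers A, B, C; columns butter, flour.
X = np.array([[1.0, 2.0],
              [3.0, 3.0],
              [3.0, 1.0]])
totals = np.array([40.0, 50.0])
d = np.ones(3)  # entry weights of 1, as described above

w = linear_calibrate(X, totals, d)
print(w)        # calibrated weights
print(X.T @ w)  # reproduces the totals 40 and 50
```

On this tiny example the calibrated weights are roughly A = 9.49, B = 10.43, C = -0.26. The slightly negative weight for C illustrates a known property of unbounded linear calibration; bounded or raking-type distance functions are the usual remedy when negative weights are unacceptable.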