Solved – Sample weighting with continuous variables

sampling, weighted-sampling

Given two sets of data:

  • Total usage of a few hundred products across the population as a whole
  • Per-consumer usage of those products for a sample of consumers

I'd like to choose a weight for each sample member such that, for every product, the weighted sum of consumption across the sample matches the national consumption, and such that the variance among the weights is minimized.

For example, my sample might include three consumers that used products at these rates:

  • Consumer A: 1.0 pound of butter; 2.0 pounds of flour
  • Consumer B: 3.0 pounds of butter; 3.0 pounds of flour
  • Consumer C: 3.0 pounds of butter; 1.0 pound of flour

And I might know that in my population as a whole there were 40 pounds of butter and 50 pounds of flour consumed.

This can be thought of as an underdetermined system of linear equations (two equations, three unknown weights), so there are infinitely many sets of weights that could be chosen (e.g., A=10, B=10, C=0; or A=22, B=0, C=6), but I'm specifically interested in sets of weights with low variance.
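To make the "infinitely many solutions" point concrete, here is a minimal sketch in plain Python. Solving the two equations by hand leaves one free parameter, which I take to be consumer C's weight c; the helper names (weighted_totals, family) are made up for this illustration. Each member of the family reproduces both totals, and only the spread of the weights changes.

```python
from statistics import pvariance

# Per-consumer usage (butter, flour) and the population totals from the example.
usage = {"A": (1.0, 2.0), "B": (3.0, 3.0), "C": (3.0, 1.0)}
totals = (40.0, 50.0)

def weighted_totals(weights):
    """Return the weighted (butter, flour) totals for a dict of weights."""
    butter = sum(weights[k] * usage[k][0] for k in usage)
    flour = sum(weights[k] * usage[k][1] for k in usage)
    return butter, flour

def family(c):
    """Weights that satisfy both equations once consumer C's weight is fixed at c.

    Derived by hand from
        a + 3b + 3c = 40   (butter)
        2a + 3b + c = 50   (flour)
    """
    return {"A": 10.0 + 2.0 * c, "B": 10.0 - 5.0 * c / 3.0, "C": c}

# Every c gives valid totals; c between 0 and 6 keeps all weights non-negative.
for c in (0.0, 3.0, 6.0):
    w = family(c)
    butter, flour = weighted_totals(w)
    spread = pvariance(list(w.values()))
    print(w, round(butter, 6), round(flour, 6), round(spread, 3))
```

The printed variance differs from one member of the family to the next, which is why a criterion beyond "matches the totals" is needed to pick a single set of weights.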

If the variables were binary I might use RIM weighting or Iterative Proportional Fitting to calculate the weights, but it's not obvious how to extend those methods to data sets with continuous variables.

Is there a weight selection method that works well with continuous variables?

Best Answer

Update: substituted downloadable reference

This can be done by calibration (Särndal 2007). Tom Lumley's survey package in R has a calibrate function, and John D'Souza has contributed a calibration module to Stata (ssc install calibrate). Here's your example analyzed with John's calibrate command. Setting the method to linear and the entry weight wt1 to 1 seemed to produce the most similar weights. If you have a sample weight variable, use that as the entry weight instead.

Reference: Särndal, C.-E. 2007. The calibration approach in survey theory and practice. Survey Methodology 33(2): 99-119. Available at http://www.statcan.gc.ca/pub/12-001-x/2007002/article/10488-eng.pdf.
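For orientation, the linear method has a simple closed form. The notation here is my own assumption, not taken from the calibrate help file: d_i is the entry weight, x_i the vector of calibration variables for sample member i, and T the vector of population totals. Linear calibration picks the weights w_i closest to the entry weights in a chi-square sense, subject to reproducing the totals:

$$
\min_{w}\ \sum_{i} \frac{(w_i - d_i)^2}{d_i}
\quad\text{subject to}\quad \sum_{i} w_i x_i = T
$$

$$
w_i = d_i\,\bigl(1 + x_i^\top \lambda\bigr), \qquad
\lambda = \Bigl(\sum_i d_i\, x_i x_i^\top\Bigr)^{-1}\Bigl(T - \sum_i d_i\, x_i\Bigr)
$$

With d_i = 1 for everyone, this minimizes \sum_i (w_i - 1)^2, so it keeps the weights similar to each other by keeping them all near 1. That is closely related to, but not quite the same as, minimizing their variance about their own mean.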

clear
input id butter flour
1 1 2
2 3 3
3 3 1
end

/* Butter */
matrix M1 = [40]
gen wt1 = 1 /* entry weight: arbitrary, 1 for everyone since no sample weight is available */
calibrate , ///
marginals(butter) poptot(M1) method(linear)  ///
entrywt(wt1) exitwt(wt_but)

/* Flour */
matrix M2 = [50]
calibrate , ///
marginals(flour) poptot(M2) method(linear) ///
entrywt(wt1) exitwt(wt_flou)

/* The weights */
list id butter wt_but flour wt_flou

    +-------------------------------------------+
    | id   butter     wt_but   flour    wt_flou |
    |-------------------------------------------|
 1. |  1        1   2.736842       2   7.285714 |
 2. |  2        3   6.210526       3   10.42857 |
 3. |  3        3   6.210526       1   4.142857 |
    +-------------------------------------------+

/* Calculate weighted totals to check */

gen  xbut = wt_but*butter
gen  xflou = wt_flou*flour
total xbut xflou

   Total estimation                    Number of obs    =       3

--------------------------------------------------------------
             |      Total   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        xbut |         40   15.89474     -28.38954    108.3895
       xflou |         50   23.71558     -52.03989    152.0399
--------------------------------------------------------------
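Note that the two calibrate runs above treat butter and flour separately, so wt_but reproduces only the butter total and wt_flou only the flour total. As a cross-check, here is a minimal sketch of the same linear method in Python with NumPy rather than Stata. It assumes unit entry weights and the usual closed form w_i = d_i (1 + x_i' lambda); linear_calibrate is a hypothetical helper written for this illustration and is not part of any package. It first repeats the two single-variable runs and then calibrates to both totals at once.

```python
import numpy as np

def linear_calibrate(X, totals, d):
    """Linear calibration sketch: w_i = d_i * (1 + x_i' lam).

    X      : (n,) or (n, p) array of calibration variables
    totals : scalar or length-p vector of population totals
    d      : length-n vector of entry weights
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                      # treat a single variable as one column
    totals = np.atleast_1d(np.asarray(totals, dtype=float))
    d = np.asarray(d, dtype=float)
    M = X.T @ (d[:, None] * X)              # sum_i d_i x_i x_i'
    lam = np.linalg.solve(M, totals - X.T @ d)
    return d * (1.0 + X @ lam)

# Rows are consumers 1-3; columns are butter and flour.
usage = np.array([[1.0, 2.0],
                  [3.0, 3.0],
                  [3.0, 1.0]])
d = np.ones(3)                              # entry weight of 1 for everyone

w_butter = linear_calibrate(usage[:, 0], 40.0, d)        # butter only
w_flour = linear_calibrate(usage[:, 1], 50.0, d)         # flour only
w_joint = linear_calibrate(usage, [40.0, 50.0], d)       # both totals together

print("butter only:", w_butter)
print("flour only: ", w_flour)
print("joint:      ", w_joint)
print("joint totals:", usage.T @ w_joint)
```

The single-variable runs should agree with wt_but and wt_flou in the listing above. The joint weights come out at roughly 9.49, 10.43 and -0.26, so with only three consumers the linear method gives consumer 3 a slightly negative weight. That is a known side effect of linear calibration; if negative weights are unacceptable, a bounded distance function or a larger sample would be the things to look at.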