I have results from 5 surveys, each conducted 2 years apart. Let us assume that no subject is selected in more than one survey.
The sampling method used in these surveys is biased, and I have sampling weights calculated (with respect to the population) for each data point in each survey.
The question is, how would I be able to combine the 5 datasets and have the weights recalculated so as to obtain one giant dataset for analysis on this population?
Also, what should I do if subjects appear in more than one survey?
Updates/Further Elaboration:
Thank you @user30523, here is some more information that might be useful:
Suppose I wish to find out the estimated distribution of height across the population using these 5 datasets.
In some of the datasets, younger people are oversampled because of the location where the survey was conducted. Let's assume the weights are calculated with respect to age.
E.g., assuming 2% of the population is 15 years old, and the survey is conducted at a mall where 15-year-olds make up 5% of all shoppers, then the sampling weight for a subject aged 15 in that survey would be calculated as 0.02 / 0.05 = 0.4. For simplicity, assume each person in the mall has an equal chance of being surveyed and all participants comply when asked.
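To make the arithmetic concrete, here is a minimal Python sketch of that weight calculation. The function name `age_weight` is made up for illustration; the only numbers used are the 2% and 5% shares from the example above.

```python
def age_weight(population_share, sample_share):
    """Weight for an age group: its share of the population
    divided by its share among the people surveyed."""
    if sample_share <= 0:
        raise ValueError("sample_share must be positive")
    return population_share / sample_share

# 15-year-olds: 2% of the population, 5% of shoppers at the mall.
w_15 = age_weight(0.02, 0.05)
print(round(w_15, 3))
```

A weight below 1 means the group is overrepresented in the sample relative to the population, so each respondent in it counts for less.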
Given that the 5 surveys are conducted in 5 different malls and each has its own set of weights calculated in the same way, how would I then be able to combine all 5 datasets and recalculate the sampling weights?
P.S.: I'm new to the topic of sampling weights, so please correct me if I have made errors in the way I have calculated the weights.
Best Answer
I think if each dataset is already weighted to your satisfaction, then you have a couple of different options. Which one is the right one may vary based on your objectives and the particulars of your existing data collection and weighting.
Example for #2: each dataset is weighted to equal importance, with this "dataset weight" being multiplied by whatever weight has already been calculated within the dataset.
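As a rough illustration of that example, here is a small Python sketch. It rests on assumptions that are not stated above: "equal importance" is taken to mean that each dataset's weights are rescaled to sum to the same total (1/K for K datasets), and the helper names and the height values are hypothetical.

```python
def combine_equal_importance(datasets):
    """Pool several surveys so each contributes the same total weight.

    datasets: list of surveys, each a list of (value, weight) pairs,
    where weight is the weight already calculated within that survey.
    """
    k = len(datasets)
    combined = []
    for records in datasets:
        total = sum(w for _, w in records)
        # "Dataset weight": scales this survey's weights to sum to 1/k.
        dataset_weight = 1.0 / (k * total)
        combined.extend((v, w * dataset_weight) for v, w in records)
    return combined

def weighted_mean(records):
    total = sum(w for _, w in records)
    return sum(v * w for v, w in records) / total

# Hypothetical heights (cm) with their within-survey weights.
survey_a = [(160, 0.4), (170, 1.6)]
survey_b = [(180, 1.0)]

pooled = combine_equal_importance([survey_a, survey_b])
pooled_mean = weighted_mean(pooled)
```

With this choice a small survey counts as much as a large one. If you would rather let larger surveys count for more, the dataset weight could be made proportional to sample size instead.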