Solved – How to weight means when combining different sample sizes

mean, population, sample

I know the means of a few samples from a population. I also know what the sample size is for each sample. What is the best way to combine the sample means to get the best estimate of the population mean? I'm wondering if the square root of the sample size might be the weight.

Edit after receiving two answers and a comment. With hindsight I should have given the specific problem. I want to estimate the total number of tickets sold in a lottery, given that I know the probability of winning a prize, and the number of tickets that won that prize. Prize A had a 1/314 chance of being won, and 147110 people won it. Prize B had a 1/49 chance of being won, and 952354 won it. Prize C had a 1/22 chance of being won, and 2107578 people won it. These give estimates of the total number of tickets sold of 46192540, 46665346, and 46366716 respectively. Prize C has the largest number of winners so should give the best estimate, but prizes A and B could contribute some information as well. How should I combine the above for the best overall estimate?
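For reference, the three per-prize estimates quoted above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic (winners divided by the probability of winning); the names `prizes` and `estimate_tickets` are made up for illustration.

```python
# (odds denominator, number of winners) for each prize, taken from the question
prizes = {
    "A": (314, 147110),
    "B": (49, 952354),
    "C": (22, 2107578),
}


def estimate_tickets(odds, winners):
    """Estimate tickets sold as winners / (1 / odds), i.e. winners * odds."""
    return winners * odds


estimates = {
    name: estimate_tickets(odds, winners)
    for name, (odds, winners) in prizes.items()
}

for name, value in estimates.items():
    print(f"Prize {name}: {value}")
```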

Edit again: the implied answer was that I should group all the samples into one big sample. But this does not take into account that some samples are more subject to noise than others. To take an extreme example, if I used the number of people winning the jackpot as a sample, it would estimate the number of entries at zero, since nobody won the jackpot that day. But if someone had won the jackpot, then since the chances of winning it are (as far as I recall) about 135 million to one, it would give an estimate of 135 million entries. Either of these two outcomes would distort the overall estimate greatly. So how can I create an overall estimate even if some of the samples are less reliable than others?

Best Answer

"I'm wondering if the square root of the sample size might be the weight."

Indeed not. Consider that you have a sample of size $N$, and all observations are drawn independently from the same population. Then you'd want to weight all observations equally.

Now suppose that the $N$ observations are split into two groups of sizes $n_1$ and $n_2$, and that you and a friend find the means of the first and second group respectively. Now you want to combine your means.

Clearly the best outcome is still to end up with an equal weight on all $N$ observations in the combined mean. As a result it should be immediately clear how to weight the individual samples; there's only one weighting scheme that works.
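To make the argument concrete, here is a minimal Python sketch. It assumes the scheme in question is weighting each group mean by its group size, which is the choice that puts equal weight on every individual observation. The data values and the helper name `combine_means` are hypothetical, chosen only for illustration.

```python
import math
from statistics import mean

# Hypothetical observations, made up for illustration only
observations = [4.0, 7.0, 1.0, 9.0, 5.0, 6.0, 3.0]

# Split the N observations into two groups of sizes n_1 and n_2
group_1 = observations[:3]
group_2 = observations[3:]
n_1, n_2 = len(group_1), len(group_2)
mean_1, mean_2 = mean(group_1), mean(group_2)


def combine_means(means, weights):
    """Weighted average of group means with the given weights."""
    return sum(m * w for m, w in zip(means, weights)) / sum(weights)


# Weighting by group size recovers the mean of all N observations
by_size = combine_means([mean_1, mean_2], [n_1, n_2])

# Weighting by the square root of group size does not
by_sqrt = combine_means([mean_1, mean_2], [math.sqrt(n_1), math.sqrt(n_2)])

overall = mean(observations)
print(overall, by_size, by_sqrt)
```

In this sketch the size-weighted result matches the mean of the pooled observations, while the square-root weighting gives too much weight to the smaller group.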
