Solved – Weighted mean formula for two groups of unequal size

MATLAB, weighted mean

I am confused by the weighted mean formula. Wikipedia gives the formula as $\bar{x} = \frac{\sum_{i=1}^{n} w_{i}x_{i}}{\sum_{i=1}^{n} w_{i}}$. The article also includes a basic numerical example, as follows:

Given two school classes, one with 20 students, and one with 30 students, the grades in each class on a test were:

Morning class = 62, 67, 71, 74, 76, 77, 78, 79, 79, 80, 80, 81, 81, 82, 83, 84, 86, 89, 93, 98

Afternoon class = 81, 82, 83, 84, 85, 86, 87, 87, 88, 88, 89, 89, 89, 90, 90, 90, 90, 91, 91, 91, 92, 92, 93, 93, 94, 95, 96, 97, 98, 99

The straight average for the morning class is 80 and the straight average of the afternoon class is 90. The straight average of 80 and 90 is 85, the mean of the two class means. However, this does not account for the difference in number of students in each class (20 versus 30); hence the value of 85 does not reflect the average student grade (independent of class). The average student grade can be obtained by averaging all the grades, without regard to classes (add all the grades up and divide by the total number of students):

$$\bar{x} = \frac{4300}{50} = 86$$

Or, this can be accomplished by weighting the class means by the number of students in each class (using a weighted mean of the class means):

$$\bar{x} = \frac{(20\times80) + (30\times90)}{20 + 30} = 86$$

However, when I tried calculating this for myself in MATLAB, I obtained 86.9231 rather than 86. I assigned a weight of 0.4 to every grade in "Morning class" and a weight of 0.6 to every grade in "Afternoon class". Could someone explain why the result differs? The code showing my confusion is below:

mc = [62, 67, 71, 74, 76, 77, 78, 79, 79, 80, 80, 81, 81, 82, 83, 84, 86, 89, 93, 98];
ac = [81, 82, 83, 84, 85, 86, 87, 87, 88, 88, 89, 89, 89, 90, 90, 90, 90, 91, 91, ...
      91, 92, 92, 93, 93, 94, 95, 96, 97, 98, 99];
avg_mc = mean(mc); % = 80
avg_ac = mean(ac); % = 90
len_mc = length(mc);
len_ac = length(ac);
weight_mc = len_mc/(len_mc+len_ac);
weight_ac = len_ac/(len_mc+len_ac);
weighted_mean = weight_mc * avg_mc + weight_ac * avg_ac; % = 86
mc2 = mc .* weight_mc;
ac2 = ac .* weight_ac;
weighted_mean2 = (sum(mc2) + sum(ac2)) / (weight_mc * len_mc + weight_ac * len_ac); % = 86.9231

Best Answer

Given that this has now been taken off hold, I will re-enter my comment, which was in effect an answer, as an answer.

In the Wikipedia example, the individual data points are not supposed to receive different weights: all len_mc + len_ac = 50 grades are weighted equally. The two weights (here 0.4 and 0.6) only come into play if we treat the averages of mc and ac as one data point each and then apply the Wikipedia weighted-mean formula to combine those two averages.

What you have done with weighted_mean2 is weight each individual data point in ac more heavily (0.6) than each individual data point in mc (0.4). Your expression therefore evaluates to $(0.4 \times 1600 + 0.6 \times 2700)/(0.4 \times 20 + 0.6 \times 30) = 2260/26 \approx 86.9231$, which is why you don't get 86. Put another way, you have solved a different ("wrong") problem.
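The distinction can be checked directly by applying the general formula $\sum w_i x_i / \sum w_i$ to three choices of data points and weights. The following is a minimal sketch rather than a definitive implementation; the variable names `x`, `w_equal`, `w_unequal`, `pooled`, `skewed`, `class_means`, `class_sizes` and `by_class` are introduced here for illustration and do not appear in the question.

```matlab
% Grades from the question.
mc = [62, 67, 71, 74, 76, 77, 78, 79, 79, 80, 80, 81, 81, 82, 83, 84, 86, 89, 93, 98];
ac = [81, 82, 83, 84, 85, 86, 87, 87, 88, 88, 89, 89, 89, 90, 90, 90, 90, 91, 91, ...
      91, 92, 92, 93, 93, 94, 95, 96, 97, 98, 99];

x = [mc, ac];   % all 50 grades pooled into one vector

% (a) Every student gets the same weight: this is the plain pooled mean.
w_equal = ones(size(x));
pooled  = sum(w_equal .* x) / sum(w_equal);       % 4300 / 50

% (b) Per-student weights of 0.4 (morning) and 0.6 (afternoon).
%     This is the quantity that weighted_mean2 computes in the question.
w_unequal = [0.4 * ones(size(mc)), 0.6 * ones(size(ac))];
skewed    = sum(w_unequal .* x) / sum(w_unequal); % 2260 / 26

% (c) Two data points (the class means), weighted by class size.
class_means = [mean(mc), mean(ac)];
class_sizes = [numel(mc), numel(ac)];
by_class    = sum(class_sizes .* class_means) / sum(class_sizes);

fprintf('pooled   = %.4f\n', pooled);
fprintf('skewed   = %.4f\n', skewed);
fprintf('by_class = %.4f\n', by_class);
```

Cases (a) and (c) agree because weighting a class mean by its class size is the same as counting every student in that class once. Case (b) counts each afternoon student 1.5 times as heavily as each morning student, on top of the afternoon class already being larger, which pulls the result toward 90.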