How to calculate weighted average of percentages

averagepercentages

I have a question that may be basic but I do not seem to find the solution for this and so would appreciate any advice please. If we have the following (example taken from https://www.robertoreif.com/blog/2018/1/7/why-you-should-be-careful-when-averaging-percentages):

                 DO NOT LIKE CHOCOLATE      LIKE CHOCOLATE          TOTALS
Children         10,000                     90,000                  100,000  
Adults           160,000                    240,000                 400,000
Totals           170,000                    330,000                 500,000

and we want to calculate the percentage of people who like chocolate across the entire population, we do:

330,000/500,000=66% and NOT (90,000/100,000=90%) + (240,000/400,000=60%) / 2 which would give us 75%.

However, I am struggling to apply the same thinking to the weighted average of percentages. In my case, I have something like:

ORGANISATION   TOTAL_INDIVIDUALS*   REGION    REGION_OF_TOTAL_INDIVIDUALS      NUMBER_PREMISES_IN_EACH_REGION_WITH_DECENT_INTERNET  TOTAL_PREMISES_IN_EACH REGION
A              15,000               1         10,000                           50                                                   100  
A              15,000               2         3,000                            40                                                   212
A              15,000               3         2,000                            90                                                   109
B              5,000                4         4,500                            66                                                   90
B              5,000                5         500                              33                                                   123

*TOTAL_INDIVIDUALS that are part of the organisation

I cannot link the individual to their premise, I only have access to a proxy – the number of premises in the region where the individual lives that have decent internet. I would like to calculate the percentage of individuals/premises with decent internet for each organisation taking into account the number of individuals per region (that's my weight). Any advice please?

More details below please:
I cannot link each individual to their premise. What I have is the quality of the internet connection per region and I know in which region each individual lives so I am assuming the individuals from a specific region have the quality of the internet connection of the region. I would like an overall metric of quality of internet connection for each organisation. E.g. organisation B has 5000 individuals, 4500 come from region 4 with good quality of internet connection (66/90=73%) and 500 from a region 5 with lower quality of internet connection (33/123=27%), but I cannot just do 66+33/90+123 because the number of individuals coming from each region for that organisation is not 2500 & 2500, but 4500 & 500, as in, there are more people coming from region 4 than 5, so I need some sort of weighted average here.

Proposed solution:

ORGANISATION   TOTAL_INDIVIDUALS   REGION    REGION_OF_TOTAL_INDIVIDUALS      NUMBER_PEOPLE_WITH_DECENT_INTERNET   PERCENT_PEOPLE_WITH_DECENT_INTERNET_PER_ORGANISATION
A              15,000              1         10,000                           (50/100)*10,000=5,000                (5,000+566+1,651)/15,000=48%
A              15,000              2         3,000                            (40/212)*3,000=566                   (5,000+566+1,651)/15,000=48%
A              15,000              3         2,000                            (90/109)*2,000=1,651                 (5,000+566+1,651)/15,000=48%
B              5,000               4         4,500                            (66/90)*4,500=3,300                  (3,300+134)/5,000=67%                                     
B              5,000               5         500                              (33/123)*500=134                     (3,300+134)/5,000=67%

Best Answer

I am assuming the individuals from a specific region have the quality of the internet connection of the region.

A region does not have a quality of internet connection according to your figures. Some premises in each region have good quality, and some have poor quality.

I suppose what you are trying to say is that the ratio of premises with good connectivity to total premises is a useful predictor of the ratio of people with good connectivity to total people in the same region. For example, in Region $1$ we find that $50\%$ of premises have good connectivity, therefore we estimate that $50\%$ of people in Region $1$ have good connectivity.

This is only a way of estimating (that is, guessing) how many people have good connectivity. If $9000$ people in Region $1$ are all in one premises and the others are distributed over the other $99$ premises, clearly you either have more than $90\%$ or less than $10\%$ of people having good connectivity in Region $1,$ depending on the connectivity at the one premises where the $9000$ people are. Since you have no data about how many people are at the various individual premises, you have no way to know if such an effect is occurring or not.

But if you accept the method of estimating number of people with good connectivity, you can simply make those calculations for each region and add a column to your table containing those figures. (That is, add a column for "number of persons with good connectivity".) And then you will be able to add up numbers of persons with good connectivity just like you added up numbers of persons who like chocolate.

Related Question