I have a question that may be basic but I do not seem to find the solution for this and so would appreciate any advice please. If we have the following (example taken from https://www.robertoreif.com/blog/2018/1/7/why-you-should-be-careful-when-averaging-percentages):
DO NOT LIKE CHOCOLATE LIKE CHOCOLATE TOTALS
Children 10,000 90,000 100,000
Adults 160,000 240,000 400,000
Totals 170,000 330,000 500,000
and we want to calculate the percentage of people who like chocolate across the entire population, we do:
330,000/500,000=66% and NOT (90,000/100,000=90%) + (240,000/400,000=60%) / 2 which would give us 75%.
However, I am struggling to apply the same thinking to the weighted average of percentages. In my case, I have something like:
ORGANISATION TOTAL_INDIVIDUALS* REGION REGION_OF_TOTAL_INDIVIDUALS NUMBER_PREMISES_IN_EACH_REGION_WITH_DECENT_INTERNET TOTAL_PREMISES_IN_EACH REGION
A 15,000 1 10,000 50 100
A 15,000 2 3,000 40 212
A 15,000 3 2,000 90 109
B 5,000 4 4,500 66 90
B 5,000 5 500 33 123
*TOTAL_INDIVIDUALS that are part of the organisation
I cannot link the individual to their premise, I only have access to a proxy – the number of premises in the region where the individual lives that have decent internet. I would like to calculate the percentage of individuals/premises with decent internet for each organisation taking into account the number of individuals per region (that's my weight). Any advice please?
More details below please:
I cannot link each individual to their premise. What I have is the quality of the internet connection per region and I know in which region each individual lives so I am assuming the individuals from a specific region have the quality of the internet connection of the region. I would like an overall metric of quality of internet connection for each organisation. E.g. organisation B has 5000 individuals, 4500 come from region 4 with good quality of internet connection (66/90=73%) and 500 from a region 5 with lower quality of internet connection (33/123=27%), but I cannot just do 66+33/90+123 because the number of individuals coming from each region for that organisation is not 2500 & 2500, but 4500 & 500, as in, there are more people coming from region 4 than 5, so I need some sort of weighted average here.
Proposed solution:
ORGANISATION TOTAL_INDIVIDUALS REGION REGION_OF_TOTAL_INDIVIDUALS NUMBER_PEOPLE_WITH_DECENT_INTERNET PERCENT_PEOPLE_WITH_DECENT_INTERNET_PER_ORGANISATION
A 15,000 1 10,000 (50/100)*10,000=5,000 (5,000+566+1,651)/15,000=48%
A 15,000 2 3,000 (40/212)*3,000=566 (5,000+566+1,651)/15,000=48%
A 15,000 3 2,000 (90/109)*2,000=1,651 (5,000+566+1,651)/15,000=48%
B 5,000 4 4,500 (66/90)*4,500=3,300 (3,300+134)/5,000=67%
B 5,000 5 500 (33/123)*500=134 (3,300+134)/5,000=67%
Best Answer
A region does not have a quality of internet connection according to your figures. Some premises in each region have good quality, and some have poor quality.
I suppose what you are trying to say is that the ratio of premises with good connectivity to total premises is a useful predictor of the ratio of people with good connectivity to total people in the same region. For example, in Region $1$ we find that $50\%$ of premises have good connectivity, therefore we estimate that $50\%$ of people in Region $1$ have good connectivity.
This is only a way of estimating (that is, guessing) how many people have good connectivity. If $9000$ people in Region $1$ are all in one premises and the others are distributed over the other $99$ premises, clearly you either have more than $90\%$ or less than $10\%$ of people having good connectivity in Region $1,$ depending on the connectivity at the one premises where the $9000$ people are. Since you have no data about how many people are at the various individual premises, you have no way to know if such an effect is occurring or not.
But if you accept the method of estimating number of people with good connectivity, you can simply make those calculations for each region and add a column to your table containing those figures. (That is, add a column for "number of persons with good connectivity".) And then you will be able to add up numbers of persons with good connectivity just like you added up numbers of persons who like chocolate.