Is it okay in this case to average averages?

statistics

I have 5 years of data giving the salaries for each department in an organisation shaped roughly like this (with the number of employees and salary changing each year):

| Department | Employees | Salary |
|------------|-----------|--------|
| A          | 118       | 25,834 |
| B          | 375       | 22,356 |
| C          | 235       | 26,519 |

The question I'm trying to answer is: what was the average salary over the five years for the whole organisation? At first I thought I could do a weighted average for each year using the employee column as the weights (as this is the total number of employees in each department) and then average the averages; however, I have read that it is usually invalid to average averages.

If I were to compute the weighted average for each year and then take another weighted average, using each year's total number of employees across all departments as the weight, would that make it valid, or is there no case in which this would be valid?

Best Answer

You need to decide what “the average salary over the five years for the whole organisation” means. Here's an example to show how it can be ambiguous. In this example, there are no departments, but after the example, I will mention what to do if you only have aggregated data by department, which is the case for you.

Suppose the organization had one employee for the first four years, and that employee earned \$15,000 every year. In the fifth year, there were ten employees: the one who had been there before, still earning \$15,000, and nine new employees, each earning \$100,000.

What is the answer you want? Is it the average of all one-year salaries ever paid at the company, which would be

$$A_s={15000+15000+15000+15000+15000+9\cdot100000\over14}$$

or would it be the average of the yearly average salaries, which would be

$$A_y={15000+15000+15000+15000+{15000+9\cdot100000\over10}\over5}\mbox{ ?}$$
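As a quick check of how far apart the two candidates are, the arithmetic for this example works out as follows (the first value is rounded to the nearest dollar):

$$A_s={5\cdot15000+9\cdot100000\over14}={975000\over14}\approx 69643$$

$$A_y={4\cdot15000+91500\over5}={151500\over5}=30300$$

So the two readings of "average salary" differ by more than a factor of two here, which is why the definition has to be settled first.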

Now let's see how to get each of these if we have departments.

Let the \$15,000 person be the sole person in department A, and let the \$100,000 employees be in department B.

So your input data is four years of this:

| Department | Employees | Salary |
|------------|-----------|--------|
| A          | 1         | 15,000 |

and one year of this:

| Department | Employees | Salary  |
|------------|-----------|---------|
| A          | 1         | 15,000  |
| B          | 9         | 100,000 |

It's easy to see that $A_y$ is the plain average of the five yearly weighted averages: you weight by employee count when computing each yearly average, but you do not weight a second time when you average over the years.

On the other hand, $A_s$ is the doubly weighted average, where you weight by employee count twice: first by department headcount when you calculate each yearly average salary, and then again by each year's total headcount when you average the five results:

$$A_s={1\cdot15000+1\cdot15000+1\cdot15000+1\cdot15000+10\cdot{15000+9\cdot100000\over10}\over14}$$

In other words, there is a case where weighting by number of employees twice is appropriate.
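A minimal Python sketch of both calculations, run on the toy example above. The data layout (one list of `(department, employees, salary)` rows per year) and the helper name `yearly_summary` are assumptions made for illustration, and the `Salary` column is assumed to hold each department's average salary.

```python
# Hypothetical layout: one list per year of (department, employees, average salary).
years = [
    [("A", 1, 15_000)],
    [("A", 1, 15_000)],
    [("A", 1, 15_000)],
    [("A", 1, 15_000)],
    [("A", 1, 15_000), ("B", 9, 100_000)],
]


def yearly_summary(rows):
    """Return (headcount, employee-weighted average salary) for one year."""
    headcount = sum(n for _, n, _ in rows)
    payroll = sum(n * salary for _, n, salary in rows)
    return headcount, payroll / headcount


summaries = [yearly_summary(rows) for rows in years]

# A_y: weight within each year, then take a plain mean over the years.
a_y = sum(avg for _, avg in summaries) / len(summaries)

# A_s: weight within each year, then weight each year by its headcount.
a_s = sum(n * avg for n, avg in summaries) / sum(n for n, _ in summaries)

print(f"A_y = {a_y:,.2f}")  # A_y = 30,300.00
print(f"A_s = {a_s:,.2f}")  # A_s = 69,642.86
```

Replacing `years` with the real five years of department rows should give both candidate answers; which one to report still depends on the definition you choose.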

tl;dr: Always work out a simple but non-trivial example!
