I have been given a dataset and been tasked with calculating the average wage for different job titles. The data includes a mix of ranges (min-max) and values.
If we assume for the job title, Developer, these are the wages:
1. $80K - 120K
2. $90K
3. $120K - 150K
4. $95K
5. $50K
6. $200K - 225K
7. $100K
8. $100K - 150K
9. $50K - 120K
10. $85K
How should I calculate the average wage for a Developer?
My thoughts are to first exclude the wage information for entries 5, 6 and 9, as these look like outliers. I would then take the midpoint of each remaining range, add these together with the single values, and divide by the number of wages.
This would give:
(100 + 90 + 135 + 95 + 100 + 125 + 85)/7 ≈ $104K
I would then conclude that the average wage for a Developer at the company is $104K, with a range of $80K to $150K.
Is this approach correct?
EDIT
80K-120K means that there's one person whose income is somewhere in that range.
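For concreteness, the approach described above can be sketched in Python. Each range is replaced by its midpoint, single values are treated as degenerate ranges, and entries 5, 6 and 9 are left out (this is a sketch of the questioner's method, not a recommendation):

```python
# Midpoint-and-average approach with the questioner's exclusions.
# Each wage is a (low, high) pair in $K; single values are (x, x).
# Entries 5 ($50K), 6 ($200K-225K) and 9 ($50K-120K) are excluded.
ranges = [(80, 120), (90, 90), (120, 150), (95, 95),
          (100, 100), (100, 150), (85, 85)]
midpoints = [(lo + hi) / 2 for lo, hi in ranges]
average = sum(midpoints) / len(midpoints)
print(f"Average wage: ${average:.0f}K")  # Average wage: $104K
```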
Best Answer
Your decision as to what constitutes an outlier is a very subjective one and has no place in analytics. There is no reason to suspect that salaries should follow a normal, or similarly shaped, distribution. It is completely understandable to have a junior developer just out of uni on a wage several standard deviations below the average, and experienced managerial-level developers on wages many times above it. I have a feeling the real distribution would be at least bimodal.
Please don't exclude outliers without a solid justification for doing so, but apart from that your approach is completely correct.
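Following that advice, here is the same midpoint calculation with all ten entries kept, no exclusions (again a sketch, using the figures from the question):

```python
# Same midpoint-and-average calculation, but keeping every entry.
# Each wage is a (low, high) pair in $K; single values are (x, x).
ranges = [(80, 120), (90, 90), (120, 150), (95, 95), (50, 50),
          (200, 225), (100, 100), (100, 150), (50, 120), (85, 85)]
midpoints = [(lo + hi) / 2 for lo, hi in ranges]
average = sum(midpoints) / len(midpoints)
print(f"Average wage: ${average:.2f}K")  # Average wage: $107.75K
```

Keeping all values moves the average from roughly $104K to $107.75K, which shows how sensitive the estimate is to the exclusion decision.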