Why is the average defined as the sum of the data divided by the total number of data points?

average probability statistics

Wherever I search on the internet, the definition of average amounts to little more than a formula:

sum the values and then divide the result by the number of values, you
get a number, that number is the average.

Okay, but what does it represent? A central value? Okay, but how did mathematicians know that this central value was the sum of the values divided by the number of values?

In a nutshell, I'm curious to know what led to the formula for the average as (sum of the data) / (number of data points). I believe the answer lies in knowing the significance of the concept of average.

Best Answer

There are various ways of computing an "average", all of which represent different concepts. The average you describe is called the arithmetic mean (or often, just the mean), and it represents an "equal apportionment" of the total value across all the items.

Suppose you have 10 jugs of water, each filled with a different volume of water. The mean volume is the amount of water that would be in each jug if you were to distribute the water equally among the 10 jugs, which is simply the total amount of water divided by 10.
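The jug picture can be checked numerically. If every jug ends up holding the same amount m, then 10 × m must equal the total, so m = total / 10. Below is a minimal sketch of that reasoning; the volumes are made-up illustrative numbers, not data from anywhere.

```python
# Hypothetical volumes (in litres) for the 10 jugs; the values are made up.
volumes = [1.0, 2.5, 0.5, 4.0, 3.0, 2.0, 1.5, 3.5, 0.0, 2.0]

# Pour everything together, then split it into equal shares.
total = sum(volumes)                 # 20.0
mean_volume = total / len(volumes)   # 2.0

# Refilling every jug with the mean volume uses exactly the original total.
redistributed = [mean_volume] * len(volumes)

print(total, mean_volume, sum(redistributed))  # 20.0 2.0 20.0
```

The last line is the whole point: the mean is the one value that, repeated once per item, gives back the same sum as the original data.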

You may or may not find the mean to be a useful or meaningful measure of "average" depending on what you want to describe and the underlying distribution of the data. The mean is quite useful as a summary statistic for data with a central tendency, like if the amount of water in the jugs is roughly normally distributed - in this case, the mean would usually represent a value "near" what you'd expect to find by picking a jug at random. It may be less useful as a summary statistic if the data is distributed without a central tendency, like if half the jugs are empty and half are filled with 10L of water - here, the mean value is 5L and still represents an "equal apportionment" of water across all the jugs, yet none of the jugs actually has anywhere close to 5L in it.
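The half-empty, half-full case is easy to reproduce. This is only a sketch of the example as stated (5 empty jugs, 5 jugs holding 10L); the variable names are my own.

```python
from statistics import fmean

# Half the jugs are empty, half hold 10 litres.
jugs = [0.0] * 5 + [10.0] * 5

mean_volume = fmean(jugs)  # 5.0

# How far is the jug that comes closest to the mean?
closest_gap = min(abs(v - mean_volume) for v in jugs)  # 5.0

print(mean_volume, closest_gap)  # 5.0 5.0
```

The mean is still the equal share, but every single jug is a full 5L away from it, so it says nothing about what a randomly picked jug looks like.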

Another example where the mean may not be very useful is in the case of extreme outliers, like if 9 jugs are empty and one has 1000L of water - the mean value in this case is 100L, yet most of the jugs don't hold anything at all. Here, the median may be the more useful measure of "average", but it really depends on what you're intending to convey with the statistic. If you want to convey a "typical" amount of water in this case, the mean isn't the way to go, but if you're trying to repackage the water into equally sized containers, it's exactly the right measure. Income is a familiar example: you more typically see the median salary or net worth reported rather than the mean, since the mean is sensitive to billionaire outliers and doesn't reflect what is "typical" as well as the median does in this particular case.
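The outlier case can be sketched the same way, comparing the mean with the median using Python's standard `statistics` module. The data are exactly the 9 empty jugs and one 1000L jug from the example above.

```python
from statistics import fmean, median

# Nine empty jugs and one holding 1000 litres.
jugs = [0.0] * 9 + [1000.0]

mean_volume = fmean(jugs)     # 100.0, the equal share per jug
median_volume = median(jugs)  # 0.0, the middle of the sorted values

# Count how many jugs hold less than the mean.
below_mean = sum(1 for v in jugs if v < mean_volume)  # 9

print(mean_volume, median_volume, below_mean)  # 100.0 0.0 9
```

Nine of the ten jugs sit below the mean, which is why the median (0L) describes a "typical" jug better here, while the mean (100L) is still the right number if the goal is to repackage the water evenly.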
