How do the mean and median play an important role in data mining?


Can anyone explain to me in simple English why the mean and median play an important role in data mining? What is the actual use of finding a mean and median?
Many people say the median is sometimes better than the mean. Why is that? I'm a novice in data mining, so it would be really helpful if the answer were in simple terms.

Best Answer

Often you want to reduce multiple measurements of something to a single value, because that's easier to handle and understand than the complete distribution, and you are OK with the information loss. So you take some "average" that should be representative of the distribution of the values. This "average" should be in the "middle" of the distribution. There are many ways to calculate the "middle", and Wikipedia lists some of them.

The median is exactly that: 50% of the values are smaller and 50% are larger. So the actual magnitudes of the measurements don't matter, only their rank (their position in sorted order), which makes the median robust against skew and extreme values in the distribution.
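To make the "only the rank matters" point concrete, here is a minimal Python sketch. The helper `median_of` and the sample values are hypothetical, made up for illustration. Replacing the largest value with an extreme one leaves the median unchanged, because the middle position in sorted order still holds the same value.

```python
def median_of(xs):
    """Return the median: the middle value of the sorted data.

    For an even number of values, average the two middle values.
    """
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


# Hypothetical measurements (illustrative only).
values = [12, 15, 17, 20, 22]
# Same data, but the largest value replaced by an extreme outlier.
with_outlier = [12, 15, 17, 20, 2200]

print(median_of(values))        # 17
print(median_of(with_outlier))  # 17, unchanged: only the ordering matters
print(median_of([1, 2, 3, 4]))  # 2.5, average of the two middle values
```

Python's standard library also provides `statistics.median`, which follows the same rule for odd and even counts.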

The arithmetic mean is the sum of all the values divided by how many values there are. Here the actual numbers matter, so the mean shifts with the distribution, including toward any extreme values.
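For contrast, here is a minimal sketch of the arithmetic mean on the same kind of made-up data (the helper `arithmetic_mean` and the values are hypothetical). A single extreme value pulls the mean far away from where most of the data lies, while the median stays put.

```python
from statistics import median


def arithmetic_mean(xs):
    """Sum of all values divided by the number of values."""
    return sum(xs) / len(xs)


# Hypothetical measurements (illustrative only).
values = [12, 15, 17, 20, 22]
# Same data, but the largest value replaced by an extreme outlier.
with_outlier = [12, 15, 17, 20, 2200]

print(arithmetic_mean(values))        # 17.2
print(arithmetic_mean(with_outlier))  # 452.8, dragged up by one value
print(median(with_outlier))           # 17, still in the "middle" of most values
```

Whether the shift is a problem depends on the application: if the extreme value is a genuine and important measurement, the mean reflects it; if it is noise or an error, the median is the safer summary.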

Which one is "better" depends on your application. Most people prefer a "robust" estimator such as the median because they don't know the underlying distribution and want to be on the safe side.