Central tendency

Tags: central-limit-theorem, central-tendency, definition

I have seen many questions on CV (Cross Validated) that deal with central tendency. It seems to be a nebulous topic. Definitions include the values of a distribution that are most common. For probability distributions and samples, the mean, mode and median are considered measures of central tendency, and we see the term used in elementary statistics books. We also see examples where these three measures do not seem to jibe with our concept of central tendency. For example, consider the discrete distribution that puts probability 1/2 at 0 and 1/2 at 1. The mean of this distribution is 1/2, yet all of the probability mass is concentrated away from 1/2. Is the median well defined? Is it 0, since half the probability mass is at 0, or is it 1, because exactly half of the distribution is less than 1? If we take a random sample from this distribution, the sample median could be 0, 1/2 or 1, depending on the sample and on whether the sample size is odd or even. This could also be called a bimodal distribution, since 0 and 1 are both points of largest probability mass, each with mass 1/2.
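To make the two-point example concrete, here is a minimal Python sketch using only the standard library. It computes the population mean and checks a few candidate medians against the usual definition (m is a median if P(X ≤ m) ≥ 1/2 and P(X ≥ m) ≥ 1/2). The helper `is_median`, the candidate values, the seed and the sample sizes 5 and 6 are illustrative assumptions, not part of the original question.

```python
import random
import statistics

# Two-point distribution from the question: P(X = 0) = P(X = 1) = 1/2.
support = [0, 1]
probs = [0.5, 0.5]

# Population mean: sum of value * probability. It equals 1/2, a value X never takes.
pop_mean = sum(x * p for x, p in zip(support, probs))

def is_median(m, support, probs):
    """m is a median if P(X <= m) >= 1/2 and P(X >= m) >= 1/2."""
    below = sum(p for x, p in zip(support, probs) if x <= m)
    above = sum(p for x, p in zip(support, probs) if x >= m)
    return below >= 0.5 and above >= 0.5

candidates = [0, 0.25, 0.5, 0.75, 1]
print("population mean:", pop_mean)
print("medians among candidates:", [m for m in candidates if is_median(m, support, probs)])

# Distinct sample medians seen over many samples of an odd and an even size.
rng = random.Random(1)
sample_medians = {}
for n in (5, 6):
    seen = {statistics.median(rng.choices(support, weights=probs, k=n)) for _ in range(1000)}
    sample_medians[n] = sorted(seen)
    print(f"n={n}: sample medians observed {sample_medians[n]}")
```

Under this definition every point of [0, 1] qualifies as a median, so the median is not unique here. With an odd sample size the sample median is always 0 or 1, while an even sample size can also produce 1/2, a value the distribution never takes.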

I am taking the position that there are distributions like this one that have no center or central tendency, and there are continuous analogues as well. I will try to make this question specific, because otherwise it would be too general and would just solicit opinions.

  1. Do you think all distributions have some measure that could be called the center? If not, give a favorite example. There is a well-known joke: "A man has one foot in an ice bucket and the other on burning coals. But on average, the temperature of his feet is fine."

  2. For population distributions whose sample means satisfy the conditions of the central limit theorem, the sample mean converges to the population mean and exhibits a central tendency. Assuming that is a fair assessment, what about distributions like the Cauchy, whose sample mean is unstable and whose population mean does not exist? Do we say such a distribution has no center, or could we say that the median is a measure of its central tendency?
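A quick simulation can illustrate the contrast in question 2. The sketch below assumes a standard Cauchy distribution (location 0, scale 1), drawn by inverse-CDF sampling with Python's `random` module; the sample sizes and seeds are arbitrary choices. The sample mean of n independent standard Cauchy draws is itself standard Cauchy for every n, so it does not settle down as n grows, whereas the sample median concentrates around the population median, 0.

```python
import math
import random
import statistics

def standard_cauchy(rng):
    """One draw from a standard Cauchy (location 0, scale 1) via the inverse CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))

def sample_mean_and_median(n, seed):
    """Sample mean and sample median of n standard Cauchy draws."""
    rng = random.Random(seed)
    xs = [standard_cauchy(rng) for _ in range(n)]
    return statistics.fmean(xs), statistics.median(xs)

# For each n, repeat with a few seeds: the means typically jump around,
# while the medians cluster ever more tightly near 0.
for n in (100, 10_000, 200_000):
    for seed in (1, 2, 3):
        mean, med = sample_mean_and_median(n, seed)
        print(f"n={n:>7,}  seed={seed}  mean={mean:12.3f}  median={med:7.3f}")
```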

I realize that this post could be viewed as philosophical or as a question of definition. I think it is appropriate here because so many questions on this topic have popped up in which people are mystified by examples. Examples of questions asked are: (a) What is the central tendency of response time? (b) What is a measure of central tendency for a periodic variable? (c) How do you compute the central tendency for this particular distribution? (d) How do you compare the central tendency of two groups with standardized rank scales? These are just a few; while writing this question, I was shown a list of 10 similar questions.

I think a good discussion of this question could suggest why some of these questions have no good answer.

Best Answer

You may find it more useful to think of "central tendency" as giving a sense of the distribution's location. This is in contrast to measures of spread (variance, range, etc.), which don't communicate location. From the Wikipedia entry for central tendency:

In statistics, a central tendency (or, more commonly, a measure of central tendency) is a central or typical value for a probability distribution. It may also be called a center or location of the distribution.

For many examples, the mean, median or mode will communicate the location of the distribution well. If you want to know how much it costs to buy a houseboat in Amsterdam, for example, knowing the average price, the 50th percentile price, or the most common price will all give you a sense of what a houseboat there costs. Of course, there will be plenty of variability in that distribution, and knowing the mean (or median or mode) won't actually tell you how much any individual houseboat would cost. But it does give you an idea of the location of the distribution of houseboat prices on the scale from $0 to infinity (e.g. you'll have the sense that it costs more than a cup of coffee and less than buying a restaurant in Tokyo). Even for many discrete variables, the mean can be interesting and useful (for example, you may be curious about the average number of rooms per houseboat even though it's actually nonsensical to talk about a fraction of a room).
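As a small illustration of the houseboat point, the sketch below uses made-up prices (hypothetical values in thousands of dollars, not real Amsterdam data) and made-up room counts. The three measures differ, but all of them place the distribution in the same rough region of the price scale, and the mean room count comes out as a fraction even though each boat has a whole number of rooms.

```python
import statistics

# Hypothetical houseboat prices in thousands of dollars (made up for illustration).
prices = [250, 300, 300, 320, 350, 400, 450, 600, 1200]

mean_price = statistics.fmean(prices)     # pulled upward by the single expensive boat
median_price = statistics.median(prices)  # the 50th-percentile price
mode_price = statistics.mode(prices)      # the most common price

print(f"mean={mean_price:.1f}k  median={median_price}k  mode={mode_price}k")

# Hypothetical room counts: the mean need not be an attainable value.
rooms = [2, 3, 3, 4, 2]
print(f"mean rooms per boat = {statistics.fmean(rooms):.1f}")
```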

Because the mean is so useful and so widely applicable, I think many people conflate "central tendency" with "average value", but there's no real reason to do that. As you point out, there are plenty of situations where the mean (or median) is an unusual value (multimodal or nonsymmetric distributions) or even an impossible value (in the case of discrete distributions). Happily, there are plenty of ways to communicate location / central tendency. Just pick one that makes sense for your data.
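To illustrate "pick one that makes sense", the sketch below computes several location summaries for a small hypothetical sample with two clusters and one large outlier. The `trimmed_mean` and `midrange` helpers are written here for illustration (the standard-library `statistics` module provides `median` and `multimode` but no trimmed mean), and the 10% trimming proportion is an arbitrary choice.

```python
import statistics

def trimmed_mean(xs, proportion=0.1):
    """Mean after dropping the lowest and highest `proportion` of the sorted values."""
    xs = sorted(xs)
    k = int(len(xs) * proportion)  # number of values trimmed from each end
    return statistics.fmean(xs[k:len(xs) - k])

def midrange(xs):
    """Halfway point between the smallest and largest observations."""
    return (min(xs) + max(xs)) / 2

# Hypothetical sample: one cluster near 2, another near 8, and one large outlier.
data = [1, 2, 2, 2, 3, 8, 8, 8, 9, 30]

print("mean        ", statistics.fmean(data))
print("median      ", statistics.median(data))
print("10% trimmed ", trimmed_mean(data))
print("midrange    ", midrange(data))
print("modes       ", statistics.multimode(data))
```

On this sample the mean (7.3) is pulled upward by the outlier and the midrange (15.5) is dominated by it, the median (5.5) and trimmed mean (5.25) fall in the gap between the clusters, and `multimode` reports both clusters (2 and 8), which may be the most informative summary for bimodal data.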
