Statistics does not provide a good answer to this question, in my opinion. A mean can be relevant in mortality studies for example, but ages are not as easy to measure as you might think. Older people, illiterate people, and people in some third-world countries tend to round their ages to a multiple of 5 or 10, for instance.
The median is more resistant to such errors than the mean. Moreover, median ages are typically 20 – 40, but people can live to 100 and more (an increasing and noticeable proportion of the population of modern countries now lives beyond 100). People of such age have 1.5 to 4 times the influence on the mean than they do on the median compared to very young people. Thus, the median is a bit more up-to-date statistic concerning a country's age distribution and is a little more independent of mortality rates and life expectancy than the mean is.
Finally, the median gives us a slightly better picture of what the age distribution itself looks like: when you see a median of 35, for example, you know that half the population is older than 35 and you can infer some things about birth rates, ages of parents, and so on; but if the mean is 35, you can't say as much, because that 35 could be influenced by a large population bulge at age 70, for example, or perhaps a population gap in some age range due to an old war or epidemic.
Thus, for demographic, not statistical, reasons, a median appears more worthy of the role of an omnibus value for summarizing the ages of relatively large populations of people.
You can report a confidence interval for the median. In R, you can use wilcox.test
with the argument conf.int=TRUE
. There's a tiny discussion of this in John Verzani's simpleR notes: see here.
Regarding plots: I don't really like using bar plots even for representing a set of means. I'd prefer to just plot little line segments for the CI:
The plot on the right was made with errbar()
from the Hmisc package [CRAN page].
You could make the same sort of plot for the medians and the related confidence intervals, or you could use box plots (which, in the same amount of space, describe the entire distribution).
Best Answer
Can we chose the representation we want independently of the data distributions?
Yes, in principle, both methods are not restricted to any distribution. (For some distributions e.g. mean is not defined, but that is more the exception.)
How can we justify the use of either (besides one being prettier than the other)?
The representation with just mean and std is providing less information than the one with median and percentiles, presuming you provide sufficient percentiles. Think of a presentation that provides percentiles for all integer percent values. This gives you a very accurate description of the distribution, probably much more detailed than necessary. So the answer is: it depends on how detailed you want your data to be described.
Would it be better to use confidence intervals for either ?
Roughly speaking, the information provided by confidence intervals can also be gleaned from (appropriate) percentile presentations. Confidence intervals would provide a rather small amount of information, but maybe exactly what you need, so it really depends on what your requirements are.
Note that you cannot obtain confidence intervals from mean/std representations, but you also cannot obtain mean/std representations from a provided confidence interval.
In my experience, people without much statistical education can have a hard time correctly interpreting mean and std, while their interpretation of confidence interval and percentiles is more likely to be correct.