Solved – How to describe a bar graph

Tags: barplot, histogram, terminology

What are some words for describing the overall shape of a bar graph for a nominal variable? The word "uniform" might apply when the bars are roughly the same height, but what can be said about other patterns? In other words, what are some analogous words we can use for bar graphs, similar to right skewed, left skewed, bell-shaped, etc. when describing histograms?

I might describe the following graph as "uniform."

[Figure: bar graph with bars of roughly equal height]

But what can I call this next graph? Here we're in the situation where one bar is much larger than the others.

[Figure: bar graph in which one bar is much taller than the others]

Are there any other distinct patterns that I'm missing? If so, what are they and what are some one to two word phrases that can be used to describe them?

Best Answer

A bar graph (bar plot, bar chart, whatever; in my experience bar chart remains the most common variant, but I can't see any grounds but taste or wanting to follow majority usage to prefer one to another) for a nominal variable might represent almost anything by bar length or height. So, for a nominal variable such as kind of pet (cat, dog, emu, etc.), the bar chart might show the average amount of money spent by owners per year.

The implication in the question, or at least the inference of those commenting, is that you are talking about a bar chart display showing a distribution, say observed frequencies, or equivalently percents or probabilities. For example, a chart might show the numbers of cats, dogs, emus, etc. kept as pets in a country, or equivalently the proportions of each kind of pet.

Speaking broadly, any bar chart showing frequencies or probabilities might also be called a histogram, regardless of scale of measurement. Be warned that some people might prefer a more restricted usage.

I don't often see this misunderstood, but more can be said. The typology of nominal, ordinal, interval and ratio scales has often been oversold, and it misses many important nuances in principle and practice, but it serves well enough here to make some finer distinctions.

  • For nominal variables there is by definition no natural order to be used for the bars, but that just leaves the data analyst free in practice to order the categories in tables or in graphs by their frequencies. Talking about the skewness of such displays is at best a stretch and at worst a solecism. It would be better to talk informally about (say) evenness or uniformity or concentration. Those properties can be measured in many ways, without making any assumptions except that the categories are distinct. Often having a miscellaneous or "others" category is prudent. Measures of evenness or concentration include entropy, sum of squared probabilities, variance of probabilities, and so forth. Particular fields have many different terms that can dominate discussion (e.g. diversity in ecology).

  • For ordinal variables there is by definition a natural order. It can be informative and helpful to talk about skewness, provided it is realised that e.g. grades on a 5 point (1, 2, 3, 4, 5) or 10 point scale are conventional rather than measurements in a strong sense. I note that despite purist objections to taking means (or skewness, etc.) of such grades, in practice such grades are often assigned knowing that they will be averaged: examples range from student grades in education through sports judging to surveys of consumer opinion. Such a display might informally be described as positively or negatively skewed in a standard way, although using formal measures of skewness might widely be seen as a stretch.

  • For interval or ratio scale variables there is no controversy that I can recall.
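As a rough illustration of the evenness and concentration measures mentioned for nominal variables above, here is a minimal Python sketch. The pet counts are made up purely for illustration, and the function names are hypothetical rather than taken from any library. It computes Shannon entropy, the sum of squared probabilities, and the variance of the probabilities for an even and a concentrated distribution.

```python
import math

def proportions(counts):
    """Convert a dict of category counts to a list of proportions."""
    total = sum(counts.values())
    return [c / total for c in counts.values()]

def shannon_entropy(probs):
    """Entropy in nats; largest when all categories are equally common."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def sum_of_squares(probs):
    """Sum of squared probabilities; equals 1/k for k equally common
    categories and approaches 1 as one category dominates."""
    return sum(p * p for p in probs)

def variance_of_probs(probs):
    """Variance of the proportions around their mean; 0 when perfectly even."""
    mean = sum(probs) / len(probs)
    return sum((p - mean) ** 2 for p in probs) / len(probs)

# Hypothetical counts, invented for illustration only.
even = proportions({"cat": 25, "dog": 25, "emu": 25, "others": 25})
concentrated = proportions({"cat": 85, "dog": 5, "emu": 5, "others": 5})

for name, probs in [("even", even), ("concentrated", concentrated)]:
    print(name,
          round(shannon_entropy(probs), 3),
          round(sum_of_squares(probs), 3),
          round(variance_of_probs(probs), 4))
```

None of these measures depends on the order of the categories, which is why they suit nominal variables: reordering the bars changes the look of the chart but not the numbers.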

Occasionally students or other learners are reprimanded for calling a histogram a bar chart. Graphically, or geometrically, the learners are right: a histogram is a chart based on bars! I suggest, for once, being more diplomatic and tactfully underlining that histogram is the more precise and standard term within statistical sciences for bar charts showing distributions, certainly for interval or ratio scale variables.

A point of detail is that in a histogram the bars should touch if they correspond to intervals that touch. So, for example, human heights or weights are examples of continuous variables which must be binned before a histogram can be drawn. (In lousy software users have to fight defaults to ensure that bars do touch.) If a variable is discrete or categorical, different conventions can excite small passions about right or wrong ways to show bars. For nominal scales, touching bars are usually a bad idea, at least for any audience that includes technical people who may want to insist that the categories aren't subdivisions of a continuum. For grades or discrete variables (e.g. number of cars, children or computers in a household) whether histogram bars touch or are separate (or are thinned to mere spikes) seems more nearly a matter of taste or convention.

As stressed by @whuber in comments, sometimes a histogram shows frequencies of values in intervals of unequal length. Then the correct representation can only be to show probability density (or occasionally frequency density), adjusting for varying interval lengths. In that case, the principle can only be that areas of bars, not their lengths or heights, represent frequencies or probabilities. That distinction doesn't arise for nominal scale variables.
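The area principle for unequal intervals can be sketched in a few lines of Python. The bin edges and counts below are invented for illustration, and `density_heights` is a hypothetical helper, not a library function. Each bar height is the bin's proportion divided by its width, so that bar areas, not heights, carry the probabilities.

```python
def density_heights(edges, counts):
    """Return probability-density bar heights for bins with given edges.

    height = (count / total) / bin width, so each bar's area equals
    the proportion of observations falling in that bin.
    """
    total = sum(counts)
    heights = []
    for left, right, count in zip(edges[:-1], edges[1:], counts):
        width = right - left
        heights.append(count / total / width)
    return heights

# Hypothetical data: two bins of width 10 and one bin of width 30.
edges = [0, 10, 20, 50]
counts = [20, 20, 60]

heights = density_heights(edges, counts)
areas = [h * (right - left)
         for h, left, right in zip(heights, edges[:-1], edges[1:])]

print(heights)     # density per unit of the variable in each bin
print(sum(areas))  # areas are proportions, so they sum to 1 (up to rounding)
```

In this made-up example the widest bin holds three times as many observations as each narrow bin but is also three times as wide, so all three bars have the same density. Plotting raw counts as heights would instead suggest a peak that is only an artefact of the binning.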
