Solved – What’s the correct way to visualize discrete variables

data visualization

For example, when visualizing the GDP or GDP per capita of different countries, I often see a line plot (or a radar chart) in which the x-axis (or the angular direction) shows the countries and the y-axis (or the radial direction) shows the GDP or GDP per capita. I don't think this is ideal, because the line segments joining two countries suggest intermediate values that have no meaning. What are better ways to visualize data like this?

Update: example data, source

Best Answer

There's no single 'correct way', but there are several good ones.

The obvious one to my mind would be a Cleveland dot-chart; it's designed for displaying a numeric value for each level of a factor (a categorical variable, here the country).

[Figure: Cleveland dot chart]
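As a rough illustration of the kind of chart described above, here is a minimal sketch in Python with matplotlib. The library choice, the `dot_chart` helper, and the country names and values are all assumptions made for the example; the values are placeholders, not real GDP figures.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

# Hypothetical placeholder values (arbitrary units), not real GDP figures.
gdp_per_capita = {
    "Country A": 48.0,
    "Country B": 12.5,
    "Country C": 3.1,
    "Country D": 27.4,
    "Country E": 0.9,
}


def dot_chart(values, xlabel):
    """Draw a Cleveland-style dot chart: one row per category, one dot per value."""
    # Sort ascending so that the largest value ends up on the top row.
    items = sorted(values.items(), key=lambda kv: kv[1])
    labels = [name for name, _ in items]
    xs = [value for _, value in items]
    rows = list(range(len(items)))

    fig, ax = plt.subplots(figsize=(6, 3))
    ax.plot(xs, rows, "o", color="black")  # markers only, no connecting line
    ax.set_yticks(rows)
    ax.set_yticklabels(labels)
    ax.grid(axis="y", linestyle=":", linewidth=0.8)  # light guide line per row
    ax.set_xlabel(xlabel)
    fig.tight_layout()
    return fig, ax


fig, ax = dot_chart(gdp_per_capita, "GDP per capita (hypothetical units)")
# fig.savefig("dot_chart.png")  # uncomment to write the image to disk
```

Sorting the rows by value is a design choice rather than a requirement, but it usually makes rank comparisons easier than alphabetical order does.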

Some people would use a bar chart for this purpose instead. If you have a useful classification (such as by region), you could split the plot into groups by that classification.
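One possible way to prepare such a split, sketched in plain Python: group the rows by region and order each group by value, so that each region can be drawn as its own block of rows. The `panel_layout` helper, the region names, and the values are hypothetical and only serve the illustration.

```python
from itertools import groupby

# Hypothetical rows: (country, region, value). Placeholder values only.
rows = [
    ("Country A", "Region X", 48.0),
    ("Country B", "Region Y", 12.5),
    ("Country C", "Region X", 3.1),
    ("Country D", "Region Y", 27.4),
    ("Country E", "Region X", 0.9),
]


def panel_layout(rows):
    """Group rows by region and sort each group by descending value."""
    # groupby only merges adjacent items, so sort by region first.
    by_region = sorted(rows, key=lambda r: (r[1], -r[2]))
    return {
        region: [(country, value) for country, _, value in group]
        for region, group in groupby(by_region, key=lambda r: r[1])
    }


layout = panel_layout(rows)
for region, members in layout.items():
    print(region)
    for country, value in members:
        print(f"  {country:<10} {value:>6.1f}")
```

Keeping all groups on the same value axis preserves the comparison of position along a common scale across regions.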

With GDPs (whether raw or per capita), the variable covers several orders of magnitude, so it might make a great deal more sense to plot it on a log scale (this also removes any concern some people might have about 0 not being on the scale above).
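To see why the log scale helps, the sketch below computes where each point would sit along an axis running from the smallest to the largest value, first linearly and then on a log10 scale. The `axis_positions` helper and the values are assumptions made for illustration, not real data.

```python
import math

# Hypothetical values spanning several orders of magnitude (arbitrary units).
values = {
    "Country A": 20000.0,
    "Country B": 1500.0,
    "Country C": 90.0,
    "Country D": 12.0,
}


def axis_positions(values, log=False):
    """Map each value to a 0..1 position on an axis spanning min to max."""
    transform = math.log10 if log else (lambda v: v)
    lo = transform(min(values.values()))
    hi = transform(max(values.values()))
    return {name: (transform(v) - lo) / (hi - lo) for name, v in values.items()}


linear = axis_positions(values)
logged = axis_positions(values, log=True)
for name in values:
    print(f"{name}: linear {linear[name]:.3f}, log {logged[name]:.3f}")
```

On the linear axis the three smaller values are squeezed into less than a tenth of the axis, while on the log axis they are spread out and can be told apart. In matplotlib the corresponding step would be `ax.set_xscale("log")` on the value axis.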

Such a plot has two main uses: 1. explicit comparison between countries (is A larger than B?), and 2. extracting a data value (what is A's GDP?).

The Cleveland dot chart (or Cleveland dot plot) is based on research[1] into the kinds of comparisons that people are good at or less good at. We're very good at comparing positions along a common scale, slightly less good at comparing relative lengths, and quite bad at comparing relative areas or angles. For use 1 above, the comparison is between the values represented by the points (which point is further to the right?). For use 2, the comparison is between the point and the parallel axis. Both are comparisons of position, which we're good at. The plot eliminates almost all ink that doesn't directly aid these comparisons.

Quick, which is bigger, lemon or lime?
[Figure: chart with lemon and lime among the categories to compare]

Very thin bars would make for a plot very similar to a Cleveland dot chart and can sometimes do well (particularly when both plots include 0), but dot charts have an advantage when you want to plot several numbers for each country, since each can be represented by a different symbol. This advantage is even larger if you're only able to use black and white. You also can't really use a log scale on a bar chart (where does the bottom of the bar start, and what does the bar length represent?), so bar charts are less suitable for data that spans several orders of magnitude.
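The problem with bars on a log scale can be made concrete with a little arithmetic. In this sketch (the `log_bar_length` helper and the two values are hypothetical), the drawn length of a bar depends on where the bar is taken to start, so the ratio of two bar lengths changes with an arbitrary choice of baseline.

```python
import math


def log_bar_length(value, baseline):
    """Length of a bar on a log10 axis from `baseline` to `value`, in decades."""
    return math.log10(value) - math.log10(baseline)


small, large = 10.0, 1000.0  # hypothetical values; the true ratio is 100

for baseline in (1.0, 0.1, 0.001):
    ratio = log_bar_length(large, baseline) / log_bar_length(small, baseline)
    print(f"baseline {baseline}: bar-length ratio {ratio:.2f}")

# Distance between the two points on the same axis; it involves no baseline.
gap = math.log10(large) - math.log10(small)
print(f"gap between the two points: {gap:.2f} decades")
```

With baselines of 1, 0.1 and 0.001 the longer bar is 3, 2 and 1.5 times the length of the shorter one, and none of those ratios reflects the true factor of 100. The two points of a dot chart, by contrast, are always 2 decades apart, whatever range the axis covers.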

[1]: Cleveland, W. S. and McGill, R. (1984), "Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods," Journal of the American Statistical Association, 79:387 (Sep.), 531-554.