Solved – Are heat maps “one of the least effective types of data visualization”?


Question: When (for what types of data visualization problems) are heat maps most effective? (In particular, more effective than all other possible visualization techniques?)

When are heat maps least effective?

Are there any common patterns or rules of thumb one can use to decide whether or not a heat map is likely to be an effective way of visualizing the data, and when they are likely to be ineffective?

(Principally I have in mind heat maps for 2 categorical variables and 1 continuous variable, but am also interested in hearing about opinions regarding other types of heat maps.)

Context: I am taking an online course about data visualization, and right now they are discussing ineffective and over-used plot types. They already mentioned dynamite plots and pie charts, and the reasons given for why those are ineffective and why there are better alternatives to them were clear and convincing to me. Moreover, it was easy to find other sources corroborating the given opinions about dynamite plots and pie charts.

However, the course also said that "heat maps are one of the least effective types of data visualization". A paraphrase of the reasons is given below. But when I tried to find other sources on Google corroborating this viewpoint, I had a lot of difficulty, in contrast to looking up opinions about the effectiveness of pie charts and dynamite plots. So I would like to know to what extent the characterization of heat maps given in the course is valid, and in which contexts the factors against them are least important and most important.

The reasons given were:

1) It is difficult to map color onto a continuous scale. There are some exceptions to this rule, so this is not usually a deal breaker, but in the case of heat maps, the problem is particularly difficult, because our perception of a color changes depending upon the neighboring colors. Thus heat maps are not well-suited for seeing individual results, even in small data sets. Which leads to:

2) Answering specific questions using a table look-up method is generally not feasible, since it is impossible to infer with sufficient accuracy the numerical value corresponding to a given color.

3) Often the data are not clustered in such a way to bring out trends. Without such clustering it is often difficult or impossible to infer anything about general overall patterns.

4) Heat maps are often only used to communicate a "wow factor" or just to look cool, especially when using a multicolor gradient, but there are usually better ways to communicate the data.

5) Plotting continuous data on a common scale is always the best option. If there is a time component, the most obvious choice is a line plot.

Best Answer

There is no such thing as a "best" plot for this or for that. How you plot your data depends on the message you want to convey. Commonly used plots have the advantage that users are more likely to be able to read them. Nevertheless, that does not mean that they are necessarily the best choice.

Regarding heat maps, I've ordered my response by the supposed arguments against them.

1) If you don't trust color as an encoding channel, use brightness instead, with a scale running from dark gray to light gray "color" tones. Most often, you want to bin continuous variables (also see 5), so you can keep the number of colors low and make the plot easier for users to decode. This is not a must though. Take a look at this example, in which the continuous variable is not binned.

2) Certainly, they should not be used as an alternative to look up precise values. Heat maps should primarily be used to illustrate patterns, not to replace tables.

3, 4) I don't see how these would be specific to heat maps.

5) Heat maps are ideally but not necessarily used with discrete variables. For continuous variables, heat maps can be used as a sort of two-dimensional histogram or bar chart, with proper binning and with brightness as the encoding channel.
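To make points 1 and 5 concrete, here is a minimal sketch in base R of a heat map that bins a continuous variable into a few classes and encodes them by brightness only. The data, the bin width, and the names vals, breaks, grays, and bins are all made up for illustration; this is one possible way to do it, not the method used in the example linked above.

```r
# Hypothetical data: rows and columns are two categorical variables,
# cell values are the continuous variable (illustrative only)
set.seed(1)
vals <- matrix(runif(20, min = 0, max = 100), nrow = 4,
               dimnames = list(paste0("group", 1:4), paste0("cond", 1:5)))

# Bin the continuous variable into four ordered classes
breaks <- seq(0, 100, by = 25)
bins   <- cut(vals, breaks)      # one bin label per cell
table(bins)                      # how many cells fall in each bin

# One gray level per bin, from light (low values) to dark (high values)
grays <- gray(seq(0.9, 0.2, length.out = length(breaks) - 1))

# image() maps rows of z to the x axis, so transpose to keep
# the rows of vals on the vertical axis
par(mar = c(4, 6, 2, 1))
image(x = seq_len(ncol(vals)), y = seq_len(nrow(vals)), z = t(vals),
      breaks = breaks, col = grays, axes = FALSE, xlab = "", ylab = "")
axis(1, at = seq_len(ncol(vals)), labels = colnames(vals))
axis(2, at = seq_len(nrow(vals)), labels = rownames(vals), las = 1)
box()
```

With only four gray levels, a reader has to tell apart a handful of shades rather than read a value off a continuous gradient, which is the trade-off the answer describes: patterns stay visible, exact values do not.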

Related Solutions

Solved – Problems with pie charts

I wouldn't say there's an increasing interest or debate about the use of pie charts. They are just found everywhere on the web and in so-called "predictive analytic" solutions.

I guess you know Tufte's work (he also discussed the use of multiple pie charts), but funnier is the fact that the second chapter of Wilkinson's Grammar of Graphics starts with "How to make a pie chart?". You're probably also aware that Cleveland's dotplot, or even a barchart, will convey much more precise information. The problem really seems to stem from the way our visual system deals with spatial information. It is even quoted in the R software; from the on-line help for pie:

Cleveland (1985), page 264: “Data that can be shown by pie charts always can be shown by a dot chart. This means that judgements of position along a common scale can be made instead of the less accurate angle judgements.” This statement is based on the empirical investigations of Cleveland and McGill as well as investigations by perceptual psychologists.

Cleveland, W. S. (1985) The elements of graphing data. Wadsworth: Monterey, CA, USA.

There are variations of pie charts (e.g., donut-like charts) that all raise the same problems: We are not good at evaluating angle and area. Even the ones used in "corrgram", as described in Friendly, Corrgrams: Exploratory displays for correlation matrices, American Statistician (2002) 56:316, are hard to read, IMHO.

At some point, however, I wondered whether they might still be useful. For example, (1) displaying two classes is fine, but increasing the number of categories generally makes the chart harder to read (especially with a strong imbalance between percentages); (2) relative judgments are better than absolute ones, that is, displaying two pie charts side by side should favor a better appreciation of the results than a simple estimate from, say, a pie chart mixing all results (e.g. a two-way cross-classification table). Incidentally, I asked a similar question to Hadley Wickham, who kindly pointed me to the following articles:

Spence, I. (2005). No Humble Pie: The Origins and Usage of a Statistical Chart. Journal of Educational and Behavioral Statistics, 30(4), 353–368.
Heer, J. and Bostock, M. (2010). Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design. CHI 2010, April 10–15, 2010, Atlanta, Georgia, USA.

In sum, I think they are just good for roughly depicting the distribution of 2 to 3 classes (I use them, from time to time, to show the distribution of males and females in a sample on top of a histogram of ages), but they must be accompanied by relative frequencies or counts to be really informative. A table would still do a better job since you can add margins, and go beyond 2-way classifications.

Finally, there are alternative displays that are built upon the idea of the pie chart. I can think of the square pie or waffle chart, described by Robert Kosara in Understanding Pie Charts.

Solved – Pie charts vs. dot plots

There are two different types of chart that are referred to as 'dotplots', and I think that you are getting the two confused. The type of dotplot that it looks like you are thinking about is really a variation on a histogram and does not convey the same type of information that a pie chart would.

The type of dotplot from Cleveland is essentially a bar chart with a dot placed at the end of each bar, after which the bar is removed. So even with millions of data points, they would be tabulated the same way as for creating a pie chart, then a single dot is plotted for each category. The summary step that prepares the data is the same for a pie chart and a dotplot: the difference is that in a pie chart you are trying to compare non-aligned angles or areas (and the temptation to add chartjunk or otherwise distort the perception of the values is much higher), whereas in the dotplot you are comparing points on an aligned scale.

If you want the viewer to be able to easily judge the percentage of the whole, then just make sure that the axis for the dot positions goes from 0 to the total count. You can also easily add another axis (or replace the main one) that shows the percentage rather than the counts; the percentage can then be read off that axis much more accurately than by estimating angles and areas in pie charts.

Here are a couple of examples using R:

This is the type of dotplot that I think you are thinking of, and this would not replace a pie chart:

library(TeachingDemos)
dots(round( rnorm(100),0 ) )


But this is the type of dotplot being referred to in Cleveland as a replacement for pie charts:

# steal data from ?pie
pie.sales <- c(0.12, 0.3, 0.26, 0.16, 0.04, 0.12)
names(pie.sales) <- c("Blueberry", "Cherry",
    "Apple", "Boston Cream", "Other", "Vanilla Cream")
par(mfrow=c(2,1))
dotchart(pie.sales*100)
# or
par(xaxs='i')
dotchart( pie.sales*100, xlim=c(0,100) )
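
The earlier suggestion of a count axis running from 0 to the total, with a second axis in percent, can be sketched as follows. The counts are made up for illustration, and the plot is drawn with plot() and axis() instead of dotchart(), on the assumption that this keeps both axes in the same coordinate system (dotchart() adjusts and then restores the margins, which can shift anything added afterwards).

```r
# Hypothetical category counts (illustrative only, not from the answer above)
counts <- c(Blueberry = 24, Cherry = 60, Apple = 52,
            "Boston Cream" = 32, Other = 8, "Vanilla Cream" = 24)
total <- sum(counts)

# Tick positions for the percentage axis, expressed in count units
pct    <- seq(0, 100, by = 20)
pct_at <- pct / 100 * total

par(mar = c(4, 8, 4, 1), xaxs = "i")   # wide left margin for the labels
plot(counts, seq_along(counts), pch = 19,
     xlim = c(0, total), yaxt = "n",
     xlab = "Count", ylab = "")
abline(h = seq_along(counts), lty = "dotted", col = "gray")
axis(2, at = seq_along(counts), labels = names(counts), las = 1)
axis(3, at = pct_at, labels = paste0(pct, "%"))   # percent of the whole
```

Because the bottom axis ends at the total, each dot's horizontal position is its share of the whole, and the top axis lets that share be read directly in percent.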
