Solved – How to properly plot paired data when you have more than two groups

data visualization, paired-comparisons

I have a question that seems very elementary, but I have found a lot of disagreement about it. We took several measurements on the same individuals, and it was suggested that I make a plot like this:

[figure: line plot of the paired measurements]

Nevertheless, I have found some criticism of this approach: some reviewers mention that it is not possible to "easily" see values such as the mean/median and the variation of the data. A recently published paper (http://journals.plos.org/plosbiology/article?id=10.1371%2Fjournal.pbio.1002128#pbio.1002128.s007) suggests that both the scatter plot and the line plot could be used, but I would like to hear other opinions. If somebody knows the "best" or most accepted way to plot this kind of data, or knows an alternative, I would be really grateful!

Best Answer

I agree with the reviewer that it's very hard to read.

I personally love using heatmaps (aka pseudocolor plots aka checkerboard plots) for this kind of thing:

[figure: heatmap]

And small multiples can be very nice as well:

[figure: small multiples]

Andrew Gelman has blogged about these kinds of displays before, too. There's a time and place for plots like yours. He calls them "spaghetti plots", and as Nick Cox mentioned in the comments, they actually work better when the series start in one place and fan out, or when the lines don't overlap much: more like raw dried spaghetti than cooked. I tend to like the heat map (which a commenter on that post calls a "lasagna plot") better, because it scales almost arbitrarily. Prof. Gelman is also the one who turned me on to small multiples.

Note, however, that heatmaps tend to work better when they aren't constrained to greyscale. For instance, the one I made would greatly benefit from a red/blue diverging color scheme with white at zero.
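As a rough sketch of that idea (not the code used for the figure above): one way to pin white to zero in base R is to give image() a diverging palette with an odd number of colors plus breaks that are symmetric around zero. The data here are a made-up stand-in, and the names n_cols, div_cols, lim and brks are only illustrative.

```r
# Hypothetical stand-in data: 4 measurements (rows) x 8 individuals (columns)
set.seed(1)
x <- matrix(rnorm(32, sd = 3), nrow = 4)

# Odd number of colors, so the middle color of the ramp is white
n_cols <- 21
div_cols <- colorRampPalette(c("red", "white", "blue"))(n_cols)

# Breaks symmetric around zero, so the white bin is centered on zero;
# image() needs exactly one more break than there are colors
lim <- max(abs(x))
brks <- seq(-lim, lim, length.out = n_cols + 1)

image(x, col = div_cols, breaks = brks, xaxt = "n", yaxt = "n")
axis(1, at = (0:3) / 3, labels = paste("Prey", 1:4))
axis(2, at = (0:7) / 7, labels = 1:8)
```

Without explicit breaks, image() spreads the colors over the observed range of the data, so white would land at the midpoint of that range rather than at zero.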

We make graphs to facilitate comparisons. Whenever you make a plot, you should ask yourself which comparisons it facilitates, and which comparisons it obfuscates.

The R code for these:

# simulated data: 8 individuals (columns), 4 measurements each (rows)
x <- replicate(8, arima.sim(list(ar = 0.1, ma = 2.5), 4))

## the ugly way ----
library(compactr)  # a very nice convenience package for plotting
eplot(xlim = c(1, 4), ylim = c(-9, 10),
      xat = 1:4, xticklab = paste("Prey", 1:4),
      ylab = "Time", main = "Ugly")
invisible(apply(x, 2, function(xj) {
  points(xj, pch = 16)
  lines(xj)
}))

## checkerboard ----

checkercols <- colorRamp(c("black", "white"))((1:20)/20) / 255
# checkercols <- colorRamp(c("red", "white", "blue"))((1:20)/20) / 255  # for more useful colors
checkercols <- apply(checkercols, 1, function (x) rgb(x[1], x[2], x[3]))

op <- par(no.readonly = TRUE)  # save only the settable parameters, so par(op) restores cleanly
layout(matrix(1:2), heights = c(3, 1))
par(mar = c(1, 1, 2, 1), oma = c(2.5, 2.5, 3, 1))
image(x, col = checkercols, xaxt = "n", yaxt = "n")
axis(1, at = (0:3) / 3, labels = paste("Prey", 1:4))
axis(2, at = (0:7) / 7, labels = 1:8)
mtext("Individual", 2, line = 2)
title(main = "I like these", outer = TRUE)

colorbar <- matrix(1:20, 20)
image(colorbar, col = checkercols, xaxt = "n", yaxt = "n")
axis(1, at = (0:19) / 19, labels = round(quantile(x, seq(1/20,1,1/20)), 2))
par(op)

## small multiples ----

op <- par(no.readonly = TRUE)  # call par() to save the settings (a bare `par` would just copy the function)
par(mfrow = c(2, 4),
    mar = rep(0.75, 4),
    oma = c(2.5, 2.5, 3, 1))
invisible(apply(x, 2, function (xj) {
  eplot(xlim = c(1, 4), ylim = c(-9, 10), xat = 1:4)
  points(xj, pch = 16)
  lines(xj)
}))
title(main = "These can be good, too", outer = TRUE, cex.main = 1.5)
mtext("Time", 2, line = 1, outer = TRUE)
mtext("Prey", 1, line = 1, outer = TRUE)
par(op)
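
If the comparison the reviewers care about is the center and spread at each condition, one possible compromise is to keep the individual lines but push them into the background and overlay a summary. The following is a minimal base-R sketch under that assumption, using made-up stand-in data; the choice of median and interquartile range is mine, not something taken from the question or the paper.

```r
# Hypothetical stand-in data: 4 conditions (rows) x 8 individuals (columns)
set.seed(1)
x <- matrix(rnorm(32, sd = 3), nrow = 4)

# Per-condition summaries: median and interquartile range
med <- apply(x, 1, median)
q <- apply(x, 1, quantile, probs = c(0.25, 0.75))  # 2 x 4 matrix

# Individual trajectories in light grey, one line per column of x
matplot(x, type = "b", lty = 1, pch = 16, col = "grey70",
        xaxt = "n", xlab = "", ylab = "Time")
axis(1, at = 1:4, labels = paste("Prey", 1:4))

# Overlay: thick vertical bars for the IQR, thick line and dots for the median
segments(1:4, q[1, ], 1:4, q[2, ], lwd = 3)
lines(1:4, med, lwd = 3)
points(1:4, med, pch = 16, cex = 1.5)
```

This keeps the pairing visible while making the per-condition median and spread readable at a glance, at the cost of a busier plot when there are many individuals.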
