Solved – How to visualize percentages compared along with number of entries.

data visualizationpie chart

I'm trying to figure out the best way to visualize the below chart and emphasize treatment efficacy as apposed to number of patients who tried the treatment.
Here's the link to the actual page: http://curetogether.com/cluster-headaches/treatments/

Cluster Headache Treatment Legend
Cluster Headache Treatments

What's the best way to emphasize the effectiveness while still making it easy to compare treatments and see how many patients rated each? My thought was to show the effectiveness as percentages, but I'm not sure how to still make them easily comparable and show the number of patients who tried each.

Thanks!

Best Answer

You wish to compare "effectiveness" and evaluate the numbers of patients reporting each treatment. Effectiveness is recorded in five discrete, ordered categories, but (somehow) is also summarized into an "Avg." (average) value, suggesting it is thought of as a quantitative variable.

Accordingly, we should choose a graphic whose elements are well adapted to convey this kind of information. Among the many excellent solutions suggest themselves, one uses this schema:

  • Represent total or average effectiveness as a position along a linear scale. Such positions are most readily grasped visually and accurately read quantitatively. Make the scale common to all 34 treatments.

  • Represent numbers of patients by some graphical symbol that is easily seen to be directly proportional to those numbers. Rectangles are well suited: they can be positioned to satisfy the preceding requirement and sized in the orthogonal direction so that both their heights and their areas convey the patient-number information.

  • Distinguish the five effectiveness categories by a color and/or shading value. Maintain the ordering of these categories.

One enormous error made by the graphic in the question is that the most prominent visual values--the lengths of the bars--depict the patient-number information rather than the total effectiveness information. We can fix that easily by recentering each bar about a natural middle value.

Without making any other changes (such as improving the color scheme, which is exceptionally poor for any color-blind person), here is the redesign.

Figure

I added horizontal dotted lines to help the eye connect labels with plots, and erased a thin vertical line to show the common central location.

The patterns and numbers of responses are much more evident. In particular, we essentially get two graphics for the price of one: on the left hand side we can read off a measure of adverse effects while on the right hand side we can see how strong the positive effects are. Being able to balance the risk, on the one hand, against the benefit, on the other, is important in this application.

One serendipitous effect of this redesign is that the names of treatments with many responses are vertically separated from the others, making it easy to scan down and see which treatments are the most popular.

Another interesting aspect is that this graphic calls into question the algorithm used to order the treatments by "Avg. effectiveness": why, for instance, is "Headache tracking" placed so low when, among all the most popular treatments, it was the only to have no adverse effects?

The quick-and-dirty R code that produced this plot is appended.

x <- c(0,0,3,5,5,
       0,0,0,0,2,
       0,0,3,2,4,
       0,1,7,9,7,
       0,0,3,2,3,
       0,0,0,0,1,
       0,1,1,1,2,
       0,0,2,2,1,
       0,0,1,0,1,
       0,0,3,2,1,
       0,0,2,0,1,
       1,0,5,5,2,
       1,3,15,15,4,
       1,2,5,7,3,
       0,0,4,4,0,
       0,0,2,2,0,
       0,0,3,0,1,
       0,0,2,2,0,
       0,4,18,19,2,
       0,0,2,1,0,
       3,1,27,25,3,
       1,0,2,2,1,
       0,0,4,2,0,
       0,1,6,5,0,
       0,0,3,1,0,
       3,0,3,7,2,
       0,1,0,1,0,
       0,0,21,4,2,
       0,0,6,1,0,
       1,0,2,0,1,
       2,4,15,8,1,
       1,1,3,1,0,
       0,0,1,0,0,
       0,0,1,0,0)
levels <- c("Made it much worse", "Made it slightly worse", "No effect or uncertain",
            "Moderate improvement", "Major improvement")
treatments <-  c("Oxygen", "Gluten-free diet", "Zomig", "Sumatriptan", "Rizatriptan (Maxalt)",
                 "Dilaudid suppository", "Dilaudid-Morphine", "Verapamil",
                 "Magic mushrooms", "Magnesium", "Psilocybine", "Excedrin Migraine",
                 "Ice packs on neck and head", "Passage of time", "Red Bull", "Lidocaine",
                 "Vitamin B-2 (Roboflavin)", "Caffergot", "Caffeine", "Tobasco in nose / on tongue")
treatments <- c(treatments, 
                 "Ibuprofen", "Topamax", "Excedrin Tension Headache", "Acetaminophen (Tylenol)",
                 "Extra Strength Excedrin", "Hot water bottle", "Eletriptan", 
                 "Headache tracking", "Women to Women vitamins", "Effexor", "Aspirin",
                 "Propanolol", "L-Arginine", "Fioricet")
x <- t(matrix(x, 5, dimnames=list(levels, treatments)))
#
# Precomputation for plotting.
#
n <- dim(x)[1]
m <- dim(x)[2]
d <- as.data.frame(x)
d$Total <- rowSums(d)
d$Effectiveness <- (x %*% c(-2,-1,0,1,2)) / d$Total
d$Root <- (d$Total)
#
# Set up the plot area.
#
colors <- c("#704030", "#d07030", "#d0d0d0", "#60c060", "#387038")
x.left <- 0; x.right <- 6; dx <- x.right - x.left; x.0 <- x.left-4
y.bottom <- 0; y.top <- 10; dy <- y.top - y.bottom
gap <- 0.4
par(mfrow=c(1,1))
plot(c(x.left-1, x.right), c(y.bottom, y.top), type="n", 
     bty="n", xaxt="n", yaxt="n", xlab="", ylab="", asp=(y.top-y.bottom)/(dx+1))
#
# Make the plots.
#
u <- t(apply(x, 1, function(z) c(0, cumsum(z)) / sum(z)))
y <- y.top - dy * c(0, cumsum(d$Root/sum(d$Root) + gap/n)) / (1+gap)

invisible(sapply(1:n, function(i) {
  lines(x=c(x.0+1/4, x.right), y=rep(dy*gap/(2*n)+(y[i]+y[i+1])/2, 2),
        lty=3, col="#e0e0e0")
  sapply(1:m, function(j) {
  mid <- (x.left - (u[i,3] + u[i,4])/2)*dx
  rect(mid + u[i,j]*dx, y[i+1] + (gap/n)*(y.top-y.bottom), 
       mid + u[i,j+1]*dx, y[i], 
       col=colors[j], border=NA)
})}))
abline(v = x.left, col="White")
labels <- mapply(function(s,n) paste0(s, " (", n, ")"), rownames(x), d$Total)
text(x.0, (y[-(n+1)]+y[-1])/2, labels=labels, adj=c(1, 0), cex=0.8,
     col="#505050")