Solved – A bar graph and its look – should I add titles, where should values go etc

data visualization, ggplot2, r

I haven't used R before, but I recently decided to use it for plotting because of its strong graphics capabilities. I'd like to plot a graph that shows voter turnout in elections. I know little about what makes a graph look right, so I found one on the Internet that looks fine to me. Here's that graph:

enter image description here

Here's my graph. You can see that it is far from good (the screenshot is from R-Fiddle, and I think the graph may look different when rendered in desktop R).

enter image description here

What should I do to make it better (e.g. more readable)? Specifically, do I need titles for the x and y axes? Does it look better if the values are on top of the bars (as in my graph) or within the bars (as in the graph found on the Internet)?

My code is:

# Load packages
library(ggplot2)
library(scales)
# Create dataset
dat <- data.frame(years = c("1991", "1993", "1997", "2001", "2005", "2007", "2011", "2015"),
                  freq = c(43.20, 52.13, 47.93, 46.29, 40.57, 53.88, 48.92, 50.92))
# Create bar graph; each `+` ends a line so that R keeps reading the same expression
ggplot(dat, aes(years, freq)) +
  geom_bar(stat = "identity", width = 0.55) +
  geom_text(aes(label = comma(freq), y = freq + 1.1)) +
  scale_y_continuous(breaks = seq(0, 50, 10)) +
  theme_classic()
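
For comparison, here is a minimal sketch of the other option asked about: values placed within the bars, plus explicit axis titles. It is only one possible layout, not the code behind the graph found on the Internet. The axis title wording, the white label colour, and the name p_inside are assumptions made for illustration.

```r
library(ggplot2)

dat <- data.frame(
  years = c("1991", "1993", "1997", "2001", "2005", "2007", "2011", "2015"),
  freq  = c(43.20, 52.13, 47.93, 46.29, 40.57, 53.88, 48.92, 50.92)
)

p_inside <- ggplot(dat, aes(years, freq)) +
  geom_col(width = 0.55) +                      # same as geom_bar(stat = "identity")
  # vjust > 1 pushes the text below the bar top, i.e. inside the bar
  geom_text(aes(label = sprintf("%.2f", freq)), vjust = 1.5, colour = "white") +
  # Axis titles are illustrative wording, not taken from the original post
  labs(x = "Election year", y = "Voter turnout (%)") +
  theme_classic()
p_inside
```

Changing vjust to a negative value such as -0.5 moves the same labels back above the bars, so both placements can be compared side by side.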

EDIT:

I tried to incorporate as many suggestions in comments and answers as I could. I've come up with the following:

enter image description here

What do you think of it?

Best Answer

I agree with EdM's point that "bar plots simply have too much ink for the information conveyed." Here's a ggplot2 version of his answer:

library(ggplot2)

# Years are numeric here so that they can sit on a continuous x axis
df <- data.frame(years=c(1991, 1993, 1997, 2001, 2005, 2007, 2011, 2015),
                 freq=c(43.20, 52.13, 47.93, 46.29, 40.57, 53.88, 48.92, 50.92))

# Grey line with black points on a stripped-down theme
# (ggplot2 3.4.0 and later prefer linewidth= over size= for lines)
p <- (ggplot(df, aes(x=years, y=freq)) +
      geom_line(size=1.25, color="#999999") + geom_point(size=3.5, color="black") +
      theme_bw() +
      theme(panel.border=element_blank(), panel.grid.minor=element_blank(),
            axis.title.y=element_text(vjust=1.25)) +
      scale_x_continuous("", breaks=seq(1990, 2015, 5), minor_breaks=NULL) +
      scale_y_continuous("percentage turnout", limits=c(36, 59),
                         breaks=seq(40, 55, 5), minor_breaks=NULL))
p
# Save the plot as a PNG (width and height are in inches by default)
ggsave("percentage_turnout_over_time.png", p, width=10, height=8)

Which produces this:

ggplot2 graph

Edit: here's a version with numbers on the graph:

p <- (ggplot(df, aes(x=years, y=freq, label=freq)) +
      geom_line(size=1.25, color="#999999") + geom_point(size=3.5, color="black") +
      # One vjust per label: fixed for the first two points, then set by the
      # sign of the second difference of freq
      geom_text(vjust=c(2, -1, -1.5*sign(diff(diff(df$freq))) + 0.5)) +
      theme_bw() +
      theme(panel.border=element_blank(), panel.grid.minor=element_blank(),
            axis.title.y=element_text(vjust=1.25)) +
      scale_x_continuous("", breaks=seq(1990, 2015, 5), minor_breaks=NULL) +
      scale_y_continuous("percentage turnout", limits=c(36, 59),
                         breaks=seq(40, 55, 5), minor_breaks=NULL))
p
ggsave("percentage_turnout_over_time_with_text.png", p, width=10, height=8)

ggplot2 graph second version
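
The vjust expression passed to geom_text above is terse, so here is a small base R sketch that evaluates it step by step for this data. The variable names second_diff, derived, and vjust are only for illustration. In ggplot2, vjust = 0 aligns the bottom of the text with the point and vjust = 1 aligns the top; values outside that range push the text further, so 2 puts a label below its point and -1 puts it above.

```r
freq <- c(43.20, 52.13, 47.93, 46.29, 40.57, 53.88, 48.92, 50.92)

# Second differences: 6 values for 8 points
second_diff <- diff(diff(freq))

# sign() is -1 or +1 for this data, so -1.5 * sign + 0.5 is either 2 or -1
derived <- -1.5 * sign(second_diff) + 0.5

# Prepend the two hand-picked offsets to get one vjust per label
vjust <- c(2, -1, derived)
print(vjust)
# the offsets alternate: 2, -1, 2, -1, 2, -1, 2, -1
```

For this series the labels therefore alternate below and above the points. That works here because the turnout values zigzag from one election to the next; a different series would probably need the offsets rechecked by eye.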

Nick Cox's comment under the original post is convincing:

I see no harm in showing numbers too. People often want to read numbers off graphs just as they (should) want to read numbers off tables. Also, offering graph PLUS table in a paper would often be rejected by reviewers as too much space devoted to the same information, so hybridising graph and table is perfectly defensible.
