Solved – Creating a plot with boxplots ranked by quantiles in R

Tags: boxplot, data visualization, quantiles, r

I am trying to create a plot in R which gives me boxplots (and/or distributions) of one column ranked by quantiles (and/or equally spaced groups) of another column in a dataframe.

An example would be the following plot:

[Image: example plot]

Can anybody point me to packages or sites that explain how to create such a plot?

Best Answer

Here is a possible solution using base R graphics:

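library(Hmisc)  # provides cut2()
# simulate a predictor x and a response y that depends on it
n <- 1000
x <- runif(n, 0, 100)
y <- 1.1*x + rnorm(n)
# split x into 10 quantile groups; levels.mean=TRUE labels each group by its mean
xq <- cut2(x, g=10, levels.mean=TRUE)
ym <- tapply(y, xq, mean)
# display the mean for each decile
plot(as.numeric(levels(xq)), ym, pch="x", xlab="x", ylab="y")
# add the boxplots
boxplot(y ~ xq, add=TRUE, at=as.numeric(levels(xq)), axes=FALSE)
# mark the decile boundaries
abline(v=cut2(x, g=10, onlycuts=TRUE))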

If the data are in a data.frame, just add a data= argument when calling boxplot(). You can adjust the boxwex argument to widen the boxes. If you prefer to stick with the default cut() function, you can parse the interval bounds out of the factor level labels, as in the code below (surely there's a cleaner way to do that!):

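# bin x into deciles with base R's cut()
xq <- cut(x, quantile(x, seq(0, 1, by=.1)))
# pull the lower bound of the first nine intervals out of the level labels
vx <- gsub("\\(", "", unlist(strsplit(levels(xq), ","))[seq(1, 18, by=2)])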
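One possibly cleaner route is to skip the label parsing and work with the numeric break points directly. The following is only a sketch under my own assumptions, not part of the original answer: the names dat, brk and mid, the seed, and boxwex=5 are illustrative choices. It also shows the data= and boxwex arguments mentioned above.

```r
# Simulated data in a data frame (the seed is an arbitrary choice)
set.seed(1)
n <- 1000
dat <- data.frame(x = runif(n, 0, 100))
dat$y <- 1.1 * dat$x + rnorm(n)

# Decile break points of x: 11 values, from min(x) to max(x)
brk <- quantile(dat$x, probs = seq(0, 1, by = 0.1))

# include.lowest = TRUE keeps min(x) in the first bin instead of making it NA
dat$xq <- cut(dat$x, breaks = brk, include.lowest = TRUE)

# Midpoint of each decile interval, used as the horizontal position of its box
mid <- (head(brk, -1) + tail(brk, -1)) / 2

# Empty plot with the right axes, then one box per decile drawn at its midpoint
plot(dat$x, dat$y, type = "n", xlab = "x", ylab = "y")
boxplot(y ~ xq, data = dat, add = TRUE, at = mid, boxwex = 5, axes = FALSE)
# Dotted vertical lines at the decile boundaries
abline(v = brk, lty = 3)
```

Because the breaks are kept as numbers, nothing has to be recovered from the factor labels, and the boxes sit in the middle of their intervals rather than at the group means.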


A simple ggplot solution might look like this:

library(ggplot2)
# keep the decile factor in the data frame so aes() can find it
xy <- data.frame(x=x, y=y, xq=xq)
ggplot(xy, aes(x, y, group=xq)) + geom_boxplot() + xlim(0, 100)
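Since the question also asks about equally spaced groups, here is a hedged sketch using ggplot2's own binning helpers, cut_number() (bins with roughly equal counts) and cut_interval() (bins of equal width). The data frame name sim, the column names decile and band, and the seed are my own illustrative choices.

```r
library(ggplot2)

# Simulated data (the seed is an arbitrary choice)
set.seed(1)
n <- 1000
sim <- data.frame(x = runif(n, 0, 100))
sim$y <- 1.1 * sim$x + rnorm(n)

# cut_number(): 10 bins holding about the same number of points (deciles of x)
sim$decile <- cut_number(sim$x, n = 10)
# cut_interval(): 10 bins of equal width over the range of x
sim$band <- cut_interval(sim$x, n = 10)

# One boxplot of y per bin, positioned along the continuous x axis
p_decile <- ggplot(sim, aes(x, y, group = decile)) + geom_boxplot()
p_band <- ggplot(sim, aes(x, y, group = band)) + geom_boxplot()
print(p_decile)
```

Printing p_band instead gives the equal-width version; for uniformly distributed x the two plots look much alike, but they diverge once x is skewed.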

I don't know of any package dedicated to "decile plots", but I would recommend the bpplt() and panel.bpplot() functions from the Hmisc package. For example:

library(lattice)
library(Hmisc)  # provides panel.bpplot()
bwplot(xq ~ y, panel=panel.bpplot, probs=.25, datadensity=TRUE)