Solved – What’s a good way to use R to make a scatterplot that separates the data by treatment

data visualizationrscatterplot

I'm very new with R and stats in general, but I need to make a scatterplot that I think might be beyond its native capacities.

I have a couple of vectors of observations and I want to make a scatterplot with them, and each pair falls into one out of three categories. I would like to make a scatterplot that separates each category, either by colour or by symbol. I think this would be better than generating three different scatterplots.

I have another problem with the fact that in each of the categories, there are large clusters at one point, but the clusters are larger in one group than in the other two.

Does anyone know a good way to do this? Packages I should install and learn how to use? Anyone done something similar?

Thanks

Best Answer

large clusters: if overprinting is a problem, you could either use a lower alpha, so single points are dim, but overprining makes more intense colour. Or you switch to 2d histograms or density estimates.

require ("ggplot2")

ggplot (iris, aes (x = Sepal.Length, y = Sepal.Width, colour = Species)) + stat_density2d ()

You'd probably want to facet this...
ggplot (iris, aes (x = Sepal.Length, y = Sepal.Width, fill = Species)) + stat_binhex (bins=5, aes (alpha = ..count..)) + facet_grid (. ~ Species)

While you can procude this plot also without facets, the prining order of the Species influnces the final picture.
You can avoid this if you're willing to get your hands a bit dirty (= link to explanation & code) and calculate mixed colours for the hexagons:

Another useful thing is to use (hex)bins for high density areas, and plot single points for other parts:

ggplot (df, aes (x = date, y = t5)) + 
  stat_binhex (data = df [df$t5 <= 0.5,], bins = nrow (df) / 250) +
      geom_point (data = df [df$t5 > 0.5,], aes (col = type), shape = 3) +
  scale_fill_gradient (low = "#AAAAFF", high = "#000080") +
  scale_colour_manual ("response type", 
    values = c (normal = "black", timeout = "red")) + 
  ylab ("t / s")

enter image description here

For the sake of completeness of the plotting packages, let me also mention lattice:

require ("lattice")

xyplot(Sepal.Width ~ Sepal.Length | Species, iris, pch= 20)
xyplot(Sepal.Width ~ Sepal.Length, iris, groups = iris$Species, pch= 20)
xyplot(Sepal.Width ~ Sepal.Length | Species, iris, groups = iris$Species, pch= 20)

Best Answer

Related Solutions

Solved – Graph for relationship between two ordinal variables

Solved – Interpretation of survival risk predictions generated from R’s predictSurvProb

Related Question