Plot a graph with more than one value on the x- and y-axis in R

data-visualization, r

I have a dataset with two columns and a total of 90 rows. The data come from my experiment: the first column holds an integer representing the quantity, and the second column holds a percentage. A small example:

Quantity   Percentage
1          53%
1          51%
1          67%
2          73%
2          69%
3          73%
...        ...

As you can see, the values in both columns can occur more than once. I would like to plot this as a graph in R (I was thinking of a scatter plot). I am a real beginner with R and statistics, so I was hoping someone could help me make a good graph. If anyone has another suggestion that would give a better representation, shoot!

I just need a visual representation that shows the correlation between the two variables.
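A minimal first step in base R might look like the sketch below. It uses only the six example rows shown above and assumes the percentages arrive as text such as "53%"; the derived column name `Pct` is hypothetical. With the full 90 rows, points sharing the same (Quantity, Percentage) pair would be drawn on top of each other, which is the problem the answer addresses.

```r
# The six example rows from the question, with percentages stored as text.
dat <- data.frame(
  Quantity   = c(1, 1, 1, 2, 2, 3),
  Percentage = c("53%", "51%", "67%", "73%", "69%", "73%"),
  stringsAsFactors = FALSE
)

# Strip the "%" sign and convert to numbers so R can plot them.
# (Pct is a hypothetical column name chosen for this sketch.)
dat$Pct <- as.numeric(sub("%", "", dat$Percentage, fixed = TRUE))

# A plain scatter plot: identical (x, y) pairs are drawn on top of each other.
plot(dat$Quantity, dat$Pct,
     xlab = "Quantity", ylab = "Percentage (%)", pch = 16)
```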

Best Answer

If the percentages are ratios of counts, I agree with whuber's concern about the proportions, so it would be good if you could confirm whether that's the case.

As a matter of data visualization, you're dealing with coincident points (several points at the same location), and you need a way to show how many points lie at each location.

Here's an example with 30 points, where you only see 23 because the remaining 7 lie on top of earlier points:

[Figure: coincident x and y]
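To check how many points a plot like this is hiding, you can count the distinct (x, y) locations. The sketch below uses simulated data on a 5-by-5 integer grid (an assumption for illustration; it is not the answer's data) and base R's `aggregate()` to get the multiplicity at each location.

```r
set.seed(1)  # hypothetical simulated data, not the data used in the answer
xx <- sample(1:5, 30, replace = TRUE)
yy <- sample(1:5, 30, replace = TRUE)

# Each distinct (x, y) location is drawn only once, so this is what you actually see.
n_visible <- nrow(unique(data.frame(xx, yy)))
n_hidden  <- length(xx) - n_visible

# Multiplicity of each distinct location: columns x, y and n.
counts <- aggregate(list(n = rep(1, length(xx))),
                    by = list(x = xx, y = yy), FUN = sum)

cat(sprintf("%d points, %d visible, %d hidden\n",
            length(xx), n_visible, n_hidden))
```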

There are numerous techniques for plotting such overlaid points.

  1. Jittering.

    A small amount of random noise can be added to the x and y values so that coincident points become slightly offset from each other:

    [Figure: coincident x and y with jittering]

    We can suddenly see that there are quite a few points at $(3,3)$ that were not obvious before; this changes the impression of where the centers of the two variables lie.

    A similar approach for ordered categorical variables can be seen here.

  2. Plotting with transparency

    If points are plotted with a transparency (alpha) level, a single point looks "faint" while multiple points in one position look more solid, so a greater density of points shows up as a greater density of color.

    [Figure: overplotting with transparency]

    (here generated with plot(xx,yy,col=rgb(0,100,0,70,maxColorValue=255), pch=16))

    [Added in later edit: I somehow seem to have changed my example data after this point. I am not sure how it occurred, but it doesn't especially matter, except that the later plots aren't quite identical to the earlier ones. I am not going to regenerate them all, as it doesn't alter the ideas.]

  3. Symbols to indicate multiplicity

    You can plot symbols that directly indicate the multiplicity at each location in some way, and use the size and weight of the symbols to give a rough second impression of the relative density. Here are some that might be used:

    [Figure: list of symbols for each count]

    So for our data:

    [Figure: symbols indicating overplot count]

    A very simple version of this approach is to plot the multiplicity itself as text ("1", "2", "3", etc.). It's very easy to do, but it doesn't convey the visual impression of density well, so I decided not to include the example; I can put it up if anyone cares.

    More sophisticated versions of this approach can be implemented, such as sunflower plots (see, for example, ?sunflowerplot in R):

    [Figure: sunflower plot]

    The advantage of the sunflower plot is that it's a bit more automatic to do, and it can handle high multiplicity without fiddling about with symbols.

  4. Stacking (suggested by Nick Cox in the comments)

    [Figure: stacking example]

    While it might run into problems if there were a large range of values on the x-axis (so that the space between them might be too small to accommodate a high multiplicity of points), I think this works fairly well for my example data. It should be possible to squeeze the points together a bit more or draw them smaller, and so fit in a slightly higher multiplicity. Where the multiplicities are mostly 1, 2 or 3, I think this is a highly competitive approach; it came out better than I expected.

  5. Using area to convey point multiplicity

    [Figure: area of point shows multiplicity]

    Here again, the amount of ink indicates the number of points: making the symbol size (radius) $\propto\sqrt{n}$ makes the symbol area $\propto n$.
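Jittering (technique 1 above) can be sketched in base R with `jitter()`. This is a minimal sketch on simulated integer data (a hypothetical stand-in for the answer's data); the `amount = 0.15` value is an assumption, chosen so each point stays well within half a grid step of its true value.

```r
set.seed(1)  # hypothetical simulated data on a small integer grid
xx <- sample(1:5, 30, replace = TRUE)
yy <- sample(1:5, 30, replace = TRUE)

# With amount > 0, jitter() adds uniform noise on [-amount, amount],
# so every point stays within 0.15 of its true grid value.
xj <- jitter(xx, amount = 0.15)
yj <- jitter(yy, amount = 0.15)

plot(xj, yj, pch = 16, xlab = "x", ylab = "y")
```

Keeping the jitter small relative to the grid spacing matters: too much noise and points from neighbouring values start to blend together.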
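The simple count-labelling version of technique 3 (which the answer describes but does not show) and the sunflower plot can both be sketched in base R. This uses simulated data as a hypothetical stand-in; `text()` prints the multiplicity at each distinct location, and `sunflowerplot()` draws a location with k > 1 points as a small dot with k petals.

```r
set.seed(1)  # hypothetical simulated data on a small integer grid
xx <- sample(1:5, 30, replace = TRUE)
yy <- sample(1:5, 30, replace = TRUE)

# Multiplicity of each distinct (x, y) location.
counts <- aggregate(list(n = rep(1, length(xx))),
                    by = list(x = xx, y = yy), FUN = sum)

op <- par(mfrow = c(1, 2))
# Left: write the count at each location instead of drawing a symbol.
plot(counts$x, counts$y, type = "n", xlab = "x", ylab = "y",
     xlim = c(0.5, 5.5), ylim = c(0.5, 5.5))
text(counts$x, counts$y, labels = counts$n)
# Right: sunflower plot; single points get the usual symbol, repeats get petals.
sunflowerplot(xx, yy, xlab = "x", ylab = "y")
par(op)
```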
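The answer doesn't show how the stacking plot (technique 4) was produced, so the following is only one hypothetical way to do it in base R: points at each location are spread side by side along x, centred on the true x value. The data are simulated, and the `step` spacing of 0.08 is an assumed value you would tune to the available space between x values.

```r
set.seed(1)  # hypothetical simulated data on a small integer grid
xx <- sample(1:5, 30, replace = TRUE)
yy <- sample(1:5, 30, replace = TRUE)

key  <- paste(xx, yy)                               # identifies each distinct location
n_at <- ave(rep(1, length(xx)), key, FUN = sum)     # multiplicity of the point's location
idx  <- ave(rep(1, length(xx)), key, FUN = cumsum)  # 1, 2, ..., n_at within a location

step <- 0.08                                 # horizontal spacing (assumed value)
xs   <- xx + (idx - (n_at + 1) / 2) * step   # centre each stack on its true x

plot(xs, yy, pch = 16, xlab = "x", ylab = "y")
```

Centring each stack keeps the true x value at the middle of the group; with many distinct x values, `step` must shrink, which is the limitation the answer mentions.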
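Using area to convey multiplicity (technique 5) can be sketched with base R's `symbols()`, which takes circle radii: passing `sqrt(n)` as the radius makes each circle's area proportional to its count n. The data are simulated (a hypothetical stand-in), and `inches = 0.15` is an assumed scale that sets the radius of the largest circle.

```r
set.seed(1)  # hypothetical simulated data on a small integer grid
xx <- sample(1:5, 30, replace = TRUE)
yy <- sample(1:5, 30, replace = TRUE)

# Multiplicity of each distinct (x, y) location.
counts <- aggregate(list(n = rep(1, length(xx))),
                    by = list(x = xx, y = yy), FUN = sum)

# Radius proportional to sqrt(n), hence circle area proportional to n.
symbols(counts$x, counts$y, circles = sqrt(counts$n), inches = 0.15,
        bg = "darkgreen", fg = "white", xlab = "x", ylab = "y")
```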