Solved – Best way to visualize scatterplot with thousands of points in a grayscale-friendly way

data visualization

I have 10,000 data points like those shown in this plot:
It's comparing the running time of some piece of code with the size of the problem it's running on.
(There are 2 important steps in the code; step 1's running time is in blue and step 2's is in green.)

I'd like to keep this grayscale-friendly, because I'm hoping to publish it and it may end up being printed in grayscale.

I'm trying to figure out how to best visualize this data. Currently I'm thinking it may be best to perform kernel density estimation in log-scale and just plot a smooth surface, but I'm not sure… is there a better way to visualize it clearly?

Best Answer

A log-log plot will spread the points out quite a bit.

If your thesis is correct, the data should tend to lie close to, or parallel to, a 45-degree line through a typical point, say (median of x, median of y).
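As a rough sketch of that idea in Python (NumPy plus matplotlib): the data here are made up, and the names `make_fake_timings`, `reference_line`, and `plot_loglog` are hypothetical helpers, not anything from the question. On log-log axes a 45-degree line is a line of slope 1 in log space, i.e. y proportional to x, so the reference line through the medians is just y = (median y / median x) · x.

```python
import numpy as np


def make_fake_timings(n=10_000, seed=0):
    """Synthetic stand-in for the real measurements (an assumption, not the asker's data)."""
    rng = np.random.default_rng(seed)
    size = 10 ** rng.uniform(1, 5, n)                 # problem sizes spread over 4 decades
    time = 1e-3 * size * rng.lognormal(0.0, 0.5, n)   # roughly proportional, with noise
    return size, time


def reference_line(x, y):
    """Endpoints of the slope-1 (in log-log space) line through (median x, median y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_ref = np.array([x.min(), x.max()])
    # log y = log x + (log median_y - log median_x)  <=>  y = (median_y / median_x) * x
    y_ref = np.median(y) / np.median(x) * x_ref
    return x_ref, y_ref


def plot_loglog(x, y, path="loglog.png"):
    """Greyscale log-log scatter with the dashed reference line; saves to `path`."""
    import matplotlib
    matplotlib.use("Agg")  # render to file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.loglog(x, y, ".", color="0.4", markersize=2)
    x_ref, y_ref = reference_line(x, y)
    ax.loglog(x_ref, y_ref, "--", color="black", linewidth=1.5)
    ax.set_xlabel("problem size")
    ax.set_ylabel("running time")
    fig.savefig(path, dpi=150)
```

Calling `plot_loglog(*make_fake_timings())` writes the figure to `loglog.png`. The line only looks like exactly 45 degrees if a decade takes up the same length on both axes; otherwise it is still the slope-1 line, just drawn at a different angle.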

Having seen your log-log scale plot in the comments, greyscale would be a problem because the point clouds overlap so substantially even on the log scale. With color you can use transparency, but that's difficult in greyscale.

So for that issue, consider a pair of graphs, each with a LOESS curve (as well as the suggested reference line), and each also with the LOESS curve from the other plot as a dashed curve for ready comparison.
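A minimal sketch of the paired plot in Python, under some assumptions: `loess_1d` is a bare-bones local-linear smoother with tricube weights (no robustness iterations) standing in for a library LOESS such as `lowess` in statsmodels; `plot_pair` and its arguments are hypothetical names; the inputs are taken to be log10 of problem size and running time for each step; and `frac=0.3` is an arbitrary span you would want to tune.

```python
import numpy as np


def loess_1d(x, y, grid, frac=0.3):
    """Local linear fit with tricube weights, evaluated at each point of `grid`."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = max(2, int(np.ceil(frac * len(x))))        # number of neighbours in each window
    fitted = np.empty(len(grid))
    for i, g in enumerate(grid):
        d = np.abs(x - g)
        h = max(np.partition(d, k - 1)[k - 1], np.finfo(float).eps)  # k-th nearest distance
        w = np.clip(1.0 - (d / h) ** 3, 0.0, None) ** 3              # tricube, zero outside window
        sw = np.sqrt(w)
        # Weighted least squares for the local line a + b * (x - g); the fit at g is a.
        design = np.column_stack([np.ones_like(x), x - g]) * sw[:, None]
        coef, *_ = np.linalg.lstsq(design, y * sw, rcond=None)
        fitted[i] = coef[0]
    return fitted


def plot_pair(lx1, ly1, lx2, ly2, path="loess_pair.png"):
    """Two greyscale panels (step 1, step 2), each with its own LOESS curve (solid),
    the other step's curve (dashed), and a slope-1 reference line (dotted)."""
    import matplotlib
    matplotlib.use("Agg")  # render to file, no display needed
    import matplotlib.pyplot as plt

    lx1, ly1, lx2, ly2 = (np.asarray(a, dtype=float) for a in (lx1, ly1, lx2, ly2))
    grid = np.linspace(max(lx1.min(), lx2.min()), min(lx1.max(), lx2.max()), 50)
    fit1 = loess_1d(lx1, ly1, grid)
    fit2 = loess_1d(lx2, ly2, grid)

    fig, axes = plt.subplots(1, 2, sharex=True, sharey=True, figsize=(9, 4))
    panels = [("step 1", lx1, ly1, fit1, fit2), ("step 2", lx2, ly2, fit2, fit1)]
    for ax, (title, lx, ly, own, other) in zip(axes, panels):
        ax.plot(lx, ly, ".", color="0.7", markersize=2)
        ax.plot(grid, own, "-", color="black", linewidth=2)
        ax.plot(grid, other, "--", color="black", linewidth=1.5)
        # Slope-1 line through (median x, median y) in log space.
        ax.plot(grid, grid + (np.median(ly) - np.median(lx)), ":", color="black")
        ax.set_title(title)
        ax.set_xlabel("log10(problem size)")
    axes[0].set_ylabel("log10(running time)")
    fig.savefig(path, dpi=150)
```

Because the two clouds are never drawn on top of each other, nothing relies on color or transparency; line style alone (solid, dashed, dotted) separates the curves, which survives greyscale printing.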
