Solved – How to remove the z-order bias of a coloured scatter plot

data visualization, scatterplot

I'm currently working on a plot engine for my project. This engine should be robust for a wide range of inputs. In order to analyse the data, I'm plotting a series of graphs utilising python/matplotlib. Among them is the following:

scatter plot

I think this graph is not good because the data plotted first (high pressures, red) have a lower z-order, i.e. they are overdrawn by the blue bullets for low pressures. This introduces a bias when looking at the graph. The underlying reason is that the data is bell-shaped.

First off, do you agree or disagree? I could leave it like it is because it is just one of many views on the data. It could still be useful.

However, if there is a way to make this graph better with some sort of trick, I'd be much happier. I already played with point size, transparency/alpha and edgecolor. This only made it worse. A great way to remove the z-order bias in scatter plots is to bin the data and colour-code it accordingly (e.g. hexbin). But since I already use the colour for the pressure information, I see no way to do something similar.

Another idea would be to randomise the z-order, but I'm not sure how to do that or whether the result would be better.

Any other comments for improvements are appreciated.

Best Answer

First off, I agree.

I suspect that you can create a different sort of graph; you're not using a lot of the two-dimensionality of the current display because everything is clustered about the x=y line. Try plotting the pressure along the x axis and the ratios along the y axis. If this is too messy, try taking the difference in pressure. You could also use some measure of effect size, like Cohen's d, but then viewers would have to know what that is. You can probably come up with something better than what I suggested, but my suggestion might help you think of other approaches. As you'll read below, my approach might mislead viewers because it would make pressure look like an independent variable.
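To make that suggestion concrete, here is a rough sketch of the alternative view. The data is synthetic and made up purely for illustration, and the names `ratio_a`, `ratio_b` and `ratio_difference` are my own placeholders, not anything from your project. I'm also assuming that "taking the difference" means the difference between the two plotted ratios, which is only one possible reading.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption: two ratios clustered about x = y,
# plus one pressure value per point).
n = 2000
pressure = rng.normal(loc=50.0, scale=10.0, size=n)
ratio_a = rng.uniform(0.0, 1.0, size=n)
ratio_b = ratio_a + rng.normal(scale=0.05, size=n)


def ratio_difference(a, b):
    """Element-wise difference b - a, i.e. the signed offset from the x = y line."""
    return np.asarray(b, dtype=float) - np.asarray(a, dtype=float)


diff = ratio_difference(ratio_a, ratio_b)

fig, (ax_ratios, ax_diff) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: pressure on the x axis, both ratios on the y axis.
ax_ratios.scatter(pressure, ratio_a, s=4, alpha=0.4, label="ratio_a")
ax_ratios.scatter(pressure, ratio_b, s=4, alpha=0.4, label="ratio_b")
ax_ratios.set_xlabel("pressure")
ax_ratios.set_ylabel("ratio")
ax_ratios.legend()

# Right panel: if the left panel is too messy, plot only the difference.
ax_diff.scatter(pressure, diff, s=4, alpha=0.4, color="k")
ax_diff.axhline(0.0, color="grey", linewidth=1)
ax_diff.set_xlabel("pressure")
ax_diff.set_ylabel("ratio_b - ratio_a")

# fig.savefig("pressure_vs_ratios.png")  # uncomment to write the figure to disk
```

Because colour is no longer needed for the pressure, neither panel has a z-order problem tied to the pressure values.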

It would help to know what sort of story you're telling from this graph. My interpretation is that the ratios are independent variables and the pressure is a dependent variable. The change that I suggested above makes it look like the pressure is independent and the ratios are dependent. (That might not be a problem.)

But here are a few ideas that use your current graph.

  • To randomise the z-order, shuffle the order in which the points are plotted; see "Sorting a list randomly in python".
  • It looks like the pressures might be clustered a bit. I'm not sure whether this is what you were saying was bell-shaped. But if they are clustered, you could try assigning different dot types to each of a small number of clusters.
  • For each of the axes, plot a histogram of that variable with the pressure colors stacked on top of each other. Even if you don't change the main three-variable plot, these two-variable modified histograms would help point out the bias in the display.
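Two of the ideas above, the random plotting order and the stacked histograms, could look roughly like this in matplotlib. This is a sketch on made-up data, not your engine's code: the helper `shuffled`, the four quantile-based pressure bands and the `coolwarm` colormap are all my own assumptions. It relies on `scatter` drawing the points in array order, so shuffling the arrays together shuffles which points end up on top.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data, sorted so that high pressures come first
# (mimicking the plotting order described in the question).
n = 2000
x = rng.uniform(0.0, 1.0, size=n)
y = x + rng.normal(scale=0.05, size=n)
pressure = np.sort(rng.normal(loc=50.0, scale=10.0, size=n))[::-1]


def shuffled(rng, *arrays):
    """Apply one random permutation to all arrays so the points stay paired."""
    order = rng.permutation(len(arrays[0]))
    return tuple(np.asarray(a)[order] for a in arrays)


xs, ys, ps = shuffled(rng, x, y, pressure)

fig, (ax_scatter, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Points are drawn in array order, so a shuffled order means no pressure
# range is systematically drawn on top of another.
sc = ax_scatter.scatter(xs, ys, c=ps, s=5, cmap="coolwarm")
fig.colorbar(sc, ax=ax_scatter, label="pressure")
ax_scatter.set_xlabel("x")
ax_scatter.set_ylabel("y")

# Stacked histogram of x, split into four pressure bands (quartiles here,
# an arbitrary choice). The same can be repeated for y.
edges = np.quantile(pressure, [0.0, 0.25, 0.5, 0.75, 1.0])
band = np.digitize(pressure, edges[1:-1])  # band index 0..3 per point
groups = [x[band == k] for k in range(4)]
colors = list(plt.cm.coolwarm(np.linspace(0.0, 1.0, 4)))
ax_hist.hist(groups, bins=30, stacked=True, color=colors,
             label=[f"pressure band {k}" for k in range(4)])
ax_hist.set_xlabel("x")
ax_hist.set_ylabel("count")
ax_hist.legend()

# fig.savefig("shuffled_scatter.png")  # uncomment to write the figure to disk
```

Shuffling does not remove the overplotting, it only stops it from favouring one end of the pressure range, so the histograms are still worth showing alongside the scatter.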