Solved – the best way to visualize the relationship between discrete and continuous variables

categorical data, data visualization, random variable

What is the best way to show a relationship between:

  • a continuous and a discrete variable,
  • two discrete variables?

So far I have used scatter plots to look at the relationship between continuous variables. However, with discrete variables the data points pile up at a limited set of values, so many of them are drawn on top of each other. I am concerned that the line of best fit might therefore be biased.

Best Answer

The original plot below may be misleading: because the variables are discrete, many points share exactly the same coordinates and overlap:

[Figure: scatter plot of y against x drawn with solid points]

One way to work around it is to introduce some transparency to the data symbol:

[Figure: the same scatter plot drawn with semi-transparent points]
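As a small sketch of the transparency idea, the colour can also be built with `rgb()` instead of a hard-coded hex string. The data are simulated the same way as in the reference code in this answer, and the opacity of 0.125 is only an illustrative choice; a denser data set would probably need a lower value.

```r
# Simulated discrete data, generated the same way as in the answer's reference code.
x <- trunc(runif(200) * 10)
y <- x * 2 + trunc(runif(200) * 10)

# Black at 12.5% opacity (assumed value, chosen only for illustration).
col_faint <- rgb(0, 0, 0, alpha = 0.125)

# Spots that look darker are (x, y) pairs that occur several times.
plot(x, y, col = col_faint, pch = 16)
```

With this approach the darkness of a spot gives a rough impression of how many observations sit at that location.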

Another way is to shift each symbol by a small random amount so that overlapping points spread out into a smear. This technique is called "jittering":

[Figure: the same scatter plot drawn with jittered points]
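If the default displacement is too small or too large for your data, `jitter()` takes an `amount` argument that sets the half-width of the uniform noise. The sketch below uses simulated data like the reference code in this answer; the value 0.3 is an assumption picked to keep neighbouring integer values visually separate.

```r
# Simulated discrete data, generated the same way as in the answer's reference code.
x <- trunc(runif(200) * 10)
y <- x * 2 + trunc(runif(200) * 10)

# Each value is shifted by a uniform draw in [-0.3, 0.3] (0.3 is an illustrative choice).
xj <- jitter(x, amount = 0.3)
yj <- jitter(y, amount = 0.3)

plot(xj, yj, pch = 16)

# The largest displacement never exceeds the chosen amount.
max(abs(xj - x))
```

Keeping the amount below 0.5 means a jittered point still lies closest to its true integer value.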

Both solutions will still allow you to fit a straight line to assess linearity.

R code for your reference:

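x <- trunc(runif(200) * 10)          # discrete x: integers 0 to 9
y <- x * 2 + trunc(runif(200) * 10)  # discrete y: linear in x plus integer noise
plot(x, y, pch = 16)                                   # original plot, points overlap
plot(x, y, col = "#00000020", pch = 16)                # semi-transparent black points
plot(jitter(x), jitter(y), col = "#000000", pch = 16)  # jittered points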
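To follow up on the point about fitting a straight line, here is a minimal sketch of one way to do it: fit the model on the original values and use jittering only for display, so the random displacement does not enter the estimate. The `set.seed()` call, the colours, and the variable name `fit` are my own additions for illustration.

```r
# Simulated discrete data, as in the reference code; the seed is added for reproducibility.
set.seed(42)
x <- trunc(runif(200) * 10)
y <- x * 2 + trunc(runif(200) * 10)

# Fit the line on the original, unjittered values.
fit <- lm(y ~ x)

# Jitter and transparency affect only how the points are drawn.
plot(jitter(x), jitter(y), col = "#00000060", pch = 16, xlab = "x", ylab = "y")
abline(fit, col = "red", lwd = 2)

# Estimated intercept and slope.
coef(fit)
```

Overlapping points hide observations from the eye, but `lm()` still uses every one of them, so the overlap by itself should not bias the fitted line.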