Solved – Plotting a frequency distribution with count and score (precision) for the frequency

data visualization, ggplot2, scatterplot

I have a dataset of a million documents. I computed a frequency distribution of the documents by the number of words in each one. I also have a precision result for each document. Now I want to show the average precision for each word count. How do I plot this? What kind of diagram can incorporate both the frequency and the precision for each count?

My dataset sample

name, number of words, precision
doc1, 3, 0.3
doc2, 4, 0.2
doc3, 3, 0.1
doc4, 3, 0.2
doc5, 5, 0.5

Now what I need to plot is


freq., count, mean prec.
3, 3, 0.2
4, 1, 0.2
5, 1, 0.5

Best Answer

I'm not sure which variable you want on the x-axis, but you could use freq. and count as the x/y values. To show the mean precision as well, you can map it to dot size with geom_count.

This is a very basic example, but it should give you an idea where you could go:

mydat <- data.frame(freq = c(3, 4, 5), count = c(3, 1, 1), mean.prc = c(.2, .2, .5))
library(ggplot2)
ggplot(mydat, aes(x = count, y = freq, size = mean.prc)) + geom_count()


You could of course switch your variables if you want count as dot size and mean.prc on any axis etc.
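
As a rough sketch of that switch, the summary table can be built from per-document data and plotted with the word count on the x-axis, mean precision on the y-axis, and count as dot size. The column names n_words and precision and the object names docs and summary_dat are made up for illustration, and geom_point is used here only because the size is already mapped explicitly:

```r
library(ggplot2)

# Raw sample from the question; the column names are illustrative assumptions.
docs <- data.frame(
  name      = c("doc1", "doc2", "doc3", "doc4", "doc5"),
  n_words   = c(3, 4, 3, 3, 5),
  precision = c(0.3, 0.2, 0.1, 0.2, 0.5)
)

# One row per word count: number of documents and their mean precision.
mean_prc <- tapply(docs$precision, docs$n_words, mean)
n_docs   <- tapply(docs$precision, docs$n_words, length)
summary_dat <- data.frame(
  freq     = as.numeric(names(mean_prc)),
  count    = as.vector(n_docs),
  mean.prc = as.vector(mean_prc)
)

# Word count on x, mean precision on y, dot size = number of documents.
p <- ggplot(summary_dat, aes(x = freq, y = mean.prc, size = count)) +
  geom_point()
print(p)
```

With a million documents the aggregation step is the same; only the summary rows are plotted, so the chart stays small.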

Edit: To adjust the dot sizes, you could set the limits of the size scale, starting from zero so that the dots are sized relative to zero rather than to the smallest value, e.g.:

ggplot(mydat, aes(x = count, y = freq, size = mean.prc)) + geom_count() +
  scale_size_continuous(limits = c(0,.5))
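
Another option that might be worth trying for the same purpose is scale_size_area, which maps a value of zero to a dot of zero area. This is only a sketch on the same toy data; the max_size value of 10 is an arbitrary choice, and geom_point stands in for geom_count since size is mapped directly:

```r
library(ggplot2)

mydat <- data.frame(freq = c(3, 4, 5), count = c(3, 1, 1), mean.prc = c(.2, .2, .5))

# scale_size_area() makes dot area proportional to mean.prc (0 maps to area 0);
# max_size = 10 is an arbitrary value chosen for readability.
p_area <- ggplot(mydat, aes(x = count, y = freq, size = mean.prc)) +
  geom_point() +
  scale_size_area(max_size = 10)
print(p_area)
```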
