Solved – Plotting a frequency distribution with count and score (precision) for the frequency

data visualization, ggplot2, scatterplot

I have a dataset of a million documents. I computed the frequency distribution of the documents by the number of words in each. I also have a precision result for each document. Now I want to show the average precision per frequency. How do I plot this? What kind of diagram can show both the frequency and the precision for each count?

A sample of my dataset:

name, number of words, precision
doc1, 3, 0.3

What I need to plot is:

freq., count, mean prec.
3, 3, 0.2

Best Answer

I'm not sure which variable you prefer on the x-axis, but you could use freq. and count as the x/y values. To also show the mean precision, you can map it to the size aesthetic of geom_count, so that the dot size varies with the mean value.

This is a very basic example, but it should give you an idea where you could go:

library(ggplot2)

# Toy summary table: one row per word count
mydat <- data.frame(freq = c(3, 4, 5), count = c(3, 1, 1), mean.prc = c(.2, .2, .5))
ggplot(mydat, aes(x = count, y = freq, size = mean.prc)) + geom_count()

You could of course swap the variables, for example using count as the dot size and putting mean.prc on one of the axes.
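One possible version of that swap, as a rough sketch rather than the definitive layout: the number of words goes on the x-axis, the mean precision on the y-axis, and the number of documents is mapped to the dot area. Reading freq as "number of words" and count as "number of documents" is an assumption about the question's table, and the axis labels are made up for illustration.

```r
library(ggplot2)

# Same toy data as in the example above
mydat <- data.frame(freq = c(3, 4, 5), count = c(3, 1, 1), mean.prc = c(.2, .2, .5))

# Assumed reading: freq = words per document, count = number of documents
p <- ggplot(mydat, aes(x = freq, y = mean.prc, size = count)) +
  geom_point() +
  scale_size_area() +  # dot area proportional to count, zero maps to zero area
  labs(x = "Number of words", y = "Mean precision", size = "Documents")
print(p)
```

geom_point is used here instead of geom_count because the counts are already in the data and do not need to be computed from overlapping points.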

Edit: To adjust the dot sizes, you could tweak the size scale and use zero as its lower limit, so that the dot sizes better reflect the proportions between the values, e.g.:

ggplot(mydat, aes(x = count, y = freq, size = mean.prc)) + geom_count() +
  scale_size_continuous(limits = c(0,.5))
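If you start from the per-document table in the question, it first has to be summarised into one row per word count. A minimal base R sketch, assuming hypothetical column names (n_words, precision) and made-up example rows:

```r
# Hypothetical per-document data in the question's layout
docs <- data.frame(
  name      = c("doc1", "doc2", "doc3", "doc4", "doc5"),
  n_words   = c(3, 3, 3, 4, 5),
  precision = c(0.3, 0.1, 0.2, 0.2, 0.5)
)

# Per word count: number of documents and their mean precision
count    <- tapply(docs$precision, docs$n_words, length)
mean.prc <- tapply(docs$precision, docs$n_words, mean)

# Summary table with the same columns as the toy mydat above
mydat <- data.frame(
  freq     = as.numeric(names(count)),
  count    = as.vector(count),
  mean.prc = as.vector(mean.prc)
)
print(mydat)
```

With these made-up rows, the result has the same values as the toy mydat used in the plots, so it can be passed to the ggplot calls unchanged.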
