[Math] How to find the probability range/scale for a histogram

graphing-functions, probability, statistics

I have a histogram showing the distribution of reaction times from $100$ trials. The times are measured in ms and range from $70$ ms to $420$ ms. The frequency is displayed on the left y-axis, peaking at $28$ occurrences in the $175$ to $210$ ms bin. The bins, as you could guess, are $35$ ms wide.
I have to add a probability y-axis and a probability density y-axis to the same graph, but I'm not sure how to calculate the probability to see how high in value the axis should go.
My lab describes calculating this amount by dividing the scale of the "first axis" by the total number of measurements of my histogram.

I thought it would simply be $35$ for the scale of the "first axis", which I'm assuming is the x-axis, divided by $100$, the number of trials I conducted. But to calculate the probability density, I then have to divide the scale for probability by the interval width.

So basically I have to solve the first to solve the second. The problem is I don't know what the difference is between the scale of the first axis and the interval width.

If I assume that $35/100 = 0.35$ is the max for the probability axis, then the next calculation would just be $0.35/35$. That doesn't make sense, because I'd be calling the scale of the first axis the same thing as the interval width.

Could anyone provide some clarification on how I should identify what is the first axis, and how do I find its scale? What's the difference from the interval width?

Best Answer

Assuming that the different scales are multiples of each other and that the original plot has a vertical axis in the count scale, just use the definitions of the probability and probability density scales.

Let the sample size be $n$ and the number of bins be $k$ (10 bins here, if the $35$ ms bins span $70$ to $420$ ms). Denote the value of bin $i$ (for $i=1,2,\ldots,k$) by $c_i$, $p_i$, and $d_i$ on the count, probability, and probability density scales, respectively. Because each scale is a multiple of the others, we can write $p_i=w c_i$ and $d_i=v c_i$.

We know that the counts sum to $n$, that the probabilities sum to 1, and that the bar areas (bin width times density) sum to 1:

$$\sum_{i=1}^k p_i=\sum_{i=1}^k w c_i=w \sum_{i=1}^k c_i=w n=1$$
$$\sum_{i=1}^k 35 d_i=\sum_{i=1}^k 35 v c_i=35v \sum_{i=1}^k c_i=35 v n=1$$

So $w=1/n$ and $v=1/(35n)$.
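As a quick numerical check of these two factors, here is a minimal R sketch. The values of `n`, `bin_width`, and `max_count` come from the question; the vector `x` is placeholder data generated only to compare against `hist()`, not the asker's measurements.

```r
# Values from the question: 100 trials, 35 ms bins, tallest bin holds 28 counts.
n <- 100
bin_width <- 35
max_count <- 28

w <- 1 / n                 # count -> probability
v <- 1 / (bin_width * n)   # count -> probability density (per ms)

max_prob <- max_count * w      # top of the probability scale
max_density <- max_count * v   # top of the probability density scale

# Cross-check with hist() on placeholder data (NOT the real reaction times).
set.seed(1)
x <- runif(n, min = 70, max = 420)
h <- hist(x, breaks = seq(70, 420, by = bin_width), plot = FALSE)

# hist() reports density as counts / (n * bin width), i.e. counts * v.
stopifnot(isTRUE(all.equal(h$density, h$counts * v)))
# The probabilities counts * w sum to 1.
stopifnot(isTRUE(all.equal(sum(h$counts * w), 1)))
```

The "first axis" in the lab instructions is therefore the count axis: dividing it by $n$ gives probability, and dividing that by the bin width gives probability density.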

The maximum count is 28, so "nice" tick marks can be placed at 0, 5, 10, 15, 20, 25, and 30. The maximum probability is $28 w=28/100=0.28$, so "nice" tick marks are 0.00, 0.05, 0.10, 0.15, 0.20, 0.25, and 0.30. The maximum probability density is $28 v=28/(35\cdot 100)=0.008$, so "nice" tick marks will be 0.000, 0.001, ..., 0.009.

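Now we find the counts that correspond to those sets of tick marks, since every axis is drawn in count units: divide each probability tick by $w$ (multiply by 100) and each density tick by $v$ (multiply by 3500). That gives 0, 5, 10, 15, 20, 25, 30 for the probability axis and 0, 3.5, 7, 10.5, 14, 17.5, 21, 24.5, 28, 31.5 for the probability density axis.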
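The same conversion can be sketched in R. The variable names (`prob_ticks`, `density_ticks`, `prob_at`, `density_at`) are illustrative choices, not part of the original answer.

```r
n <- 100         # number of trials, from the question
bin_width <- 35  # bin width in ms, from the question

# "Nice" tick values chosen on each scale.
prob_ticks <- seq(0, 0.30, by = 0.05)
density_ticks <- seq(0, 0.009, by = 0.001)

# Tick positions in count units: invert p = c / n and d = c / (bin_width * n).
prob_at <- prob_ticks * n
density_at <- density_ticks * bin_width * n

print(prob_at)
print(density_at)
```

These vectors could then be passed as the `at` argument of `axis()`, with labels such as `sprintf("%.2f", prob_ticks)`.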

Using commands from R one has

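# Widen the right margin and allow drawing outside the plot region
par(mai=c(1,1,1,1.2), xpd=TRUE)
# Empty plotting region in count units (only the axes are drawn here, no bars)
plot(c(0,775), c(0,32), type="n", xlab="", ylab="Count", 
  axes=FALSE, las=1)
# x-axis: reaction time
axis(1, c(0:4)*100, pos=0)
text(200, -5,"Reaction time (ms)")
# Left y-axis: counts
axis(2, las=1, pos=0)
# First right axis: probability = count/100, ticks placed at the matching counts
axis(4, c(0:6)*5, c("0.00", "0.05", "0.10", "0.15", "0.20", "0.25", "0.30"),
  las=1, pos=450)
text(550, 15, "Probability", srt=90)
# Second right axis: probability density = count/(35*100)
axis(4, c(0, 3.5, 7, 10.5, 14, 17.5, 21, 24.5, 28, 31.5), 
  c("0.000", "0.001", "0.002", "0.003", "0.004", "0.005", "0.006", "0.007", "0.008", "0.009"),
  las=1, pos=650)
text(775, 15, "Probability density", srt=90)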
