[Math] Mode of a frequency distribution with unequal class length

statistics

How can I find the mode for a grouped frequency distribution with unequal class lengths? I have to find the mode for the following problem:

$$\begin{array}{c|c}
\text{Marks} & \text{# of Students} \\ \hline
\text{0 – 20} & 32 \\ \hline
\text{20 – 50} & 45 \\ \hline
\text{50 – 70} & 15 \\ \hline
\text{70 – 100} & 8 \\ \hline
\end{array}$$

For equal class lengths, we use the formula
$$\text{Mode} = l+\frac{(f_0-f_{-1})}{2f_0-f_{-1}-f_{+1}}W_o$$
where
$l$ is the lower class boundary of the modal class,
$f_0$ is the frequency of the modal class,
$f_{-1}$ is the frequency of the class preceding the modal class,
$f_{+1}$ is the frequency of the class following the modal class, and
$W_o$ is the class width of the modal class.
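
For reference, here is a minimal R sketch of the equal-width formula. The function name `grouped_mode`, its arguments, and the frequencies in the example call are hypothetical and are not taken from the problem above. Treating a missing neighbouring class as having frequency 0 is also an assumption.

```r
# Mode of grouped data with EQUAL class widths, following the formula above.
# breaks: class boundaries; freq: class frequencies (hypothetical inputs).
grouped_mode <- function(breaks, freq) {
  k  <- which.max(freq)                            # index of the modal class
  f0 <- freq[k]                                    # f_0
  fm <- if (k > 1) freq[k - 1] else 0              # f_{-1}; assumed 0 if absent
  fp <- if (k < length(freq)) freq[k + 1] else 0   # f_{+1}; assumed 0 if absent
  l  <- breaks[k]                                  # lower boundary of modal class
  W  <- breaks[k + 1] - breaks[k]                  # width of modal class
  l + (f0 - fm) / (2 * f0 - fm - fp) * W
}

# Hypothetical equal-width classes 0-10, 10-20, 20-30, 30-40:
grouped_mode(breaks = c(0, 10, 20, 30, 40), freq = c(5, 12, 8, 3))
# 10 + (12 - 5) / (24 - 5 - 8) * 10 = 16.36364
```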

But how should I proceed for the example above, where the class widths are unequal?

Best Answer

Here is an outline of what I intend to do:

(1) 'Reconstruct' the original data by using R to spread the observations in each interval at random within the interval. Here is a density histogram (intervals of equal length) of one such reconstruction.

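 # spread the counts 32, 45, 15, 8 uniformly over their class intervals
 x = c(runif(32,0,20), runif(45,20,50), runif(15,50,70), runif(8,70,100))
 hist(x, prob=TRUE, col="wheat")  # density histogram with equal-width bins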

[Figure: density histogram of one reconstructed data set]

(2) Use a modern density estimator to 'smooth' this histogram, and determine the location of the highest point of the density estimator, which is a reasonable estimate of the mode of the reconstructed data. For this reconstruction, the mode is 22.4.

 hist(x, prob=TRUE, col="wheat")
 dxy = density(x);  dx = dxy$x; dy = dxy$y  # (x,y) components of 'smooth'
 lines(dxy, col="blue")                      # overlay the density estimate
 dx[which.max(dy)]  # x-value at which 'smooth' has its max
 ## 22.36885   # estimated mode

[Figure: histogram of the reconstruction with the density estimate overlaid in blue]

(3) Of course, each random reconstruction of the data will be somewhat different. Repeat steps (1) and (2) 2000 times and keep track of the 2000 modes produced. The median of these estimated modes was 23.6. Take this value to be a reasonable estimator of the mode of the distribution from which the original data were sampled.
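
A minimal sketch of how step (3) might be coded is given here. The seed, the loop structure, and the variable names `m`, `modes` and `d` are assumptions made for illustration, not the original simulation code. Because each reconstruction is random, the median will differ somewhat from run to run and from the value quoted above.

```r
set.seed(2024)   # arbitrary seed (an assumption), only to make the run repeatable
m <- 2000        # number of reconstructions, as in step (3)
modes <- numeric(m)

for (i in 1:m) {
  # step (1): spread the counts uniformly within each class interval
  x <- c(runif(32, 0, 20), runif(45, 20, 50), runif(15, 50, 70), runif(8, 70, 100))
  # step (2): locate the highest point of the kernel density estimate
  d <- density(x)
  modes[i] <- d$x[which.max(d$y)]
}

median(modes)                                     # point estimate of the mode
quantile(modes, c(0.025, 0.975))                  # spread of the 2000 estimates
boxplot(modes, horizontal = TRUE, col = "wheat")  # boxplot of the estimates
```

The default bandwidth of `density()` is used throughout. A different bandwidth choice would shift the individual mode estimates, so the result should be read as approximate.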

However, these estimated modes were quite variable (mainly because so much information was lost when the original data were summarized into four groups of unequal lengths). Below is a boxplot of the 2000 mode estimates. (Note: The histogram and density-estimator curve in the figure above happen to be for the last of the 2000 reconstructions of the data in my simulation.)

I doubt that this is anything like the method you were expected to use, but I believe this is a responsible approach to solving the problem. (Certainly better than the approaches I initially suggested in my Comment an hour ago. Maybe I should delete the Comment now, but that seems like cheating.)

[Figure: boxplot of the 2000 mode estimates]