Solved – Given arithmetic mean and standard deviation for a frequency distribution, the actual class intervals are required

mean, self-study, standard deviation

The text I am reading has a question that goes like this:

Explain clearly the ideas implied in using arbitrary working origin and scale for the calculation of arithmetic mean and standard deviation of a frequency distribution. The values of the arithmetic mean and standard deviation of the following frequency distribution of a continuous variable derived from the analysis in the above manner are $40.604\,\text{lb}$ and $7.92\,\text{lb}$. Determine the actual class intervals.

$$\begin{array}{crrrrrrrr}
x: & -3 & -2 & -1 & 0 & 1 & 2 & 3 & 4 \\
f: & 3 & 15 & 45 & 57 & 50 & 36 & 25 & 9
\end{array}$$

Please correct me if I am wrong: I think a change of scale and origin is required when meaningless values have to be removed from the distribution. In the distribution above, the unit is $\text{lb}$, a unit of weight, so the values $-3,-2,-1,0$ are problematic. The transformation $y_i = 5-x_i$ would make all the values greater than or equal to one, but what about the class intervals?

Can anyone help?

Best Answer

An outline of the solution may be given as follows:

Let $y_{i}$ denote the mid-values of the original grouped frequency table, $A$ be the arbitrarily chosen origin, and $C$ denote the class width. Applying the transformation $x_{i}=\dfrac{y_{i}-A}{C}$ to such a table gives $$\begin{array}{c|cccccccc} x_{i}&-3&-2&-1&0&1&2&3&4\\ \hline f_{i}&3&15&45&57&50&36&25&9 \end{array}$$

The mean of the above table can be computed using the formula

\begin{eqnarray*} \bar{X}&=&\dfrac{\sum f_{i}x_{i}}{\sum f_{i}}=\dfrac{1}{\sum f_{i}}\sum f_{i}\left(\dfrac{y_{i}-A}{C}\right)=\dfrac{1}{C}\left( \dfrac{\sum f_{i}y_{i}}{\sum f_{i}}-A \right)\\ \bar{Y}&=& C\bar{X} + A \qquad\cdots\qquad (i) \end{eqnarray*} Equation $(i)$ involves two unknowns, $C$ and $A$.

We know that the variance is independent of a change of origin but not of a change of scale.

Let \begin{equation*} y_{i}- A = C x_{i} = d_{i} \end{equation*}
The variance of the shifted data $d_{i}=y_{i}-A$, which equals the variance of $y_{i}$ because a change of origin leaves the variance unchanged, is given by \begin{eqnarray*} \dfrac{\sum f_{i}d_{i}^{2}}{\sum f_{i}}-\left(\dfrac{\sum f_{i}d_{i}}{\sum f_{i}} \right)^2 &=& \dfrac{C^2\sum f_{i}x_{i}^{2}}{\sum f_{i}}-\left(\dfrac{C\sum f_{i}x_{i}}{\sum f_{i}} \right)^2\\ SD_{y} &=& C \sqrt{ \dfrac{\sum f_{i}x_{i}^{2}}{\sum f_{i}}-\left(\dfrac{\sum f_{i}x_{i}}{\sum f_{i}} \right)^2}\qquad\cdots\qquad (ii) \end{eqnarray*} where the square root is the $SD$ of the transformed data given in the table above. As the standard deviation $SD_y$ of the original data is given, equation $(ii)$ involves only one unknown, namely $C$. So compute the $SD$ from the given table and plug the known value of $SD_y$ into the LHS to obtain $C$. Then use this value of $C$ in equation $(i)$ for the mean to obtain the other unknown, $A$. The original mid-values are then $y_{i}=A+Cx_{i}$, and each class interval extends $C/2$ on either side of its mid-value.
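The outline above can be checked numerically. The following is a minimal sketch rather than a definitive solution: the variable names are illustrative, and rounding $C$ to a whole number and $A$ to one decimal place rests on the assumption that the textbook intended round class limits (the given mean and SD are themselves rounded).

```python
from math import sqrt

# Coded values and frequencies from the question
x = [-3, -2, -1, 0, 1, 2, 3, 4]
f = [3, 15, 45, 57, 50, 36, 25, 9]
mean_y = 40.604  # given mean of the original variable (lb)
sd_y = 7.92      # given SD of the original variable (lb)

n = sum(f)
mean_x = sum(fi * xi for fi, xi in zip(f, x)) / n
var_x = sum(fi * xi**2 for fi, xi in zip(f, x)) / n - mean_x**2
sd_x = sqrt(var_x)

C = sd_y / sd_x          # equation (ii): SD_y = C * SD_x
A = mean_y - C * mean_x  # equation (i):  mean_y = C * mean_x + A

# Assumption: the intended width and origin are round numbers,
# so C is rounded to an integer and A to one decimal place.
C_r = round(C)
A_r = round(mean_y - C_r * mean_x, 1)

# Mid-values y_i = A + C * x_i; each class spans C/2 either side
midpoints = [A_r + C_r * xi for xi in x]
intervals = [(m - C_r / 2, m + C_r / 2) for m in midpoints]

print(f"C = {C:.4f}, A = {A:.4f}")
for (lo, hi), fi in zip(intervals, f):
    print(f"{lo:g} - {hi:g} lb: {fi}")
```

With the figures in the question, $C$ comes out very close to $5$ and $A$ very close to $37.5$, which under the rounding assumption gives classes of width $5\,\text{lb}$ running from $20$ to $25\,\text{lb}$ up to $55$ to $60\,\text{lb}$.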