Solved – Can a standard deviation of raw scores be reported as a standard deviation of percentages

mean, percentage, standard deviation

Suppose we have a test consisting of 30 questions, and 10 people take it. The mean score of these 10 people is 17, and the standard deviation of the scores in the sample is 4. When reporting descriptive statistics at school, we use these raw scores and write (M=17, SD=4), but in some cases I feel that reporting percentages would be better, because I think we have a more intuitive grasp of what it means to score 56.7 out of 100 than 17 out of 30 (probably because we are accustomed to the decimal system).

So, for the example given above, would it be possible to report the mean and standard deviation as (M=56.7%, SD=13.3%)?

Does it make sense to say that the exam scores in a sample have a standard deviation of 13.3%?

These percentages are the arithmetic equivalents of the raw scores I made up above, but I am not sure whether it is good practice to convert them directly into percentages like that.

Best Answer

The standard deviation is simply a statistic that you can compute for any set of data points. It does not assume that your data are normally distributed, nor that they have or have not passed through any transformation, linear or otherwise.

Therefore, it's perfectly acceptable to use the standard deviation on any data, including the percentage scores.

Note that, in your particular case, the transformation you are applying is linear, of the form:

$$ y = Ax + b $$

i.e. an affine transform, here with $A = 100/30$ and $b = 0$. So you can calculate the standard deviation on the original, untransformed data and then multiply by $|A|$ (simply $A$ when $A$ is positive, as here) to get the standard deviation after the transform. There is no particular advantage to doing this rather than calculating the standard deviation directly on the transformed data, but it might be reassuring.
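As a quick numerical check of this claim, here is a minimal Python sketch. The ten raw scores are hypothetical, chosen only so that their mean is 17 and their population standard deviation is 4; the names `raw_scores`, `MAX_SCORE`, `A` and `b` are illustrative and not part of the question.

```python
import math
import statistics

# Hypothetical raw scores out of 30 (made up for illustration):
# mean = 17, population standard deviation = 4.
raw_scores = [11, 12, 14, 14, 16, 18, 20, 20, 22, 23]
MAX_SCORE = 30

# Affine map y = A*x + b that converts a raw score to a percentage.
A = 100 / MAX_SCORE
b = 0.0
percent_scores = [A * x + b for x in raw_scores]

# Population standard deviation (divides by n, as in the 1/n formula).
sd_raw = statistics.pstdev(raw_scores)

# Route 1: compute the SD directly on the percentage scores.
sd_percent_direct = statistics.pstdev(percent_scores)

# Route 2: scale the raw-score SD by |A|.
sd_percent_scaled = abs(A) * sd_raw

print(f"mean (raw)     = {statistics.mean(raw_scores):.2f}")
print(f"mean (percent) = {statistics.mean(percent_scores):.2f}")
print(f"SD (raw)       = {sd_raw:.2f}")
print(f"SD (percent, direct) = {sd_percent_direct:.2f}")
print(f"SD (percent, scaled) = {sd_percent_scaled:.2f}")
```

Both routes give approximately 13.33 for these scores. Swapping `statistics.pstdev` for `statistics.stdev` (the $n-1$ version) changes the individual values but not the agreement between the two routes.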

We can see that an affine transformation scales the standard deviation by $|A|$ (and leaves it unaffected by $b$), as follows:

Given input data $\{X_1, X_2, \ldots, X_n\}$, the original variance, $\sigma_X^2$ (the square of the standard deviation $\sigma_X$), is given by:

$$ \sigma_X^2 = \frac{1}{n}\sum_{i=1}^n \left(X_i - \frac{1}{n}\sum_{j=1}^n X_j\right)^2 $$

Let's apply the transform $Y = AX + b$. Then we have

$$ \sigma_Y^2 = \frac{1}{n}\sum_{i=1}^n \left( AX_i + b - \frac{1}{n} \sum_{j=1}^n \left( AX_j + b \right) \right)^2 $$

$$ = \frac{1}{n}\sum_{i=1}^n \left( AX_i + b - n\frac{1}{n}b - \frac{1}{n} \sum_{j=1}^n \left( AX_j \right) \right)^2 $$

$$ = \frac{1}{n}\sum_{i=1}^n \left( AX_i - \frac{1}{n} \sum_{j=1}^n \left( AX_j \right) \right)^2 $$

$$ = A^2 \left( \frac{1}{n}\sum_{i=1}^n \left( X_i - \frac{1}{n} \sum_{j=1}^n \left( X_j \right) \right)^2 \right) $$

$$ = A^2 \sigma_X^2 $$

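Therefore, taking the non-negative square root,

$$ \sigma_Y = |A| \sigma_X, $$

which reduces to $A \sigma_X$ whenever $A > 0$, as in the percentage conversion.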
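Applied to the numbers in the question, and assuming the percentage is simply the raw score divided by the 30 available points and multiplied by 100 (so $A = 100/30$ and $b = 0$), the result works out as:

$$ \bar{Y} = A\bar{X} + b = \frac{100}{30} \cdot 17 \approx 56.7, \qquad \sigma_Y = A\,\sigma_X = \frac{100}{30} \cdot 4 \approx 13.3 $$

so reporting (M=56.7%, SD=13.3%) is arithmetically consistent with (M=17, SD=4).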