[Math] Average, standard deviation and min/max values

average, standard deviation, statistics

I'm analyzing a computer science paper and just found the following statement in the experimental setup:

  • Average (standard deviation) of number of files per peer: 464 (554)
  • Min – max number of files per peer: 100 – 4,774

Are these numbers possible at all? The paper does not say anything about a normal distribution, but how is it possible that the standard deviation is 554 when the minimum number of files per peer is 100?

Best Answer

Suppose we have $n$ numbers $x_1, x_2, \ldots, x_n$ ranging from $a$ to $b$, meaning that $x_i = a$ for at least one $i$, $1 \leq i \leq n$, and $x_j = b$ for at least one $j$, $1 \leq j \leq n$. Duplicates are allowed, so two or more of the $n$ numbers may share the same value in $[a, b]$. Define the mean $\bar{x}$ and the variance $\sigma_x^2$ of the set of $n$ numbers as $$\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i, ~~ \sigma_x^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2 = \left(\frac{1}{n}\sum_{i=1}^n x_i^2\right) - \bar{x}^2.$$

For convenience, let us shift the set to $y_1, y_2, \ldots, y_n$ where $y_i = x_i - a$, so that the new set has values ranging from $0$ to $b-a$. The mean $\bar{y}$ is just $\bar{x}-a$, while the variance is unchanged: $\sigma_y^2 = \sigma_x^2$.

Now, it is shown in the answers to this question on stats.SE that the ratio $\sigma_y/\bar{y} = \sigma_x/(\bar{x}-a)$ can be no larger than $\sqrt{n-1}$. Note that this bound does not depend on $b$ at all.

You don't say what the value of $n$ is, but given the numbers in your question, the bound requires $$554 = \sigma_x \leq (464-100)\sqrt{n-1} = 364\sqrt{n-1},$$ that is, $\sqrt{n-1} \geq 554/364 \approx 1.52$, which holds for every $n \geq 4$. It fails only in the unusual circumstance that there are $3$ or fewer peers in the experiment, in which case so few numbers are being described in terms of mean/standard-deviation/min-max: certainly overkill!
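To make the arithmetic concrete, here is a minimal Python sketch of the check. The helper name `min_peers_for_bound` and the ten-peer data set are illustrative assumptions, not taken from the paper; the data set is simply the extreme case (all peers at the minimum except one at the maximum), chosen to show that a standard deviation well above the mean is attainable.

```python
import math
import statistics


def min_peers_for_bound(mean, sd, minimum):
    """Smallest n with sd <= (mean - minimum) * sqrt(n - 1).

    Hypothetical helper for illustration only.
    """
    ratio = sd / (mean - minimum)
    # The bound needs n - 1 >= ratio**2, i.e. n >= 1 + ratio**2.
    return math.ceil(1 + ratio ** 2)


# Figures quoted in the question: mean 464, SD 554, minimum 100.
print(min_peers_for_bound(mean=464, sd=554, minimum=100))  # 4

# Assumed extreme data set (NOT the paper's data): nine peers hold the
# minimum of 100 files and one peer holds the maximum of 4,774.
files = [100] * 9 + [4774]
mean = statistics.mean(files)
sd = statistics.pstdev(files)  # population SD (divides by n, as in the text)
bound = (mean - min(files)) * math.sqrt(len(files) - 1)

# In this extreme case the SD meets the bound and far exceeds the mean.
print(mean, sd, bound)
```

Note that `statistics.pstdev` divides by $n$, matching the definition of $\sigma_x^2$ used here; `statistics.stdev` divides by $n-1$ and would give a slightly larger value.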

In summary, there are no obvious problems with the standard deviation being larger than the mean.