Solved – Aggregating standard deviation to a summary point

aggregationdescriptive statisticsstandard deviationvariance

I have a range of data (server performance statistics) is formatted as follows, for each server:

Time            | Average |  Min  |  Max  | StdDev  | SampleCount |
-------------------------------------------------------------------
Monday 1st      |    125  |   15  |  220  | 12.56   |     5       |
Tuesday 2nd     |    118  |   11  |  221  | 13.21   |     4       |
Wednesday 3rd   |    118  |   11  |  221  | 13.21   |     3       |
....            |    ...  |   ..  |  ...  | .....   |     .       |
and so on...

These data points are calculated from data that has a finer resolution (e.g. hourly data).

I need to aggregate this data into a single summary point so the end result is a list of servers and an aggregate average, min, max, standard deviation.

For average, I take the average of all the averages.
For min, we take the minimum min.
For max, we take the maximum max.

However, I'm not sure what method I should be using to aggregate standard deviation? I've seen various answers including square roots and variance but I really need a concrete answer on this – can anyone help?

Best Answer

First, the best way to do any of this is to use the raw data

If you don't have the raw data then

Second, for the average you should not simply average all the averages. You need to weight by sample size. In R, e.g

group1 <- c(2, 10, 12)
group2 <- c(4, 10, 15, 50)
m1 <- mean(group1)
m2 <- mean(group2)
(ave.unweight <- (m1 + m2)/2)  #13.875  ... incorrect
(ave.weight <- (m1*3 + m2*4)/7) #14.71
(ave.true <- mean(c(group1, group2))) #14.71

Third, for the SD see this thread