Solved – treat the mean of a set of z-scores as a z-score

z-score

I have a set of z-scores corresponding to different tests taken by the same subjects. Can I take the average of the z-scores for each subject and compare the average z-scores as if they were actual z-scores? (i.e., can I calculate a percentile for each subject based on the average of the set of z-scores?)

EDIT:

My goal is to calculate percentiles for a subject based on a set of z-scores for that subject. So far my approach has been to take the average of the z-scores of a subject, and then treat that average as a z-score and calculate the percentile based on that. I wonder if there is any problem with that approach?

Best Answer

Maybe someone else can explain the math behind it, but consider this quick demonstration: I generate five vectors, each 100 numbers long. Each of these vectors is on a different scale, so I standardize them (i.e., create z-scored variables). That is, the mean is zero and the standard deviation is 1 for each of these five latent construct variables:

set.seed(1839)

## create five different z-score variables that represent latent constructs
data <- data.frame(
  latent_construct_1 = scale(rnorm(100, 10, 4)),
  latent_construct_2 = scale(rnorm(100, 3, 18)),
  latent_construct_3 = scale(rnorm(100, -5, 7)),
  latent_construct_4 = scale(rnorm(100, 0, 8)),
  latent_construct_5 = scale(rnorm(100, 20, 20))
)

Let's check to make sure they are actually z-scores:

> sapply(data, mean)
latent_construct_1 latent_construct_2 latent_construct_3 latent_construct_4 latent_construct_5 
     -2.203951e-16       1.634435e-17       1.400464e-17      -1.449145e-17       7.852226e-17 
> 
> sapply(data, sd)
latent_construct_1 latent_construct_2 latent_construct_3 latent_construct_4 latent_construct_5 
                 1                  1                  1                  1                  1 

So, now let's say we average all five of these together:

## make a mean of all of these latent constructs
data$mean_latent_construct <- rowMeans(data)

Is this new variable a z-score? We can check to see if the mean is zero and standard deviation is one:

> ## is the mean zero?
> mean(data$mean_latent_construct)
[1] -2.436148e-17
> 
> ## is the standard deviation one?
> sd(data$mean_latent_construct)
[1] 0.4599126
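The standard deviation of about 0.46 is not an accident of this sample. The variance of the mean of k standardized variables equals the sum of all entries of their correlation matrix divided by k^2, so for roughly uncorrelated variables the standard deviation lands near 1/sqrt(k), which is about 0.447 for k = 5. Here is a minimal sketch of that check. It assumes independent standard-normal draws rather than the exact data frame above, and the names `z`, `sd_of_mean`, `sd_from_cor`, and `sd_if_uncorrelated` are made up for illustration:

```r
set.seed(1839)

# Five independent variables, each z-scored across 100 simulated subjects
z <- sapply(1:5, function(i) as.numeric(scale(rnorm(100))))
k <- ncol(z)

# Empirical SD of the per-subject mean of the five z-scores
sd_of_mean <- sd(rowMeans(z))

# SD implied by the sample correlation matrix: sqrt(sum of all entries) / k
sd_from_cor <- sqrt(sum(cor(z))) / k

# SD expected if the five variables were exactly uncorrelated
sd_if_uncorrelated <- 1 / sqrt(k)

print(c(sd_of_mean = sd_of_mean,
        sd_from_cor = sd_from_cor,
        sd_if_uncorrelated = sd_if_uncorrelated))
```

The first two numbers should agree up to floating-point error. The more positively correlated the tests are, the closer the standard deviation of the mean gets to 1, and it only reaches 1 when the tests are perfectly correlated.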

The variable is not a z-score, because the standard deviation is not one. However, we could now z-score this mean variable. Let's do that and compare the distributions:

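## z-score the mean latent construct
## (as.numeric() drops the one-column matrix that scale() returns)
data$mean_latent_construct_z <- as.numeric(scale(data$mean_latent_construct))

## compare distributions
library(tidyverse)

## reshape to long format in a new object so the original data frame is kept
plot_data <- data %>% 
  select(mean_latent_construct, mean_latent_construct_z) %>% 
  pivot_longer(everything(), names_to = "variable", values_to = "value")

ggplot(plot_data, aes(x = value, fill = variable)) +
  geom_density(alpha = .7) +
  theme_light()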

[Figure: overlaid density plots of mean_latent_construct and mean_latent_construct_z]

The z-scored aggregate has the same shape as the raw aggregate of z-scores, but it is much more spread out: its standard deviation is 1, whereas the raw aggregate's is only about 0.46.
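For the percentile question in the original post, this difference in spread matters. Below is a rough sketch, not a definitive recipe, comparing percentiles computed from the raw average with percentiles computed after re-standardizing. It assumes simulated data, and it assumes that a normal model (via `pnorm`) is reasonable for the composite. All variable names are hypothetical:

```r
set.seed(1839)

# Hypothetical data: 100 subjects, 5 tests, each test z-scored across subjects
z <- sapply(1:5, function(i) as.numeric(scale(rnorm(100))))

# Per-subject average of the z-scores; its SD is below 1
avg_z <- rowMeans(z)

# Re-standardize the average so it has mean 0 and SD 1 again
composite_z <- as.numeric(scale(avg_z))

# Percentiles under a normal assumption
pct_naive    <- 100 * pnorm(avg_z)        # treats the raw average as a z-score
pct_rescaled <- 100 * pnorm(composite_z)  # uses the re-standardized composite

# Distribution-free alternative: percentile rank within the sample
pct_empirical <- 100 * rank(avg_z) / length(avg_z)

print(head(data.frame(avg_z, pct_naive, pct_rescaled, pct_empirical)))
```

The ordering of subjects is the same in every column. What changes is the spacing: the naive percentiles are pulled toward 50 because the raw average varies less than a true z-score would.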

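In short: No, a mean of z-scored variables is not a z-score itself. Its mean is still zero, but its standard deviation is below one unless the variables are perfectly correlated, so re-standardize the average before treating it as a z-score.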