Z Scores – Combining by Weighted Average: Sanity Check

quantiles, z-score

I'm trying to measure how "exceptional" a particular observation is based on several attributes of that observation among a population of observations.

Each observation has several attributes, all numerical quantities but on different scales. To normalize these attributes to be on the same scale, I calculate the Z score of each attribute for each observation.

Then I combine one observation's attributes using a weighted average.

For example, if I believe attribute 1 is twice as important as attribute 2, attribute 1 gets a weight of two and attribute 2 gets a weight of one. I do the same for all observations.

I call the result of the weighted average an "observation Z Score." I interpret an observation with a Z Score of 1 as being more "exceptional" than 84% of other observations.
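For concreteness, the procedure described above could be sketched in Python with NumPy roughly as follows. The data and the function name `weighted_z_composite` are made up for illustration, and the population standard deviation is assumed when standardizing.

```python
import numpy as np

def weighted_z_composite(X, weights):
    """Weighted average of column-wise z-scores, one value per row of X."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    # Standardize each attribute (column) to mean 0 and standard deviation 1.
    z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Normalize the weights so they sum to 1, then average across attributes.
    return z @ (w / w.sum())

# Made-up data: 5 observations, 2 attributes on very different scales.
X = np.array([
    [10.0, 200.0],
    [12.0, 180.0],
    [9.0, 260.0],
    [15.0, 210.0],
    [11.0, 190.0],
])
composite = weighted_z_composite(X, weights=[2, 1])
print(composite)
```

The result has mean 0 by construction, but nothing in this sketch forces its standard deviation to be 1.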

Sanity check questions

  • Is this method ok?
  • Is the method to weight an attribute twice as important with a weight of 2 appropriate for Z Scores?
  • Is my interpretation of observation Z Score appropriate?

Update:

Since there is a problem with Z Scores, what if I used each attribute's percentile (the % of observations with a lower value for that attribute) instead of its Z score? This way, there is no normality assumption and no obligation to keep the variance the same.

To elaborate:

Each observation has several attributes, all numerical quantities but on different scales. To normalize these attributes to be on the same scale, I calculate the percentile of each attribute for each observation.

Then I combine one observation's attributes using a weighted average. All weights sum to 1.

For example, if I believe attribute 1 is twice as important as attribute 2, attribute 1 gets twice the weight of attribute 2 (two to one, i.e. 2/3 and 1/3 once the weights are scaled to sum to 1). I do the same for all observations.

I call the result of the weighted average an "observation percentile." I interpret an observation with a percentile of 0.60 as being more "exceptional" than 60% of other observations.
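A minimal sketch of this percentile variant, again in Python with NumPy. The simulated data, the independence of the two attributes, and the helper name `percentile_ranks` are all assumptions made for illustration. The last step re-ranks the composite so its value can be compared with the share of observations it actually exceeds.

```python
import numpy as np

def percentile_ranks(X):
    """Fraction of observations with a strictly lower value, per attribute."""
    X = np.asarray(X, dtype=float)
    # below[i, k, j] is True when observation k is lower than observation i on attribute j.
    below = X[None, :, :] < X[:, None, :]
    return below.mean(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2)) * [1.0, 50.0]   # made-up data, independent attributes
w = np.array([2.0, 1.0])
w = w / w.sum()                                # weights now sum to 1

composite = percentile_ranks(X) @ w            # weighted average of percentiles
# Re-rank the composite to see what share of observations each one actually exceeds.
reranked = percentile_ranks(composite[:, None])[:, 0]

i = np.argmin(np.abs(composite - 0.9))         # observation with composite closest to 0.9
print(composite[i], reranked[i])
```

With independent attributes, the re-ranked value for a composite near 0.9 tends to come out well above 0.9, which suggests that a weighted average of percentiles is not itself a percentile unless it is ranked again.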

Sanity check questions

  • Is this method ok?
  • Is the method to weight an attribute twice as important with a weight of 2 appropriate for percentiles?
  • Is my interpretation of observation percentile appropriate?

Best Answer

The concept you're calling 'exceptionality' is simply a combined variable (via a weighted average) from two or more variables standardized to a Z-score. If there were a way of observing 'exceptionality' as sampled data, you could potentially fit a (standardized) multiple regression with your variables to find the best weights to use.


Let's consider two random variables $A$ and $B$, which are standardized to $Z_A$ and $Z_B$ respectively (meaning each has a mean of 0 and a variance of 1; each follows a standard normal distribution only if $A$ and $B$ are themselves normally distributed).

The weighted average of $Z_A$ and $Z_B$, where $w_A$ and $w_B$ are the respective weights for $Z_A$ and $Z_B$, is then: $$ W = \frac{w_A}{w_A+w_B} \cdot Z_A + \frac{w_B}{w_A+w_B} \cdot Z_B $$

Note that $w_A$ and $w_B$ are constants, whereas $Z_A$ and $Z_B$ are random variables.

Therefore, the expected value of $W$ is as follows: $$ \text{E}(W) = \frac{w_A}{w_A+w_B} \cdot \text{E}(Z_A) + \frac{w_B}{w_A+w_B} \cdot \text{E}(Z_B) = 0 $$

The variance of $W$, assuming the independence of $Z_A$ and $Z_B$, is: $$ \begin{aligned} \text{Var}(W) &= \left(\frac{w_A}{w_A+w_B}\right)^2 \cdot \text{Var}(Z_A) + \left(\frac{w_B}{w_A+w_B}\right)^2 \cdot \text{Var}(Z_B) \\ &= \left(\frac{w_A}{w_A+w_B}\right)^2 + \left(\frac{w_B}{w_A+w_B}\right)^2 \end{aligned} $$

For positive weights, the variance of $W$ falls inside the interval $[0.5, 1)$: it equals $0.5$ when $w_A = w_B$ and approaches $1$ as one weight dominates the other. Although the mean is 0, the variance is not 1, so $W$ does not follow a standard normal distribution and cannot be treated as a $Z$-score.
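As a quick numerical check of that range, here is a small Python sketch. The function name `composite_variance` is made up for illustration, and it assumes independent, unit-variance $Z_A$ and $Z_B$ with positive weights, as in the derivation above.

```python
def composite_variance(w_a, w_b):
    """Var(W) for independent, unit-variance Z_A and Z_B with positive weights."""
    p = w_a / (w_a + w_b)          # normalized weight on Z_A
    return p**2 + (1 - p)**2       # normalized weight on Z_B is 1 - p

for w_a, w_b in [(1, 1), (2, 1), (10, 1), (1000, 1)]:
    print(w_a, w_b, composite_variance(w_a, w_b))
```

Equal weights give exactly 0.5, weights of 2 and 1 give 5/9, and increasingly lopsided weights move the variance toward 1 without reaching it.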

To make inferences like "a value of 1 is greater than ~84% of observations", you first have to standardize $W$ (the weighted average of the Z-scores) by dividing it by its standard deviation. The Z-score of $W$ then becomes: $$ Z_W = \frac{\frac{w_A}{w_A+w_B} \cdot Z_A + \frac{w_B}{w_A+w_B} \cdot Z_B}{\sqrt{\left(\frac{w_A}{w_A+w_B}\right)^2 + \left(\frac{w_B}{w_A+w_B}\right)^2}} $$

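A value of $1$ for $Z_W$ would indicate that it's greater than ~84% of observations of $Z_W$, provided $Z_A$ and $Z_B$ are independent and approximately normal, so that $Z_W$ is itself standard normal.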
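A small simulation can illustrate the difference between $W$ and $Z_W$. This is only a sketch under the assumptions above (independent, standard normal $Z_A$ and $Z_B$); the simulated data and the function name `standardized_composite` are hypothetical.

```python
import numpy as np

def standardized_composite(z_a, z_b, w_a, w_b):
    """Z_W: weighted average of z-scores rescaled to unit variance (independence assumed)."""
    a = w_a / (w_a + w_b)
    b = w_b / (w_a + w_b)
    w = a * z_a + b * z_b
    return w / np.sqrt(a**2 + b**2)

rng = np.random.default_rng(42)
n = 200_000
z_a = rng.standard_normal(n)       # simulated, independent standard normal z-scores
z_b = rng.standard_normal(n)

w_raw = (2 * z_a + 1 * z_b) / 3    # plain weighted average W with weights 2 and 1
z_w = standardized_composite(z_a, z_b, w_a=2, w_b=1)

share_below_raw = np.mean(w_raw < 1)   # share of W values below 1
share_below_std = np.mean(z_w < 1)     # share of Z_W values below 1
print(z_w.var(), share_below_raw, share_below_std)
```

Under these assumptions the share below 1 should land near 0.84 for `z_w` and noticeably higher for the unscaled `w_raw`, whose standard deviation is below 1. If the attributes are correlated, the denominator would also need a covariance term, which this sketch leaves out.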

Please let me know if you have any follow-up questions.
