I'm trying to measure how "exceptional" a particular observation is based on several attributes of that observation among a population of observations.
Each observation has several attributes, all numerical quantities but on different scales. To normalize these attributes to be on the same scale, I calculate the Z score of each attribute for each observation.
Then, I combine the attributes of each observation using a weighted average.
For example, if I believe attribute 1 is twice as important as attribute 2, attribute 1 gets a weight of two and attribute 2 gets a weight of one. I do the same for all observations.
I call the result of the weighted average an "observation Z Score." I interpret an observation with a Z Score of 1 as being more "exceptional" than 84% of other observations.
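As a rough sketch of the steps described above (the data and the function names `z_scores` and `weighted_composite` are made up for illustration, and the population standard deviation is an assumed choice), the calculation might look like this:

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize a list of numbers to mean 0 and (population) SD 1."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

def weighted_composite(columns, weights):
    """Weighted average of per-attribute z-scores, one value per observation."""
    total = sum(weights)
    z_cols = [z_scores(col) for col in columns]
    n = len(columns[0])
    return [
        sum(w * z[i] for w, z in zip(weights, z_cols)) / total
        for i in range(n)
    ]

# Hypothetical data: two attributes on different scales, five observations.
attr1 = [10.0, 12.0, 14.0, 16.0, 18.0]
attr2 = [300.0, 100.0, 500.0, 200.0, 400.0]

# Attribute 1 counts twice as much as attribute 2.
composite = weighted_composite([attr1, attr2], weights=[2, 1])

print("composite mean:", round(mean(composite), 6))
print("composite SD:  ", round(pstdev(composite), 6))
```

On this toy data the composite has mean 0 but a standard deviation below 1, which is worth keeping in mind when reading it as a Z score.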
Sanity check questions
- Is this method ok?
- Is the method to weight an attribute twice as important with a weight of 2 appropriate for Z Scores?
- Is my interpretation of observation Z Score appropriate?
Update:
Since there is a problem with Z Scores, what if I used each value's percentile (the % of observations that have a lower value for that attribute) instead of its Z score? This way, there is no normality assumption and no need to keep the variance the same.
To elaborate:
Each observation has several attributes, all numerical quantities but on different scales. To normalize these attributes to be on the same scale, I calculate the percentile of each attribute for each observation.
Then, I combine the attributes of each observation using a weighted average. All weights sum to 1.
For example, if I believe attribute 1 is twice as important as attribute 2, attribute 1 gets twice the weight of attribute 2 (that is, 2/3 and 1/3 once the weights are scaled to sum to 1). I do the same for all observations.
I call the result of the weighted average an "observation percentile." I interpret an observation with a percentile of 0.60 as being more "exceptional" than 60% of other observations.
Sanity check questions
- Is this method ok?
- Is the method to weight an attribute twice as important with a weight of 2 appropriate for percentiles?
- Is my interpretation of observation percentile appropriate?
Best Answer
The concept you're calling 'exceptionality' is simply a combined variable (via a weighted average) from two or more variables standardized to a Z-score. If there were a way of observing 'exceptionality' as sampled data, you could potentially fit a (standardized) multiple regression with your variables to find the best weights to use.
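Let's consider two random variables $A$ and $B$, which are standardized to $Z_A$ and $Z_B$ respectively, so that each has a mean of 0 and a variance of 1. Standardizing does not by itself make a variable normal, so I additionally assume that $A$ and $B$ are normally distributed, in which case $Z_A$ and $Z_B$ each follow a standard normal distribution.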
The weighted average of $Z_A$ and $Z_B$, where $w_A$ and $w_B$ are the respective weights for $Z_A$ and $Z_B$, is then: $$ W = \frac{w_A}{w_A+w_B} \cdot Z_A + \frac{w_B}{w_A+w_B} \cdot Z_B $$
Note that $w_A$ and $w_B$ are constants, whereas $Z_A$ and $Z_B$ are random variables.
Therefore, the expected value of $W$ is as follows: $$ \text{E}(W) = \frac{w_A}{w_A+w_B} \cdot \text{E}(Z_A) + \frac{w_B}{w_A+w_B} \cdot \text{E}(Z_B) = 0 $$
The variance of $W$, assuming the independence of $Z_A$ and $Z_B$, is: $$ \begin{aligned} \text{Var}(W) &= \left(\frac{w_A}{w_A+w_B}\right)^2 \cdot \text{Var}(Z_A) + \left(\frac{w_B}{w_A+w_B}\right)^2 \cdot \text{Var}(Z_B) \\ &= \left(\frac{w_A}{w_A+w_B}\right)^2 + \left(\frac{w_B}{w_A+w_B}\right)^2 \end{aligned} $$
For positive weights, the variance of $W$ must fall inside the interval $[0.5, 1)$: it equals $0.5$ when $w_A = w_B$ and approaches $1$ as one weight dominates the other. Although the mean is 0, the variance is not 1, so $W$ does not follow a standard normal distribution and therefore cannot be treated as a $Z$-score.
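To make the variance formula concrete, here is a minimal sketch that evaluates it for a few weight pairs. The function name `composite_variance` is just illustrative, and it assumes positive weights and independent, unit-variance $Z_A$ and $Z_B$, as in the derivation above.

```python
def composite_variance(w_a, w_b):
    """Var(W) for independent, unit-variance Z_A and Z_B with positive weights."""
    p_a = w_a / (w_a + w_b)  # normalized weight on Z_A
    p_b = w_b / (w_a + w_b)  # normalized weight on Z_B
    return p_a**2 + p_b**2

# Equal weights give the minimum (0.5); lopsided weights approach 1.
for w_a, w_b in [(1, 1), (2, 1), (10, 1), (1000, 1)]:
    print(f"w_A={w_a}, w_B={w_b}: Var(W) = {composite_variance(w_a, w_b):.4f}")
```

With the 2:1 weighting from the question, the variance comes out to $5/9 \approx 0.56$, so the standard deviation of $W$ is about $0.75$ rather than $1$.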
Making inferences like "a value of $W$ (the weight-averaged Z-scores) $= 1$ is greater than ~84% of observations" requires first standardizing $W$ by dividing it by its standard deviation. Therefore, the Z-score of $W$ becomes: $$ Z_W = \frac{\frac{w_A}{w_A+w_B} \cdot Z_A + \frac{w_B}{w_A+w_B} \cdot Z_B}{\sqrt{\left(\frac{w_A}{w_A+w_B}\right)^2 + \left(\frac{w_B}{w_A+w_B}\right)^2}} $$
A value of $1$ for $Z_W$ would indicate that it's greater than ~84% of observations of $Z_W$.
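If it helps, the claim can be checked with a quick simulation. This is only a sketch under the same assumptions as above (independent, standard normal $Z_A$ and $Z_B$); the function name `standardized_composite`, the 2:1 weights, the seed, and the sample size are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, pstdev

def standardized_composite(z_a, z_b, w_a, w_b):
    """Z_W: the weighted average of two z-scores divided by its standard deviation."""
    p_a = w_a / (w_a + w_b)
    p_b = w_b / (w_a + w_b)
    w = p_a * z_a + p_b * z_b
    # Valid only if Z_A and Z_B are independent with unit variance.
    return w / sqrt(p_a**2 + p_b**2)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000
z_w = [
    standardized_composite(rng.gauss(0, 1), rng.gauss(0, 1), w_a=2, w_b=1)
    for _ in range(n)
]

share_below_one = sum(z < 1 for z in z_w) / n
expected = NormalDist().cdf(1.0)  # standard normal CDF at 1, roughly 0.84

print("SD of Z_W:           ", round(pstdev(z_w), 3))
print("share of Z_W below 1:", round(share_below_one, 3))
print("normal CDF at 1:     ", round(expected, 3))
```

If your attributes are correlated, the denominator also needs a covariance term, so in practice it may be simpler to divide $W$ by its sample standard deviation.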
Please let me know if you have any follow-up questions.