Solved – z-score and Normal Distribution

normal distribution, standard deviation, standardization, z-score

It's my understanding that a z-score can only be calculated and accurately used for data sets that are normally distributed.

Since a "perfect" normal distribution almost never occurs in real-world data (where "perfect" means that 1. the mean, median, and mode are all equal, 2. the distribution is perfectly symmetrical at every standard deviation on both sides of the mean, and 3. the distribution is asymptotic), how "close" to perfectly normal must the distribution be for the z-score to still be a valid statistical measure?

If the answer is that it has to be a perfectly normal distribution, then why is the z-score so important, given that it would seldom be a valid measure on real-world data?

Best Answer

Technically, z-scoring does not depend on any distributional assumptions, such as normality. It's just a way of describing how far observations are from the mean, in units of standard deviations, no matter what the distribution happens to be. So there is no harm in z-scoring non-normal variables.
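To make that concrete, here is a minimal sketch of z-scoring a deliberately non-normal sample. The data values and the helper name `z_scores` are made up for illustration, and the population standard deviation is used as an assumption (the sample version would work the same way).

```python
import statistics

def z_scores(values):
    """Return (x - mean) / sd for each x, using the population standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(x - mean) / sd for x in values]

# A clearly right-skewed, non-normal sample (hypothetical values).
data = [1, 1, 2, 2, 3, 4, 6, 9, 15, 30]

z = z_scores(data)

# The z-scores always have mean 0 and standard deviation 1 (up to rounding),
# whatever the shape of the original data.
z_mean = statistics.fmean(z)
z_sd = statistics.pstdev(z)
```

Nothing in the calculation checks for or relies on normality; only the mean and standard deviation of the data are needed.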

The main caveat is that z-scores tend to be more informative for distributions that are at least approximately symmetric about the mean (which includes normal distributions, but also many others), and less informative for highly skewed distributions. The reason is that in skewed distributions, two observations that lie on opposite sides of the mean can have the same absolute z-score despite one being much more or less probable than the other.
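The skewness caveat can be illustrated with a small sketch. It assumes an exponential distribution with rate 1 as the skewed example (its mean and standard deviation are both 1) and compares it with a standard normal; the choice of distribution and of z = 0.9 is arbitrary, and "probable" is read here as probability density.

```python
import math

def exp_pdf(x, rate=1.0):
    """Density of an exponential distribution with the given rate."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution."""
    u = (x - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2 * math.pi))

# Exponential(rate=1): mean = 1 and standard deviation = 1.
mean, sd = 1.0, 1.0
z = 0.9

x_low = mean - z * sd   # observation with z-score -0.9
x_high = mean + z * sd  # observation with z-score +0.9

# Same |z|, but how do the densities compare?
ratio_exp = exp_pdf(x_low) / exp_pdf(x_high)   # skewed case
ratio_norm = normal_pdf(-z) / normal_pdf(z)    # symmetric case
```

In the exponential case the point below the mean is about six times as dense as the point above it (the ratio is e^1.8), even though both have an absolute z-score of 0.9. In the normal case the ratio is exactly 1, which is why z-scores are easier to interpret for symmetric distributions.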