[Math] Does standardizing a random variable that is not normally distributed change the underlying distribution?

normal distribution, standard deviation

For my analysis I am standardizing response times, which are generally known to be skewed (and are skewed in my data set), using the "classic" standardization method of subtracting the grand mean and dividing by the standard deviation.

However, I was wondering whether standardizing a variable that you know is not normally distributed actually applies some kind of filtering to the distribution.

I compared the histograms of the variable before and after scaling, and although the post-scaling histogram is still skewed, it looks to me as if it might be smoother.

Pre scaling histogram

Post scaling histogram

I guess my question is: is it appropriate to standardize non-normally distributed variables using the classic method?
If not, what is the best way to scale the distribution appropriately?

Thanks!

Best Answer

Your histograms look different because your stats package is binning the data, and the bins it picks for the two data sets don't line up. Standardizing a data set by subtracting its mean and dividing by its standard deviation is a linear transformation, so it doesn't change the shape of the data; it only shifts the center and rescales the values to give a mean of 0 and an SD of 1. Shape measures such as skewness are unchanged. If you want to confirm this, do a normal QQ plot of the pre data and a normal QQ plot of the post data: the points fall in the same pattern, and only the scale on the vertical axis differs. The R command

qqnorm(x)

will produce a normal quantile-quantile plot of your data set x, which compares the shape of your data to a normal distribution.
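To make the comparison concrete, here is a rough sketch in R. The data are simulated log-normal values standing in for response times (an assumption, not your data), and `skewness`, `breaks_x` and `breaks_z` are names made up for this example. It standardizes the sample, checks that the skewness is unchanged, and draws both histograms with bin edges that correspond to each other, so any difference in appearance cannot come from the binning.

```r
set.seed(1)

# Hypothetical right-skewed "response times" (simulated, not the asker's data)
x <- rlnorm(500, meanlog = 6, sdlog = 0.5)

# Classic standardization: subtract the mean, divide by the standard deviation
z <- (x - mean(x)) / sd(x)

# Sample skewness: third central moment divided by the cubed (population) SD
skewness <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

skew_x <- skewness(x)
skew_z <- skewness(z)   # same value as skew_x up to rounding error

# Use corresponding bin edges for both histograms: the edges for z are the
# edges for x passed through the same shift and rescaling
breaks_x <- seq(min(x), max(x), length.out = 31)
breaks_z <- (breaks_x - mean(x)) / sd(x)

op <- par(mfrow = c(2, 2))
h_x <- hist(x, breaks = breaks_x, main = "Pre scaling")
h_z <- hist(z, breaks = breaks_z, main = "Post scaling")
qqnorm(x, main = "Normal QQ plot, pre");  qqline(x)
qqnorm(z, main = "Normal QQ plot, post"); qqline(z)
par(op)
```

With matching bin edges the two histograms should have the same bar heights, and the two QQ plots should differ only in the labels on the vertical axis.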

If you really want to change the shape of your data, you need to apply a nonlinear transformation, such as the log transformation (which will pull in a long right tail), or, less drastically, a square root function. (There are many other possibilities.)
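As a small illustration of how these nonlinear transformations act on a right-skewed sample, the sketch below compares the skewness of the raw values, their square roots, and their logs. The data are again simulated log-normal values (an assumption; real response times will behave somewhat differently), and `skewness` is a helper defined just for this example. Both transformations require non-negative data, and the log requires strictly positive data.

```r
set.seed(1)

# Hypothetical strictly positive, right-skewed data (simulated)
x <- rlnorm(500, meanlog = 6, sdlog = 0.5)

# Sample skewness: third central moment divided by the cubed (population) SD
skewness <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

skew_raw  <- skewness(x)        # strongly right-skewed
skew_sqrt <- skewness(sqrt(x))  # milder correction
skew_log  <- skewness(log(x))   # strongest correction of the three

print(c(raw = skew_raw, sqrt = skew_sqrt, log = skew_log))

# Visual check: QQ plots before and after each transformation
op <- par(mfrow = c(1, 3))
qqnorm(x,       main = "Raw");         qqline(x)
qqnorm(sqrt(x), main = "Square root"); qqline(sqrt(x))
qqnorm(log(x),  main = "Log");         qqline(log(x))
par(op)
```

Which transformation is appropriate depends on how heavy the right tail is; comparing the QQ plots is usually more informative than any single skewness number.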
