Solved – Transformation for negative skewness data

data-transformation, normality-assumption, residuals, skewness

My analysis involves behavioral data on swine. One measure was the pigs' standing time (min), recorded with accelerometers. Using SAS, I checked for normality, and the results showed the data to be non-normal (Shapiro–Wilk p < 0.05). I then performed a residual analysis, which again showed non-normality. Skewness was -0.42. The negative skewness is probably due to the fixed upper limit (60 min) on the measured variable. So I reflected the data and applied a square-root transformation (reflected SQRT). I then refit the model with the transformed data and re-checked the residuals. They were still non-normal. Any suggestions on what I can do?
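For readers unfamiliar with the reflected square-root transformation described above, here is a minimal sketch in plain Python. It is not the SAS code used in the question. The function names, the `upper + 1` reflection constant, and the sample standing times are all assumptions made for illustration.

```python
import math

def skewness(values):
    """Population (biased) skewness: third central moment over sd cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def reflected_sqrt(values, upper=60.0):
    """Reflect about (upper + 1) so the smallest reflected value is 1, then take the square root."""
    return [math.sqrt(upper + 1.0 - v) for v in values]

# Hypothetical standing times (min), piled up near the 60-minute cap
standing = [60, 60, 60, 58, 57, 55, 52, 50, 45, 38, 30, 20]
transformed = reflected_sqrt(standing)

print("skewness before:", round(skewness(standing), 2))
print("skewness after: ", round(skewness(transformed), 2))
```

Note that reflection reverses the direction of the scale, so large transformed values correspond to short standing times, and the sign of the skewness flips.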

Best Answer

Trying to get to normality is usually a means to an end. Understanding that end will really help you decide on the best path forward, and oftentimes whether transforming to get to normality is even needed. There are tons of natural distributions that are not normal and do not need to be transformed to normal.

The set upper limit of 60 is problematic because it creates an artificial upper end of the distribution. You may be failing the Shapiro–Wilk test simply because there are too many 60s in your dataset. I would recommend bootstrapping a right tail or concatenating the upper and lower ends of your dataset. These are not ideal methods and must all be put in the context of what you're trying to do, to see whether they make sense.
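Before trying either approach, it may help to check how many observations actually sit at the cap. The sketch below is a rough illustration in plain Python. The helper name `ceiling_share` and the sample data are hypothetical, and it does not implement the bootstrapping idea itself.

```python
def ceiling_share(values, cap=60.0):
    """Return (count, fraction) of observations recorded at or above the cap."""
    at_cap = sum(1 for v in values if v >= cap)
    return at_cap, at_cap / len(values)

# Hypothetical standing times (min); the real data would come from the accelerometers
standing = [60, 60, 60, 58, 57, 55, 52, 50, 45, 38, 30, 20]
count, share = ceiling_share(standing)
print(f"{count} of {len(standing)} observations ({share:.0%}) are at the 60-minute cap")
```

A large share at the cap would suggest the variable is effectively censored at 60 minutes, which a transformation alone cannot undo.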