Solved – How to check if a data frame is normally distributed in R

normal distribution, normalization, qq-plot

I have a data frame with 7 columns holding numeric and integer values. Some columns, although numeric, are binary (e.g. a dummy variable for sex; $0=\text{male}$, $1=\text{female}$).

I was asked to check whether my data frame is normally distributed and, if not, to normalize it. I found that there are two ways to check: by visualization or by testing. However, I tried both and didn't get the outcome I wanted!

Best Answer

Welcome to CV!

There are several issues with your suggested approach:

  • Contrary to what the name suggests, normalization will not turn an arbitrarily distributed variable into a normally distributed one.
  • Nor can a normality test tell you that your data are normally distributed; it can only tell you whether there is a significant deviation from normality.
  • Finally, data rarely need to be normally distributed. It is also unlikely that any of your data truly are normally distributed in the first place. You mentioned an integer variable: it cannot be exactly normal, because the normal distribution is continuous, with support from $-\infty$ to $+\infty$. The same goes for the binary variable. Rather, it is common for models to assume that the conditional distribution of the outcome variable is approximately normal.
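To make the first two points concrete, here is a minimal R sketch. It assumes "normalization" means standardization with `scale()`, and it uses a simulated right-skewed variable `x` as a hypothetical stand-in for one of your columns; neither assumption comes from your post.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Hypothetical right-skewed variable (exponential), standing in for one column
x <- rexp(200, rate = 1)

# "Normalization" taken here to mean standardization:
# subtract the mean, divide by the standard deviation
z <- as.numeric(scale(x))

# Centering and scaling is a linear transformation, so the shape is unchanged:
# the standardized values are perfectly correlated with the originals
cor(x, z)

# The Shapiro-Wilk statistic does not depend on location or scale,
# so the raw and standardized versions give the same result
sw_raw    <- shapiro.test(x)
sw_scaled <- shapiro.test(z)
sw_raw$statistic
sw_scaled$statistic

# A small p-value signals a detectable deviation from normality;
# a large one would not prove normality, only that no deviation was detected
sw_raw$p.value
```

The standardized variable has mean 0 and standard deviation 1, but it is exactly as skewed as before, so standardizing does nothing to make it "more normal".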

As to what approach is best, you may want to have a look here for starters.
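If the quantity of interest is the conditional distribution of the outcome, one common check is to look at the residuals of a fitted model rather than at the raw columns. The sketch below is illustrative only: the variables `sex` and `y`, the simulated effect sizes, and the choice of `lm()` are all assumptions and not taken from your data.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical data: a binary dummy (0 = male, 1 = female) and a continuous outcome
n   <- 300
sex <- rbinom(n, size = 1, prob = 0.5)
y   <- 10 + 5 * sex + rnorm(n, sd = 2)

# The predictor 'sex' is binary and cannot be normal, and that is fine:
# the usual assumption concerns the outcome given the predictors,
# which is checked through the residuals
fit <- lm(y ~ sex)
res <- residuals(fit)

# Compute the QQ-plot coordinates without drawing;
# interactively, qqnorm(res); qqline(res) would draw the plot itself
qq <- qqnorm(res, plot.it = FALSE)

# Rough numeric summary: correlation between theoretical and sample quantiles
# (values close to 1 correspond to points lying near a straight line)
cor(qq$x, qq$y)
```

A roughly straight QQ-plot of the residuals is usually the more relevant diagnostic than testing each column of the data frame separately.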