Solved – Tests of normality – qq and Shapiro-Wilk

Tags: lognormal distribution, nonparametric, normal distribution, normality-assumption, normalization

I am new to the world of stats …

My data had a log-normal distribution, so I log-transformed it to bring it closer to a normal distribution. This is real-world data.

From here I want to establish whether my data are normal enough for parametric tests (ANOVA to test for differences between groups, then Tukey HSD to find out which groups differ).

So I ran a few tests in R:

    Median =  1.249979
    Mean =  1.278969
    Skewness =  0.3918898
    Kurtosis = -0.1024776
    
    Shapiro-Wilk normality test
    data:  mergedbedanova$logwinterCV
    W = 0.98709, p-value = 0.01769

The Shapiro-Wilk test suggests that my data is not normal.

Histogram of data

[Two figures, not reproduced]

Question

Are these data normal, or 'normal enough', for parametric testing? Or do I need to look at non-parametric tests?

Best Answer

You should test the residuals in a one-way ANOVA to see if they are normal. Especially if the levels of the factor are significantly different, there is no reason to expect the aggregate data to be normal.

As an example, suppose the factor has three levels. Then the data for the three levels separately might be as generated below in R, so that we know the conditions for a one-way ANOVA are precisely met:

    set.seed(122)
    x1 = rnorm(50, 100, 12)
    x2 = rnorm(50, 105, 12)
    x3 = rnorm(50, 135, 12)
    x = c(x1, x2, x3)
    shapiro.test(x)

            Shapiro-Wilk normality test

    data:  x
    W = 0.98219, p-value = 0.04922

However, the Shapiro-Wilk test suggests that the aggregate data are not normal. Also, the kernel density estimator plotted through their histogram below shows right skewness somewhat similar to the data you show.

[Figure: histogram of x with kernel density estimate]

The residuals for this model are $X_{ij} - A_i,$ for $i = 1,2,3; j = 1, \dots, 50;$ where $A_i = \frac{1}{50}\sum_{j=1}^{50} X_{ij}$ is the sample mean of group $i.$
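
The definition of the residuals can be checked against what R computes when the model is fitted. The following is a minimal sketch that regenerates the same kind of simulated data; the names `g`, `r_manual`, `r_model`, and `fit` are illustrative and not part of the original answer.

```r
# Simulated data: three groups of 50 with equal standard deviation
set.seed(122)
x <- c(rnorm(50, 100, 12), rnorm(50, 105, 12), rnorm(50, 135, 12))
g <- factor(rep(1:3, each = 50))   # group labels (illustrative name)

# Residuals by hand: each observation minus the mean of its own group.
# ave() returns, for every observation, the mean of the group it belongs to.
r_manual <- x - ave(x, g)

# Residuals from the fitted one-way ANOVA model
fit <- aov(x ~ g)
r_model <- residuals(fit)

# Compare the two sets of residuals (names dropped before comparing)
all.equal(unname(r_model), r_manual)
```

Working from `residuals(fit)` avoids subtracting group means by hand, which is convenient when the groups sit in one column of a data frame.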

Specifically, for my fake data, the Shapiro-Wilk test shows that the 150 residuals are consistent with normality:

    r = c(x1-mean(x1), x2-mean(x2), x3-mean(x3))
    shapiro.test(r)

            Shapiro-Wilk normality test

    data:  r
    W = 0.99134, p-value = 0.4933
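
For data held in a data frame, the same check might look like the sketch below. It assumes a data frame `d` with a numeric response `y` and a grouping factor `group`; all three names are hypothetical, since the question shows only `mergedbedanova$logwinterCV` and not the grouping column. Simulated stand-in data are used so the block runs on its own.

```r
# Stand-in data; replace `d`, `y`, and `group` with your own data frame
# and column names (all three names are hypothetical).
set.seed(1)
d <- data.frame(
  y     = c(rnorm(40, 1.1, 0.3), rnorm(40, 1.3, 0.3), rnorm(40, 1.5, 0.3)),
  group = factor(rep(c("a", "b", "c"), each = 40))
)

fit <- aov(y ~ group, data = d)   # one-way ANOVA
res <- residuals(fit)             # the normality assumption concerns these

sw <- shapiro.test(res)           # formal test applied to the residuals
print(sw)

qqnorm(res)                       # normal Q-Q plot of the residuals
qqline(res)                       # reference line through the quartiles

TukeyHSD(fit)                     # pairwise group comparisons after the ANOVA
```

Looking at the Q-Q plot alongside the Shapiro-Wilk p-value is useful because the plot shows where and how strongly the residuals depart from the reference line, which a single p-value does not.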