Solved – Some of the data is not normally distributed, what test should I use

anova, descriptive statistics, mathematical-statistics, normality-assumption, statistical significance

When testing for normality and homogeneity of variance in SPSS, it showed this:

If I go by Kolmogorov-Smirnov, then the 'M' data is not normal, but if I go by Shapiro-Wilk, they are all normally distributed.
However, the test of homogeneity of variance shows that based on the Mean and based on the trimmed mean, equal variance is not assumed. I know this would change which Post-hoc test I use for BG Anova but I'm not sure how it would affect my data if I use a Kruskal-Wallis Anova?

Should I use a Kruskal-Wallis ANOVA or a 1-way Between Groups ANOVA?

Thank you!

Best Answer

With such relatively small samples, I would not expect definitive results from either the Shapiro-Wilk or the Kolmogorov-Smirnov tests. Usually, the latter has poorer power than the former so I wonder why K-S (alone) finds group M data non-normal. Even though all six of the P-values for normality tests are about the same, I would want to see whether there are far outliers in any of the three groups; if not, I would not worry much about nonnormality.

I think your main problem may be heteroscedasticity, and I would use an ANOVA procedure designed to take possibly-unequal group variances into account. You may be familiar with the Welch two-sample t test, which does not assume equal variances of the two groups. In its procedure 'oneway.test', R implements a one-way ANOVA that does not assume equal variances. (Adjustments for unequal variances are similar to those of the Welch t test.) I would use this test in preference to a Kruskal-Wallis test because that test explicitly requires populations to be of the 'same shape', which implies 'equal variances'.

I do not know whether SPSS has implemented a one-way ANOVA procedure that does not require homoscedasticity.

The following normal data are simulated (in R) to have relatively modest differences among group means and markedly different variances among groups.

set.seed(2020)  # for reproducibility
a = rnorm(20, 100, 10)
b = rnorm(20, 105, 5)
c = rnorm(20, 112, 15)
x = c(a,b,c)
g = as.factor(rep(1:3, each=20))

boxplot(x ~ g, col="skyblue2")

The "Welchified" one-way ANOVA test finds significant differences among groups at about the 2% level of significance. (In a standard one-way ANOVA the denominator df would be 57; here ddf are about 31, adjusting for heteroscedasticity.)

oneway.test(x ~ g)

        One-way analysis of means (not assuming equal variances)

data:  x and g
F = 4.5939, num df = 2.000, denom df = 31.383, p-value = 0.01779
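
As a rough check on the degrees-of-freedom remark above, the sketch below re-creates the simulated data and runs 'oneway.test' twice, once with its default (unequal variances) and once with var.equal=TRUE, which gives the standard one-way ANOVA. The variable names welch and classic are my own labels, not part of the original answer.

```r
# Re-create the simulated data (same seed and same order of random draws)
set.seed(2020)
grp.a <- rnorm(20, 100, 10)
grp.b <- rnorm(20, 105, 5)
grp.c <- rnorm(20, 112, 15)
x <- c(grp.a, grp.b, grp.c)
g <- as.factor(rep(1:3, each = 20))

# Welch-type one-way ANOVA (default: var.equal = FALSE)
welch <- oneway.test(x ~ g)

# Standard one-way ANOVA, which assumes equal group variances
classic <- oneway.test(x ~ g, var.equal = TRUE)

# Compare numerator and denominator degrees of freedom and the P-values
rbind(welch = welch$parameter, classic = classic$parameter)
c(welch = welch$p.value, classic = classic$p.value)
```

The standard version has denominator df equal to 60 - 3 = 57, while the Welch version reduces the denominator df to reflect the unequal variances.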

Ad hoc Welch two-sample t tests show groups A and B to differ at the 2% level (so, of course, A and C differ also). There is no significant difference between B and C. According to the Bonferroni method of protecting against false discovery, it is reasonable to conclude that A differs from B and C.
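
One possible way to run such pairwise comparisons in R is 'pairwise.t.test' with pool.sd=FALSE, which uses a separate Welch t test for each pair and then applies a Bonferroni adjustment. This is a sketch of that approach, not necessarily the exact commands used for the statement above; the group labels 1, 2, 3 correspond to A, B, C.

```r
# Re-create the simulated data (same seed and same order of random draws)
set.seed(2020)
grp.a <- rnorm(20, 100, 10)
grp.b <- rnorm(20, 105, 5)
grp.c <- rnorm(20, 112, 15)
x <- c(grp.a, grp.b, grp.c)
g <- as.factor(rep(1:3, each = 20))

# pool.sd = FALSE: each pair gets its own Welch t test (no pooled SD);
# the P-values are then Bonferroni-adjusted for the three comparisons
pw <- pairwise.t.test(x, g, pool.sd = FALSE,
                      p.adjust.method = "bonferroni")
pw$p.value   # lower-triangular matrix of adjusted P-values
```

An adjusted P-value below 0.05 in this matrix indicates a pair of groups that differ after the Bonferroni correction.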

Perhaps your data are sufficiently similar to my simulated data that your data can be profitably analyzed using the methods I show above.

Related Solutions

Solved – choose to use only Shapiro-Wilk

If you want a formal hypothesis test of some hypothesis, you should use one test to test that hypothesis.

You should choose that test before you see data, not after you have results in front of you.

You should normally choose that test so that it gives the best power against the alternatives that matter to you. If you're looking at data (let alone p-values) it's already too late to do this cleanly. (This is an argument against those packages that just present a laundry list of tests as a matter of course. They directly encourage p-hacking: consciously or unconsciously, there will be a tendency to focus on the result you were looking for. Better, I think, is the design philosophy of packages that give you only the tests you ask for, so that you at least make a conscious decision about what you're going to test and when.)
It's easy (before the fact) to justify using the Shapiro-Wilk -- it's generally more powerful than most of the competitors, including what SPSS is calling the Kolmogorov-Smirnov, but which I assume is actually Lilliefors' test (because the actual Kolmogorov-Smirnov test is not a test of general normality -- it's not clear why they'd choose to erase Lilliefors' contribution).
If you're actually trying to check suitability of assumptions of some other procedure, formal hypothesis testing is generally unsuitable.

Firstly, see "Is normality testing essentially useless?", especially the answer by Harvey.

Secondly, if you're choosing between different procedures (such as one that assumes normality if you fail to reject and doesn't assume normality if you do) on the basis of a test of normality, you impact the properties (significance level and power) of both the alternatives you're choosing between and the result is not necessarily what you might hope for. Typically if you're not comfortable justifying a choice of a normal-theory procedure before you see the data you should probably just use a test that either doesn't depend on that assumption at all or at least something that's pretty robust to it (and it's not just level-robustness that matters, though you'd hardly guess from many discussions of robustness of tests).
The phrasing in "the number from SW is the ones that showed significant level, while KS is otherwise" is unclear. If you actually mean that the Shapiro-Wilk would reject the null while the other test would not (or vice versa), using that significance or non-significance as a reason to choose the test is unambiguously p-hacking. If you're choosing between tests post hoc on the basis of whether they rejected or didn't reject, you have to toss out the p-values you're looking at because they no longer mean much of anything; if you present the results as if you had just run one test, you're misleading the people who read your work.
I note from your previous question that $n=5$. That's not much to go on; power may be pretty low against some kinds of alternatives that could matter. With a sample that small, neither a rejection nor a non-rejection is particularly informative (if we entertain seriously the possibility that the null can be true (the population could actually be normal), the power of the Kolmogorov-Smirnov may be so low that a rejection may be fairly likely to just represent type I error).
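
To get a feel for how little a normality test can do at $n=5$, a small simulation along the following lines may help. It is only a sketch under my own assumptions: it uses the Shapiro-Wilk test (available in base R), an exponential population as one example of a non-normal alternative, a 5% level, and an arbitrary seed. The helper name reject.rate is hypothetical.

```r
set.seed(1)      # arbitrary seed, chosen only for reproducibility
n <- 5           # sample size discussed above
n.sim <- 5000    # number of simulated samples per population

# Proportion of simulated samples for which Shapiro-Wilk rejects at 5%.
# 'rgen' is any function that draws n values, e.g. rnorm or rexp.
reject.rate <- function(rgen) {
  mean(replicate(n.sim, shapiro.test(rgen(n))$p.value < 0.05))
}

rate.normal <- reject.rate(rnorm)  # null true: estimates type I error rate
rate.exp    <- reject.rate(rexp)   # skewed population: estimates power

c(normal = rate.normal, exponential = rate.exp)
```

Comparing the two rates shows how often a rejection occurs when the population really is normal versus when it is clearly skewed; other alternatives can be tried by swapping in a different generator.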

If there's no good reason to anticipate normality, unless you have a procedure that's quite robust to the assumption, I'd be inclined to avoid assuming it.

Kolmogorov-Smirnov Test – How to Use Kolmogorov-Smirnov Test for Assessing Normality of a Random Variable

Your approach is Procrustean: when you standardized the data, you forced them to look a little more like standard Normal values than they had. After all, part of detecting a difference in distribution involves comparing their means and variances, which you have forced to be the same.

As a result, you are fooling the KS test. It turns out the p-values it returns are dramatically too large, as these results of 10,000 simulated datasets (of size $50$) attest. They summarize two p-values: one obtained by applying the KS test to an iid standard Normal sample and another obtained in exactly the same way, after standardizing that sample.

The red lines plot the ideal null (uniform) distribution for reference.

One thought would be to correct the standardized p-value somehow. But sometimes the p-values are nearly the same because the original sample happened to be nearly standardized, anyway. On rare occasions the standardization makes the data look less like they were drawn from a standard Normal distribution: the KS test evaluates many other aspects of the distribution than its first two moments. But most often, standardization pulls the p-value up (making it harder to detect a departure from being standard Normal). Consequently, we cannot even predict the correct p-value from the incorrect one with acceptable accuracy. Here is the scatterplot of the pairs of p-values in the simulation.

These considerations are sufficiently general--they appeal to no particular property of the KS test apart from its purpose--and thereby suggest similar problems would attend the use of standardization with almost any distributional test.

Such simulations take little time (this requires less than a second to complete) and can be coded in minutes, so they often are worth doing when subtle questions of this kind arise. As an example of how little effort might be needed, here's R code to reproduce this simulation.

n.sim <- 1e4
n <- 50
set.seed(17)
X <- matrix(rnorm(n*n.sim), n)

f <- function(x) ks.test(x, "pnorm")$p.value
ks.1 <- apply(X, 2, f)
ks.2 <- apply(scale(X), 2, f)

The rest of it is a matter of post-processing the arrays of p-values in ks.1 and ks.2. For the record, here's how I did that to make the figures.

# Figure 1: Histograms
par(mfrow=c(1,2))
b <- seq(0, 1, by=0.05)
hist(ks.1, breaks=b, freq=FALSE, col=gray(.9), main="Non-standardized", xlab="p-value")
abline(h=1, lwd=2, col=hsv(0,1,3/4))
hist(ks.2, breaks=b, freq=FALSE, col=gray(.9), main="Standardized", xlab="p-value")
abline(h=1, lwd=2, col=hsv(0,1,3/4))
par(mfrow=c(1,1))

# Figure 2: Scatterplot
plot(ks.1, ks.2, pch=21, bg=gray(0, alpha=.05), col=gray(0, alpha=.2), cex=.5,
     xlab="Non-standardized p-value", ylab="Standardized p-value", asp=1)
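
For a numeric summary to go with the figures, one option is to compare how often each version of the test rejects at the 5% level and how often standardization raises the p-value. The sketch below repeats the simulation with fewer replications (2,000, my choice, to keep it quick); the 5% threshold and the name rates are also my assumptions.

```r
n.sim <- 2000    # fewer replications than above, for speed
n <- 50
set.seed(17)
X <- matrix(rnorm(n * n.sim), n)   # one simulated sample per column

# p-value of the KS test against the standard Normal distribution
f <- function(x) ks.test(x, "pnorm")$p.value
ks.1 <- apply(X, 2, f)             # raw samples
ks.2 <- apply(scale(X), 2, f)      # samples standardized column by column

# Rejection rates at the 5% level; the null is true, so 5% is the target
rates <- c(raw = mean(ks.1 < 0.05), standardized = mean(ks.2 < 0.05))
rates

# Fraction of samples where standardization increased the p-value
mean(ks.2 > ks.1)
```

A rejection rate well below the nominal 5% for the standardized samples is another way of seeing that their p-values are too large.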
