R Statistics – Why do wilcox.test() and t.test() in R yield different p-values?

This is a repost from the R forum, as I was told to post here instead.

I would like to test whether there's a significant difference in the mean between these two samples:

withincollaraccuracyknn <- c(0.960, 0.993, 0.975, 0.967, 0.968, 0.948)
withincollaraccuracytree <- c(0.953, 0.947, 0.897, 0.943, 0.933, 0.879)

The data are normally distributed, as you can see after running a Shapiro-Wilk test:

> sh<-c(0.960,0.993,0.975,0.967,0.968,0.948,0.953,0.947,0.897,0.943,0.933,0.879)
> shapiro.test(sh)

    Shapiro-Wilk normality test

data:  sh
W = 0.91711, p-value = 0.2628

However, t.test() and wilcox.test() yield different p-values:

> t.test(withincollaraccuracyknn,withincollaraccuracytree)

    Welch Two Sample t-test

data:  withincollaraccuracyknn and withincollaraccuracytree
t = 3.1336, df = 7.3505, p-value = 0.01552
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.01090532 0.07542802
sample estimates:
mean of x mean of y 
0.9685000 0.9253333 

> wilcox.test(withincollaraccuracyknn,withincollaraccuracytree)

    Wilcoxon rank sum test

data:  withincollaraccuracyknn and withincollaraccuracytree
W = 35, p-value = 0.004329
alternative hypothesis: true location shift is not equal to 0

Could somebody please let me know why? The Wikipedia page on the Mann-Whitney U test states: "It is nearly as efficient as the t-test on normal distributions".
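"Efficiency" in that quote is a statement about power over many repeated samples, not a promise that the two tests return the same p-value for any one dataset. A rough simulation sketch, assuming normal data, groups of 6 and a hypothetical true shift of one SD (all settings chosen only for illustration):

```r
# Hypothetical settings chosen only for illustration
set.seed(1)
n_sims <- 1000
n      <- 6     # group size, as in the question
shift  <- 1     # assumed true difference in means, in SD units

p_t <- numeric(n_sims)
p_w <- numeric(n_sims)
for (i in seq_len(n_sims)) {
  x <- rnorm(n, mean = shift)
  y <- rnorm(n)
  p_t[i] <- t.test(x, y)$p.value        # Welch t test (R's default)
  p_w[i] <- wilcox.test(x, y)$p.value   # exact rank sum test; ties are practically impossible here
}

power_t <- mean(p_t < 0.05)   # share of simulated datasets where the t test rejects
power_w <- mean(p_w < 0.05)   # same for the rank sum test
p_cor   <- cor(p_t, p_w)      # p-values move together but are not identical
cat("power t:", power_t, " power Wilcoxon:", power_w, " cor(p):", p_cor, "\n")
```

Comparing power_t with power_w is what "efficiency" is about; plotting p_t against p_w shows that, dataset by dataset, the two p-values still differ.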

Note also the warning I get when the data are not normally distributed:

> withincollarprecisionknn<-c(0.985,0.995,0.962,1,0.982,0.990)
> withincollarprecisiontree<-c(1,0.889,0.96,0.953,0.926,0.833)
> sh<-c(0.985,0.995,0.962,1,0.982,0.990,1,0.889,0.96,0.953,0.926,0.833)
> shapiro.test(sh)

    Shapiro-Wilk normality test

data:  sh
W = 0.82062, p-value = 0.01623

> wilcox.test(withincollarprecisionknn,withincollarprecisiontree)

    Wilcoxon rank sum test with continuity correction

data:  withincollarprecisionknn and withincollarprecisiontree
W = 30.5, p-value = 0.05424
alternative hypothesis: true location shift is not equal to 0

Warning message:
In wilcox.test.default(withincollarprecisionknn, withincollarprecisiontree) :
  cannot compute exact p-value with ties
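The warning appears because the value 1 occurs in both samples (a tie), so R falls back to a normal approximation with a tie-corrected variance and a continuity correction, as the output heading says. A minimal sketch that redoes that calculation by hand, assuming the formulas of the default wilcox.test() method; the short variable names are illustrative only:

```r
# Precision data from the question; the value 1 appears in both samples
prec_knn  <- c(0.985, 0.995, 0.962, 1, 0.982, 0.990)
prec_tree <- c(1, 0.889, 0.96, 0.953, 0.926, 0.833)
m <- length(prec_knn)
n <- length(prec_tree)

# W counts pairs where knn > tree; each tied pair counts one half
W <- sum(outer(prec_knn, prec_tree, ">")) +
  0.5 * sum(outer(prec_knn, prec_tree, "=="))

# Normal approximation: variance reduced by the tie correction
ties  <- table(c(prec_knn, prec_tree))
sigma <- sqrt((m * n / 12) *
              ((m + n + 1) - sum(ties^3 - ties) / ((m + n) * (m + n - 1))))
z <- W - m * n / 2
z <- (z - sign(z) * 0.5) / sigma   # continuity correction of 0.5
p <- 2 * pnorm(-abs(z))            # two-sided p-value

# Asking wilcox.test() for the approximation directly avoids the warning
p_builtin <- wilcox.test(prec_knn, prec_tree, exact = FALSE)$p.value
```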

Any help is appreciated. Note that I need to run similar analyses on other datasets whose data are not normally distributed, so being able to use wilcox.test() instead of t.test() would be an advantage!

Best Answer

Steady on there!

You have two very small samples there. Statistics is not taught at Hogwarts! No white magic for very small samples.
Not rejecting the null hypothesis of a Shapiro-Wilk test doesn't justify the description "is normally distributed", only the much more circumspect "not enough evidence to be clear that this isn't normally distributed".
Let's look at normal quantile plots, for the data separately (left) and pooled (right).
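A rough R sketch that draws comparable normal quantile plots and runs Shapiro-Wilk on each group separately. This is an assumption about how such graphs could be made, not the answerer's own code:

```r
# Accuracy data from the question
acc_knn  <- c(0.960, 0.993, 0.975, 0.967, 0.968, 0.948)
acc_tree <- c(0.953, 0.947, 0.897, 0.943, 0.933, 0.879)

op <- par(mfrow = c(1, 2))
# Left: each group separately (circles = knn, triangles = tree)
qqnorm(acc_knn, ylim = range(acc_knn, acc_tree), main = "Separate")
qqline(acc_knn)
points(qqnorm(acc_tree, plot.it = FALSE), pch = 2)
qqline(acc_tree, lty = 2)
# Right: both groups pooled, as in the question's Shapiro-Wilk test
qqnorm(c(acc_knn, acc_tree), main = "Pooled")
qqline(c(acc_knn, acc_tree))
par(op)

# Per-group Shapiro-Wilk; with n = 6 the test has little power,
# so a large p-value is weak evidence of normality
sw_knn  <- shapiro.test(acc_knn)$p.value
sw_tree <- shapiro.test(acc_tree)$p.value
```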

The plots would be straight if the data came from a normal distribution. I see two things there: not too bad in terms of (non-)normality for very small samples, but not the same slope, meaning different variability. Checking that, I find the SD for tree is 0.030 and that for knn is 0.015: a two-fold difference. The t test should be allowed to follow suit, and the one you called (Welch's, which t.test() runs by default) copes with unequal variability.
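The SDs and the Welch output in the question can be checked by hand. A minimal sketch, assuming the standard Welch statistic and Welch-Satterthwaite degrees of freedom; the classic equal-variance t test is included only for comparison:

```r
acc_knn  <- c(0.960, 0.993, 0.975, 0.967, 0.968, 0.948)
acc_tree <- c(0.953, 0.947, 0.897, 0.943, 0.933, 0.879)

# Sample SDs: about 0.015 (knn) and 0.030 (tree)
s_knn  <- sd(acc_knn)
s_tree <- sd(acc_tree)

# Squared standard errors of each mean
v_knn  <- s_knn^2 / length(acc_knn)
v_tree <- s_tree^2 / length(acc_tree)

# Welch statistic and Welch-Satterthwaite degrees of freedom
t_welch  <- (mean(acc_knn) - mean(acc_tree)) / sqrt(v_knn + v_tree)
df_welch <- (v_knn + v_tree)^2 /
  (v_knn^2 / (length(acc_knn) - 1) + v_tree^2 / (length(acc_tree) - 1))

welch  <- t.test(acc_knn, acc_tree)                    # var.equal = FALSE by default
pooled <- t.test(acc_knn, acc_tree, var.equal = TRUE)  # Student t, df = 6 + 6 - 2
```

The hand-computed t_welch and df_welch should match the t = 3.1336 and df = 7.3505 reported in the question.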

Most crucially, no one (competent) promises exactly the same P-values. Different tests focus on different information. For this kind of problem and data, they shouldn't be wildly contradictory, no more, no less.
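For example, Welch's t uses the means and SDs, whereas the rank sum test uses only the ordering of the 12 values. Here W = 35 means that in 35 of the 36 possible (knn, tree) pairs the knn value is larger, and with no ties R computes an exact p-value from the null distribution of W. A minimal sketch of that calculation:

```r
acc_knn  <- c(0.960, 0.993, 0.975, 0.967, 0.968, 0.948)
acc_tree <- c(0.953, 0.947, 0.897, 0.943, 0.933, 0.879)
m <- length(acc_knn)
n <- length(acc_tree)

# W = number of (knn, tree) pairs in which the knn value is larger
W <- sum(outer(acc_knn, acc_tree, ">"))   # 35 out of m * n = 36

# Exact two-sided p-value: 2 * P(W >= 35) under the null
p_exact <- 2 * pwilcox(W - 1, m, n, lower.tail = FALSE)

# Smallest two-sided p-value attainable with two groups of 6
p_min <- 2 / choose(m + n, m)             # 2 / 924

p_builtin <- wilcox.test(acc_knn, acc_tree)$p.value
```

Only 2 of the 924 equally likely rank arrangements give W of 35 or more, so p_exact is 4/924, about 0.0043. Because the null distribution is discrete, no two-sided p-value below 2/924 is possible with 6 values per group.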

PS: My own view is that the graph is more interesting and more convincing than any formal test, but those who review your work might want to hear the clank of testing machinery and see the wheels turning.
