I'm running a pre-post hypothesis test (Wilcoxon signed-rank) on a small dataset. Because of its size (n = 12),
I am running the test using the exact distribution.
df =
ID Period Variable Value
1 0 Month Something 18
1 3 Month Something 26
2 0 Month Something 23
2 3 Month Something 4
3 0 Month Something 24
3 3 Month Something 3
4 0 Month Something 27
4 3 Month Something 26
5 0 Month Something 9
5 3 Month Something 0
6 0 Month Something 40
6 3 Month Something 3
7 0 Month Something 17
7 3 Month Something 10
8 0 Month Something 33
8 3 Month Something 9
9 0 Month Something 6
9 3 Month Something 7
10 0 Month Something 8
10 3 Month Something 1
11 0 Month Something 9
11 3 Month Something 4
13 0 Month Something 26
13 3 Month Something 9
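This works fine in Python:
from scipy.stats import wilcoxon

# Split the long-format frame into baseline and follow-up measurements
dfb = df.query('Period == "0 Month"')
dfa = df.query('Period == "3 Month"')
wilcoxon(dfb.Value, dfa.Value, alternative='two-sided', correction=True,
         mode='exact')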
Result = WilcoxonResult(statistic=7.5, pvalue=0.01220703125)
When I try to do the same in R, I get a different result: it falls back to the normal approximation and issues this warning:
Warning message:
"In wilcox.test.default(dfb$Value, dfa$Value, paired = TRUE,
exact = TRUE, :
cannot compute exact p-value with ties"
R Code:
library(dplyr)  # provides filter()

dfb <- filter(df, Period == "0 Month")
dfa <- filter(df, Period == "3 Month")
wilcox.test(dfb$Value, dfa$Value, paired = TRUE, exact = FALSE,
            alternative = "two.sided")
Result: V = 70.5, p-value = 0.01494
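For reference, the R figure can be reproduced by hand. The sketch below is written in Python and assumes the usual normal approximation for the signed-rank statistic, with a tie-corrected variance and a continuity correction of 0.5. The `before` and `after` lists are the Value column from the table above, ordered by ID; the helper `midranks` is a name made up for this illustration.

```python
import math
from collections import Counter

# Value column from the table above, ordered by ID (0 Month, then 3 Month)
before = [18, 23, 24, 27, 9, 40, 17, 33, 6, 8, 9, 26]
after = [26, 4, 3, 26, 0, 3, 10, 9, 7, 1, 4, 9]

diffs = [b - a for b, a in zip(before, after)]
abs_d = [abs(x) for x in diffs]  # this data has no zero differences


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    positions = {}
    for i, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(i)
    return [sum(positions[v]) / len(positions[v]) for v in values]


ranks = midranks(abs_d)
n = len(diffs)

# V as R reports it: the sum of the ranks of the positive differences
v_stat = sum(r for r, x in zip(ranks, diffs) if x > 0)

# Null mean and variance; each tie group of size t lowers the variance
mean = n * (n + 1) / 4
tie_sizes = Counter(abs_d).values()
var = n * (n + 1) * (2 * n + 1) / 24 - sum(t**3 - t for t in tie_sizes) / 48

# Continuity-corrected z and two-sided normal p-value
z = (abs(v_stat - mean) - 0.5) / math.sqrt(var)
p_normal = math.erfc(z / math.sqrt(2))

print(v_stat, p_normal)  # V is 70.5; the p-value should land near R's 0.01494
```

The two tie groups here (|d| = 1 twice and |d| = 7 twice) are what trigger R's warning, and they reduce the variance only slightly, from 162.5 to 162.25.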
Is there a statistically sound reason why R falls back to the normal approximation when ties are present? I can't find this in any of the literature.
Conversely, is the method that the SciPy library uses valid?
Best Answer
You should note the following paragraph in the scipy manual:
(bolding is my own). The question remains why the results differ.
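Part of the discrepancy is only a reporting convention: for a two-sided test SciPy returns the smaller of the two signed-rank sums, while R's V is the sum of the positive ranks, and 7.5 + 70.5 = 78 = 12 * 13 / 2. The sketch below is an illustration, not what either library does internally. It computes both sums from the data in the question and then a permutation p-value by enumerating all 2^12 sign assignments over the observed mid-ranks, so the ties are respected rather than ignored. The helper `midranks` is a hypothetical name introduced here.

```python
from itertools import product

# Value column from the question, ordered by ID (0 Month, then 3 Month)
before = [18, 23, 24, 27, 9, 40, 17, 33, 6, 8, 9, 26]
after = [26, 4, 3, 26, 0, 3, 10, 9, 7, 1, 4, 9]

diffs = [b - a for b, a in zip(before, after)]
abs_d = [abs(x) for x in diffs]  # assumes no zero differences, true for this data


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    positions = {}
    for i, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(i)
    return [sum(positions[v]) / len(positions[v]) for v in values]


ranks = midranks(abs_d)
total_rank = sum(ranks)

w_plus = sum(r for r, x in zip(ranks, diffs) if x > 0)   # R's V
w_minus = sum(r for r, x in zip(ranks, diffs) if x < 0)
observed = min(w_plus, w_minus)                          # SciPy's statistic
print(w_plus, w_minus)  # 70.5 7.5

# Under the null each difference is equally likely to be positive or negative,
# so every one of the 2^n sign patterns over the fixed mid-ranks is equally likely.
extreme = 0
patterns = 0
for signs in product((0, 1), repeat=len(ranks)):
    wp = sum(r for r, s in zip(ranks, signs) if s)
    patterns += 1
    if min(wp, total_rank - wp) <= observed:
        extreme += 1

p_permutation = extreme / patterns
print(p_permutation)
```

Because the mid-ranks are half-integers, this null distribution is not the tabulated tie-free one, which is a plausible reason for R to refuse an exact p-value when ties are present.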