I want to test by simulation that the Wilcoxon test is more robust than the Student test for non-normally distributed data.
For example, I'm testing the uniform distribution
and the exponential distribution
. I don't know if I simulated it wrong or if I missed something. But I can't find the robustness of the Wilcoxon test compared with the Student test?
I have simulated so that the average of population A is higher than that of population B with one point.
##UNIFORM------
populationA<-round(runif(100000, 73,75),1)
populationB<-round(runif(100000, 72,74),1)
t.test(populationA,populationB)
plot(density(populationA))
plot(density(populationB))
n<-5
n<-10
sub_popA<-sample(populationA,size = n)
sub_popB<-sample(populationB,size = n)
t.test(sub_popA,sub_popB)$p.value
wilcox.test(sub_popA,sub_popB)$p.value
With different sample sizes, I've found that the Student test is closer to the truth than the Wilcoxon test.
It is the same for the exponential distribution, I didn't find a superiority of the Wilcoxon test compared to Student test. Even after reproducing the sampling 100 time and counting the number
##
populationA<-rexp(100000,1)
populationB<-rexp(100000,1/2)
t.test(populationA,populationB)
sub_popA<-sample(populationA,size = 10)
sub_popB<-sample(populationB,size = 10)
t.test(sub_popA,sub_popB)
wilcox.test(sub_popA,sub_popB)
Best Answer
With respect to your choices of distribution, the uniform distribution is not something you need to worry about being robust against. In the old days, a common way of simulating a Normal distribution was to average 12 uniform variates, which implies that a t-statistic based on two samples of size six would be close to the desired distribution under the null hypothesis. In fact, the asymptotic relative efficiency of the Wilcoxon to the t-test for the uniform distribution is... $1.0$. (For the Normal, it's $0.955$.)
Generally speaking (but not always), you want to protect against outliers relative to the "base" distribution (in this case, the Normal), which can be thought of as generated by a distribution with "fatter" tails than the Normal.
Let's do a more extensive simulation with 10,000 repeats of the procedure (100 is far too small a sample size to draw any conclusions except in the most egregious cases) with our base distribution being the fat-tailed $t(3)$ distribution:
which gives us the following:
Clearly favoring the Wilcoxon test.
Now for your Exponential distribution test. Your two Exponential distributions differ in scale, not location; this makes it harder to detect changes in the mean. However, with a larger number of repeats of the experiment, we can still see a difference:
If, instead of rescaling the Exponential distributions, we add a location parameter and rerun the tests, testing for differences in location, we get the following:
Note also that the t-test will fail when the underlying distributions do not have a finite variance, but the Wilcoxon will not.