I want to show by simulation that the Wilcoxon test is more robust than the Student test for non-normally distributed data

Tags: nonparametric, parametric, r, simulation

I want to test by simulation that the Wilcoxon test is more robust than the Student test for non-normally distributed data.

For example, I'm testing with the uniform distribution and the exponential distribution. I don't know if I simulated it wrong or if I missed something, but I can't find the robustness of the Wilcoxon test compared with the Student test.

I have simulated the populations so that the mean of population A is one point higher than that of population B.

## UNIFORM ------
# population A: Uniform(73, 75), mean 74; population B: Uniform(72, 74), mean 73
populationA <- round(runif(100000, 73, 75), 1)
populationB <- round(runif(100000, 72, 74), 1)
t.test(populationA, populationB)   # the population-level difference in means

plot(density(populationA))
plot(density(populationB))

# sample size per group (run once with n = 5, then again with n = 10)
n <- 5
n <- 10

sub_popA <- sample(populationA, size = n)
sub_popB <- sample(populationB, size = n)

t.test(sub_popA, sub_popB)$p.value
wilcox.test(sub_popA, sub_popB)$p.value

With different sample sizes, I've found that the Student test gets closer to the truth (i.e., it detects the real difference between the populations) more often than the Wilcoxon test.

It is the same for the exponential distribution: I didn't find any superiority of the Wilcoxon test over the Student test, even after repeating the sampling 100 times and counting the number of significant results.

## EXPONENTIAL ------
# population A: Exponential(rate = 1), mean 1; population B: Exponential(rate = 1/2), mean 2
populationA <- rexp(100000, 1)
populationB <- rexp(100000, 1/2)
t.test(populationA, populationB)

sub_popA <- sample(populationA, size = 10)
sub_popB <- sample(populationB, size = 10)
t.test(sub_popA, sub_popB)
wilcox.test(sub_popA, sub_popB)
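
Here is roughly what I mean by repeating the sampling and counting (a minimal sketch; the 5% cut-off and the sample size of 10 are just the values used above):

# repeat the subsampling 100 times and count how often each test rejects at 5%
reps <- 100
t_reject <- 0
w_reject <- 0
for (i in 1:reps) {
    sub_popA <- sample(populationA, size = 10)
    sub_popB <- sample(populationB, size = 10)
    t_reject <- t_reject + (t.test(sub_popA, sub_popB)$p.value < 0.05)
    w_reject <- w_reject + (wilcox.test(sub_popA, sub_popB)$p.value < 0.05)
}
c(t = t_reject, wilcoxon = w_reject)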

Best Answer

With respect to your choices of distribution, the uniform distribution is not something you need to worry about being robust against. In the old days, a common way of simulating a Normal distribution was to average 12 uniform variates, which implies that a t-statistic based on two samples of size six would be close to the desired distribution under the null hypothesis. In fact, the asymptotic relative efficiency of the Wilcoxon to the t-test for the uniform distribution is... $1.0$. (For the Normal, it's $0.955$.)
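
For illustration, here's a quick sketch of that trick in its classic form (summing 12 uniforms and subtracting 6 to center the result; the number of replicates is arbitrary):

# approximate a standard Normal draw by summing 12 Uniform(0,1) variates and subtracting 6
z <- replicate(10000, sum(runif(12)) - 6)
qqnorm(z)   # the points should track the reference line closely
qqline(z)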

Generally speaking (but not always), you want to protect against outliers relative to the "base" distribution (in this case, the Normal); these outliers can be thought of as being generated by a distribution with "fatter" tails than the Normal.

Let's do a more extensive simulation with 10,000 repeats of the procedure (100 repeats is far too few to draw any conclusions except in the most egregious cases), with our base distribution being the fat-tailed $t(3)$ distribution:

library(data.table)

# store the p-values from 10,000 repeats of each test
reject <- data.table(t = rep(0, 10000), w = rep(0, 10000))

for (i in 1:nrow(reject)) {
    # two samples of size 10 from a t(3) distribution, the second shifted down by 2
    x1 <- rt(10, 3)
    x2 <- rt(10, 3) - 2

    reject$t[i] <- t.test(x1, x2)$p.value
    reject$w[i] <- wilcox.test(x1, x2)$p.value
}

# rejection rates at the 1% level
reject[, .(t_reject = mean(t < 0.01), Wilcox_reject = mean(w < 0.01))]

which gives us the following:

> reject[, .(t_reject = mean(t < 0.01), Wilcox_reject = mean(w < 0.01))]
   t_reject Wilcox_reject
1:    0.551         0.625
> reject[, .(t_reject = mean(t < 0.05), Wilcox_reject = mean(w < 0.05))]
   t_reject Wilcox_reject
1:    0.766        0.8414

Clearly favoring the Wilcoxon test.

Now for your Exponential distribution test. Your two Exponential distributions differ in scale, not location; this makes it harder to detect changes in the mean. However, with a larger number of repeats of the experiment, we can still see a difference:

for (i in 1:nrow(reject)) {
    # two Exponential samples that differ in scale: means 1 and 1/3
    x1 <- rexp(10)
    x2 <- rexp(10)/3

    reject$t[i] <- t.test(x1, x2)$p.value
    reject$w[i] <- wilcox.test(x1, x2)$p.value
}

> reject[, .(t_reject = mean(t < 0.01), Wilcox_reject = mean(w < 0.01))]
   t_reject Wilcox_reject
1:   0.1176        0.2313
> reject[, .(t_reject = mean(t < 0.05), Wilcox_reject = mean(w < 0.05))]
   t_reject Wilcox_reject
1:   0.4536        0.4697

If, instead of rescaling the Exponential distributions, we add a location parameter and rerun the tests, testing for differences in location, we get the following:

for (i in 1:nrow(reject)) {
    # two Exponential samples with the same scale but a location shift of 0.5
    x1 <- rexp(10)
    x2 <- rexp(10) + 0.5

    reject$t[i] <- t.test(x1, x2)$p.value
    reject$w[i] <- wilcox.test(x1, x2)$p.value
}

> reject[, .(t_reject = mean(t < 0.01), Wilcox_reject = mean(w < 0.01))]
   t_reject Wilcox_reject
1:   0.0708        0.1283
> reject[, .(t_reject = mean(t < 0.05), Wilcox_reject = mean(w < 0.05))]
   t_reject Wilcox_reject
1:    0.222         0.313

Note also that the t-test will fail when the underlying distributions do not have a finite variance, but the Wilcoxon will not.
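
To see this within the same framework, you could rerun the loop with a Cauchy base distribution (equivalently, a t-distribution with 1 degree of freedom, which has no finite variance); this is just a sketch, with the shift of 2 mirroring the $t(3)$ example above:

for (i in 1:nrow(reject)) {
    # two Cauchy samples (infinite variance), the second shifted down by 2
    x1 <- rcauchy(10)
    x2 <- rcauchy(10) - 2

    reject$t[i] <- t.test(x1, x2)$p.value
    reject$w[i] <- wilcox.test(x1, x2)$p.value
}

reject[, .(t_reject = mean(t < 0.05), Wilcox_reject = mean(w < 0.05))]

Here the Wilcoxon typically holds up much better than the t-test.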