Trying to choose between these two tests for data I've harvested from the Android store. Basically, I want to see if there is any difference in the number of dangerous permissions requested by free vs paid apps. I have equal sample sizes of 1900. When I plot the data, both distributions are highly skewed, almost like decay curves. I understand Student's t carries an assumption of normality, but I'm not sure what has to be normally distributed, so I'm not sure whether Student's t would be the right test or whether to use the non-parametric Mann-Whitney.
Solved – Student’s t vs Mann-Whitney U
Tags: nonparametric, t-test, wilcoxon-mann-whitney-test
Related Solutions
Consider the Gaussian (normal) cumulative distribution function $\Phi(y)$ and the empirical cumulative distribution function $F_{n}(y)$. For optimality (control of type I error and low type II error) the two-sample $t$-test assumes that $\Phi^{-1}(F_{n}(y))$ when stratified by group yields two straight parallel lines. For optimal power the Wilcoxon test (and the proportional odds ordinal logistic model) assume that $\textrm{logit}(F_{n}(y))$ yields two parallel curves (they needn't be straight lines). The Wilcoxon assumptions are less stringent than the parametric method's assumptions.
So for optimality the Wilcoxon test assumes that the two distributions, after logit transformation, have the same shape and steepness. But the Wilcoxon test can be successfully used even when this assumption doesn't hold.
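These two diagnostics can be checked directly from data. The sketch below (in Python with numpy/scipy, whereas the examples later in this thread use R) simulates two groups and evaluates $\Phi^{-1}(F_n(y))$ and $\textrm{logit}(F_n(y))$ on a grid; the data and grid are made up for illustration. Parallel curves show up as a roughly constant vertical gap between the two groups.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
# Two simulated groups with the same shape but shifted location
# (the t-test's ideal case, so both diagnostics should look parallel)
g1 = rng.normal(0.0, 1.0, 500)
g2 = rng.normal(0.7, 1.0, 500)

def ecdf(sample, grid):
    """Empirical CDF of `sample` evaluated at each point of `grid`."""
    return np.searchsorted(np.sort(sample), grid, side="right") / len(sample)

grid = np.linspace(-2.0, 2.0, 9)
F1, F2 = ecdf(g1, grid), ecdf(g2, grid)

# Clip away exact 0/1 so both transforms stay finite
eps = 1e-3
F1, F2 = np.clip(F1, eps, 1 - eps), np.clip(F2, eps, 1 - eps)

# t-test diagnostic: Phi^{-1}(F_n(y)) should give two parallel straight lines
probit1, probit2 = norm.ppf(F1), norm.ppf(F2)
# Wilcoxon diagnostic: logit(F_n(y)) should give two parallel curves
logit1, logit2 = np.log(F1 / (1 - F1)), np.log(F2 / (1 - F2))

# If the curves are parallel, these gaps are roughly constant across y
print(np.round(probit1 - probit2, 2))
print(np.round(logit1 - logit2, 2))
```

In practice you would plot the transformed curves per group rather than print the gaps; a roughly constant probit gap supports the t-test, a roughly constant logit gap supports the Wilcoxon/proportional-odds model.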
Much confusion exists about what the test assumes when you are calculating $P$-values. If using the normal-distribution approximation for $P$-values, more assumptions are made, including equal dispersion in the two groups. If using general $U$-statistic theory (e.g. the rcorr.cens function in the R Hmisc package) to get the standard error of the concordance probability, or if using the likelihood ratio or score $\chi^2$ tests from the proportional odds model, this assumption is not needed.
You are indeed correct that they are testing for the equality of the mean.
But you might be more interested in what makes them different from one another and what criteria should be met to select a test. The tests can be ranked by generality, from most general to least general.
The Mann-Whitney test is the most general, with the fewest assumptions: it is a nonparametric test, meaning that we need no assumptions about the probability distribution the data come from.
The Welch test adds the assumption that the two groups come from normal distributions, but the variances of the two groups may differ.
The Student t-test adds yet another assumption: that the variances are equal in the two groups.
But what do these assumptions imply? What effect does this have on our analysis?
The following is a small example using a built-in dataset in R called mtcars. It has data about different types of cars and variables measured on those cars. If you use R, you can use the summary command to get an overview of the data set:
summary(mtcars)
Now we have a variable called mpg, the mileage (miles per gallon), and am, which is 0 if the car has an automatic transmission and 1 if it is manual. We want to test whether there is a significant difference in mileage between automatic and manual cars.
We can plot the data to get some intuition about the answer:
plot(mtcars$am,mtcars$mpg)
title('mpg vs am')
We can see that the mileage of the manual cars seems to be higher, but we would like to do a statistical test to confirm this.
Starting with the Student t-test we get:
> t.test(mpg~am,data=mtcars,var.equal=TRUE)
Two Sample t-test
data: mpg by am
t = -4.1061, df = 30, p-value = 0.000285
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-10.84837 -3.64151
sample estimates:
mean in group 0 mean in group 1
17.14737 24.39231
And the Welch test:
> t.test(mpg~am,data=mtcars)
Welch Two Sample t-test
data: mpg by am
t = -3.7671, df = 18.332, p-value = 0.001374
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-11.280194 -3.209684
sample estimates:
mean in group 0 mean in group 1
17.14737 24.39231
And finally the Mann Whitney test:
> wilcox.test(mpg~am,data=mtcars)
Wilcoxon rank sum test with continuity correction
data: mpg by am
W = 42, p-value = 0.001871
alternative hypothesis: true location shift is not equal to 0
The $p$-values increase across these tests, i.e.
t-test has $p$-value = 0.000285
Welch test has $p$-value = 0.001374
Mann Whitney test has $p$-value = 0.001871
So when you relax your assumptions, you become less certain of the outcome, in some sense. If your $\alpha$ level were 0.1% (which is rarely used, except when doing multiple testing/comparisons, or maybe in pharmaceutical experiments), then you would reject the null hypothesis using the t-test but not with the other tests.
If you know that the data come from a normal distribution and the two groups have equal variance, the Student t-test is the correct choice. People often simply assume that the data come from a normal distribution, but if you want to be conservative the Mann-Whitney test is more appropriate.
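To connect this back to the original question's skewed "decay-curve" data: the same three-test comparison can be run on simulated skewed counts with the question's sample size of 1900 per group. The sketch below uses Python with scipy rather than R, and shifted geometric variables as a made-up stand-in for permission counts; it is only an illustration of how to run the tests side by side, not an analysis of the real data.

```python
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu

rng = np.random.default_rng(42)
n = 1900
# Heavily skewed stand-ins for permission counts
# (a shifted geometric looks like a discrete decay curve)
free = rng.geometric(p=0.30, size=n) - 1   # mean approx. 2.33
paid = rng.geometric(p=0.35, size=n) - 1   # mean approx. 1.86

t_res = ttest_ind(free, paid)                    # Student's t (pooled variance)
w_res = ttest_ind(free, paid, equal_var=False)   # Welch
u_res = mannwhitneyu(free, paid)                 # Mann-Whitney

print(f"Student t   : p = {t_res.pvalue:.4g}")
print(f"Welch       : p = {w_res.pvalue:.4g}")
print(f"Mann-Whitney: p = {u_res.pvalue:.4g}")
```

With samples this large the central limit theorem makes the t-tests fairly robust to skewness, which is why all three tests tend to agree here; the disagreements discussed above matter most in small samples.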
Best Answer
Skewness will give you trouble with the t-test, yes. You could perhaps do a Mann-Whitney, but since the data are counts, you probably want a test suited to count data.
I'd be inclined to suggest assuming something like Poisson and then conditioning on the sum (giving a binomial test) ... but since you have a mix of applications, there may be additional skewness induced by that heterogeneity.
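The "condition on the sum" suggestion can be sketched concretely: if each app's count is Poisson with the same rate under the null, and the two groups have equal numbers of apps, then conditional on the grand total the free-group total is Binomial(total, 1/2). The totals below are made up for illustration (the question gives no actual counts), and the sketch is in Python with scipy rather than R:

```python
from scipy.stats import binomtest

# Hypothetical totals of dangerous permissions over 1900 apps per group
total_free = 5200   # made-up number
total_paid = 4900   # made-up number

# Under H0 (same Poisson rate per app, equal group sizes), conditional on
# the grand total, the free-group total is Binomial(grand_total, 0.5)
res = binomtest(total_free, total_free + total_paid, p=0.5)
print(f"two-sided p-value = {res.pvalue:.4g}")
```

Note that this exact test inherits the Poisson assumption; the heterogeneity across applications mentioned above (overdispersion) would make it anti-conservative, which is part of why a random-effects treatment may be preferable.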
How skew are the distributions?
How were the applications selected?
You may ultimately be best off treating the applications as a random effect.