I want to compare differences between two independent groups (female and male) when dependent variables are continuous. However, my sample size is very small (N=6). I have done a Mann-Whitney U Test and I am not sure if the results are meaningful given the small sample.
Solved – use a Mann-Whitney U Test with a very small sample
hypothesis-testing, nonparametric, sample-size, wilcoxon-mann-whitney-test
Related Solutions
With such large sample sizes, both tests will have high power to detect even minor differences. The two distributions could be almost identical, with a small difference in shape or location that is of no practical importance, and the tests would still reject (because the distributions are, strictly speaking, different).
If all you really care about is a statistically significant difference, then you can be happy with the results of the KS test (and others; even a t-test will be meaningful with non-normal data at those sample sizes, thanks to the Central Limit Theorem).
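As a quick illustration (a sketch with simulated data, not anything from the original question), here is how a shift of 0.03 standard deviations -- far too small to matter in most applications -- is flagged by both the KS test and the t-test once the samples are in the hundreds of thousands:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two large simulated samples differing only by a tiny,
# practically meaningless location shift (0.03 SDs).
x = rng.normal(loc=0.00, scale=1.0, size=300_000)
y = rng.normal(loc=0.03, scale=1.0, size=300_000)

ks = stats.ks_2samp(x, y)       # two-sample Kolmogorov-Smirnov test
tt = stats.ttest_ind(x, y)      # two-sample t-test

# Both p-values come out tiny despite the negligible effect size.
print(f"KS test p-value: {ks.pvalue:.3g}")
print(f"t-test  p-value: {tt.pvalue:.3g}")
```

Both tests reject decisively, yet no plot of the two samples would look meaningfully different to the eye.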
If you care about practical or meaningful differences, then things become subjective, but you can compare the distributions using various plots to help you decide whether the differences are large enough to care about.
Another possibility is doing a visual test as documented in
Buja, A., Cook, D., Hofmann, H., Lawrence, M., Lee, E.-K., Swayne, D. F. and Wickham, H. (2009). Statistical inference for exploratory data analysis and model diagnostics. Phil. Trans. R. Soc. A, 367, 4361–4383. doi:10.1098/rsta.2009.0120
The vis.test function in the TeachingDemos package for R helps implement the test, but it can also be done by hand.
Basically, you create a set of graphs and then see if viewers can tell which is which. For your question, one possibility would be to create a histogram of the 122,000 observations from the one month, then take several samples of 122,000 from the 300,000 observations of the other month and create a histogram of each of those samples. Then present someone (or several people) with all the histograms in random order and see whether they can pick out the one that represents the second month. If they consistently pick out the correct graph, that says there is something visually different, and you can explore further how the months differ. If they cannot pick out the correct graph, that suggests that while there may be a statistically significant difference, it is not important enough to distinguish the months visually.
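The lineup construction above can be sketched in Python (the `vis.test` function mentioned earlier is from R's TeachingDemos package; this is just an illustrative analogue with made-up data standing in for the two months):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data standing in for the two months in the question.
month_a = rng.normal(0.0, 1.0, size=300_000)   # the 300,000-observation month
month_b = rng.normal(0.3, 1.0, size=122_000)   # the 122,000-observation month

# Build a "lineup": the real month-B sample hidden among decoy
# samples of the same size drawn from month A without replacement.
n_decoys = 7
panels = [rng.choice(month_a, size=122_000, replace=False)
          for _ in range(n_decoys)]
panels.append(month_b)                          # real sample goes in last...

order = rng.permutation(len(panels))            # ...then everything is shuffled
panels = [panels[i] for i in order]
true_position = int(np.where(order == n_decoys)[0][0])

# In practice you would draw a histogram of each panel (e.g. with
# matplotlib) and ask a viewer to pick the odd one out; only the
# analyst knows which panel holds the real month-B data.
print(f"The real month-B histogram is panel {true_position}")
```

If viewers beat the 1-in-8 guessing rate reliably, the visual difference is real and worth investigating.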
This is not a problem of the t-test specifically, but of any test whose power grows with the sample size; such a test is said to be overpowered. And no, switching to the Mann-Whitney test will not help.
Therefore, apart from asking whether the results are statistically significant, you need to ask whether the observed effect size is significant in the everyday sense of the word (i.e., meaningful). This requires not only statistical knowledge but also expertise in the field you are investigating.
In general, there are two ways you can look at the effect size. One way is to scale the difference between the group means by the standard deviation of the data. Since the standard deviation is in the same units as the means and describes the dispersion of the data, you can express the difference between your groups in units of standard deviation. Note also that, unlike the standard error of the mean, the estimated standard deviation does not systematically shrink as the number of observations grows.
This is, for example, the reasoning behind Cohen's $d$:
$$d = \frac{ \bar{x}_1 - \bar{x}_2 }{ s}$$
...where $s$ is the square root of the pooled variance.
$$s = \sqrt{\frac{ s_1^2\cdot(n_1-1) + s_2^2\cdot(n_2 - 1) }{ N - 2 } }$$
(where $N=n_1+n_2$ and $s_1$ and $s_2$ are the standard deviations in group 1 and 2, respectively; that is, $s_1 = \sqrt{ \frac{\sum(x_i-\bar{x_1})^2 }{n_1 -1 }} $).
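The formulas above translate directly into code. A minimal Python sketch (the toy data are invented, but chosen so each group has three observations, as in an N = 6 study):

```python
import numpy as np

def cohens_d(x1, x2):
    """Cohen's d: difference in means scaled by the pooled standard deviation."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    # Pooled standard deviation with N - 2 degrees of freedom,
    # exactly as in the formula for s above.
    s_pooled = np.sqrt(((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1))
                       / (n1 + n2 - 2))
    return (x1.mean() - x2.mean()) / s_pooled

# Toy example: group means 2 and 4, pooled SD 1, so d = -2.
print(cohens_d([1.0, 2.0, 3.0], [3.0, 4.0, 5.0]))  # -> -2.0
```

A |d| of 2 is enormous by the usual rules of thumb, which is exactly why even a tiny sample can sometimes show a convincing effect.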
Another way of looking at the effect size -- and frankly, one that I personally prefer -- is to ask what part (percentage) of the variability in the data can be explained by the estimated effect. You can estimate the variance between and within the groups and see how they relate (this is actually what ANOVA does, and the t-test is in principle a special case of ANOVA). This is the reasoning behind the coefficient of determination, $r^2$, and the related $\eta^2$ and $\omega^2$ statistics. In a t-test, $\eta^2$ can easily be calculated from the $t$ statistic itself:
$$\eta^2 = \frac{ t^2}{t^2 + n_1 + n_2 - 2 }$$
This value can be directly interpreted as the "fraction of variance in the data explained by the difference between the groups". There are various rules of thumb for what counts as a "large" or "small" effect, but it all depends on your particular question: 1% of the variance explained can be laughable, or it can be just enough.
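As a sketch in Python (the group sizes and effect are simulated, not from the question), $\eta^2$ falls straight out of the $t$ statistic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Two simulated groups with a moderate true difference in means.
g1 = rng.normal(0.0, 1.0, size=50)
g2 = rng.normal(0.8, 1.0, size=50)

# Pooled (equal-variance) two-sample t-test.
t, p = stats.ttest_ind(g1, g2)
n1, n2 = len(g1), len(g2)

# eta^2 from the t statistic, per the formula above.
eta_sq = t**2 / (t**2 + n1 + n2 - 2)
print(f"eta^2 = {eta_sq:.3f}  (fraction of variance explained)")
```

For the two-group case, $\eta^2$ coincides with the squared point-biserial correlation between group membership and the outcome, which is a handy sanity check.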
Best Answer
This has been discussed at length on this site. Briefly, the test is valid. But no test is especially helpful because of our inability to interpret large p-values, which do not indicate "no difference". Instead I would replace a test with a confidence interval or Bayesian credible interval. These have interpretations regardless of sample size and regardless of whether a null hypothesis is true.
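To make the small-sample point concrete, here is a hedged sketch in Python with hypothetical measurements (three per group, as in the question's N = 6). With 3 vs 3 observations, the exact two-sided Mann-Whitney p-value can never go below 0.1, even under complete separation, whereas a confidence interval remains interpretable, just wide:

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for an N = 6 study (three per group).
female = np.array([4.1, 5.2, 6.3])
male   = np.array([7.4, 8.0, 9.1])

# Complete separation, yet the exact two-sided p-value is 2/20 = 0.1:
# the smallest value attainable with 3 vs 3.
u = stats.mannwhitneyu(female, male, alternative="two-sided", method="exact")
print(f"Mann-Whitney p = {u.pvalue:.3f}")

# Welch 95% confidence interval for the difference in means,
# computed from the Welch-Satterthwaite degrees of freedom.
v1, v2 = female.var(ddof=1), male.var(ddof=1)
n1, n2 = len(female), len(male)
diff = female.mean() - male.mean()
se = np.sqrt(v1 / n1 + v2 / n2)
df = (v1 / n1 + v2 / n2) ** 2 / ((v1 / n1) ** 2 / (n1 - 1)
                                 + (v2 / n2) ** 2 / (n2 - 1))
tcrit = stats.t.ppf(0.975, df)
ci_low, ci_high = diff - tcrit * se, diff + tcrit * se
print(f"95% CI for mean difference: ({ci_low:.2f}, {ci_high:.2f})")
```

The interval is wide, as it must be with six observations, but unlike the p-value it communicates both the plausible direction and the magnitude of the difference.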