Solved – Small and unequal sample sizes

small-sample, statistical-significance

I have a data set consisting of a small number of participants in three groups: n=5, n=5, n=2. I want to compare the group means on a particular variable. I ran a one-way ANOVA and got p = .07, while the Welch test gives p = .005. Which is correct?

Best Answer

I agree with kjetil. The Welch test, in its basic form, is a two-sample test of the difference between sample means that does not assume equal variances (with three groups, as here, the same idea gives Welch's one-way ANOVA). Its variance estimate is built from the separate sample variances rather than from a pooled estimate of a common variance. As a result, the test statistic under the null hypothesis does not have an exact $t$ distribution, but it can be approximated by a $t$ distribution whose df parameter need not be an integer. If the original data are approximately normal, which test to use depends on whether the variances are equal. With sample sizes this small, you cannot even assess the adequacy of the normality assumption, much less the equality-of-variance assumption. So among the various frequentist approaches, (1) the parametric two-sample $t$ test, (2) the parametric Welch test, and (3) the nonparametric Wilcoxon rank-sum test, you have no way to be confident that one is better than the others, and they can all give very different p-values and possibly lead to conflicting conclusions.
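To make the pooled-versus-separate variance distinction concrete, here is a minimal sketch for the two-sample case. The data values and the function names `welch_t` and `pooled_t` are made up for illustration and are not from the question. It computes the Welch statistic $t = (\bar{x} - \bar{y}) / \sqrt{s_x^2/n_x + s_y^2/n_y}$ with the Welch-Satterthwaite df approximation, alongside the pooled-variance $t$ statistic with $n_x + n_y - 2$ df.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Return (t statistic, approximate df) using separate variance estimates."""
    nx, ny = len(x), len(y)
    # Estimated variance of each sample mean: s^2 / n
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    # Welch-Satterthwaite approximation; usually not an integer
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


def pooled_t(x, y):
    """Return (t statistic, df) using a pooled estimate of a common variance."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))
    return t, df


# Hypothetical data: one group of 5 and one group of 2, with unequal spread
group_a = [4.1, 5.0, 5.6, 6.2, 7.3]
group_c = [8.9, 9.1]

t_w, df_w = welch_t(group_a, group_c)
t_p, df_p = pooled_t(group_a, group_c)
print(f"Welch:  t = {t_w:.3f}, df = {df_w:.3f}")
print(f"Pooled: t = {t_p:.3f}, df = {df_p}")
```

With these invented numbers the two statistics differ noticeably, because the group of 2 happens to have a tiny sample variance. A variance estimated from two observations is itself very unreliable, which is the point made above: the data cannot tell you which of the two answers to trust.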

In my view, this means that you cannot do inference because the sample size is just too small. Some may incorrectly think that the bootstrap can bail you out, but it can't. We also see others assert that Bayesian methods can save the day. I disagree with that too, because with so little data the Bayesian conclusion will be dominated by the prior and the data will have little influence. I think it is foolish to depend so heavily on the prior, particularly if you can't make a strong case for using a specific prior.
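One possible way to see the bootstrap point (an illustration added here, not the answerer's own argument) is to enumerate every distinct resample of a very small group. The values and the helper name `distinct_bootstrap_means` are hypothetical. A sample of size $n$ has only $\binom{2n-1}{n}$ distinct resamples when order is ignored, so for $n = 2$ the bootstrap distribution of the mean sits on just three points.

```python
from itertools import combinations_with_replacement
from math import comb
from statistics import mean


def distinct_bootstrap_means(sample):
    """Enumerate every distinct resample (ignoring order) and the set of resample means."""
    n = len(sample)
    resamples = list(combinations_with_replacement(sample, n))
    means = sorted({round(mean(r), 10) for r in resamples})
    return resamples, means


small_group = [8.9, 9.1]  # hypothetical group with n = 2
resamples, means = distinct_bootstrap_means(small_group)

print("distinct resamples:", resamples)
print("distinct means:", means)
# For comparison, the count of distinct resamples when n = 5
print("n = 5 gives", comb(2 * 5 - 1, 5), "distinct resamples")
```

Resampling can only rearrange the observations already in hand, so with two observations there is almost nothing for it to work with.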