It shouldn't matter much which package you use, since the test statistic will always be the difference in group means (or something equivalent). Small differences can come from the implementation of the Monte Carlo methods. Trying the three packages on your data with a one-sided test for two independent groups:
DV <- c(x1, y1)  # pooled data (dependent variable)
IV <- factor(rep(c("A", "B"), c(length(x1), length(y1))))  # grouping factor
library(coin) # for oneway_test(), pvalue()
pvalue(oneway_test(DV ~ IV, alternative="greater",
distribution=approximate(B=9999)))
[1] 0.00330033
library(perm) # for permTS()
permTS(DV ~ IV, alternative="greater", method="exact.mc",
control=permControl(nmc=10^4-1))$p.value
[1] 0.003
library(exactRankTests) # for perm.test()
perm.test(DV ~ IV, paired=FALSE, alternative="greater", exact=TRUE)$p.value
[1] 0.003171822
To check the exact p-value with a manual calculation of all permutations, I'll restrict the data to the first 9 values.
x1 <- x1[1:9]
y1 <- y1[1:9]
DV <- c(x1, y1)
IV <- factor(rep(c("A", "B"), c(length(x1), length(y1))))
pvalue(oneway_test(DV ~ IV, alternative="greater", distribution="exact"))
[1] 0.0945907
permTS(DV ~ IV, alternative="greater", exact=TRUE)$p.value
[1] 0.0945907
# perm.test() gives a different result due to rounding of the input values
perm.test(DV ~ IV, paired=FALSE, alternative="greater", exact=TRUE)$p.value
[1] 0.1029412
# manual exact permutation test
idx <- seq(along=DV) # indices to permute
idxA <- combn(idx, length(x1)) # all possibilities for different groups
# function to calculate difference in group means given index vector for group A
getDiffM <- function(x) { mean(DV[x]) - mean(DV[!(idx %in% x)]) }
resDM <- apply(idxA, 2, getDiffM) # difference in means for all permutations
diffM <- mean(x1) - mean(y1) # empirical difference in group means
# p-value: proportion of group means at least as extreme as observed one
(pVal <- sum(resDM >= diffM) / length(resDM))
[1] 0.0945907
coin and exactRankTests are both from the same author, but coin seems to be more general and extensive - also in terms of documentation. exactRankTests is not actively developed anymore. I'd therefore choose coin (also because of informative functions like support()), unless you'd rather not deal with S4 objects.
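As a quick illustration of what support() offers (a sketch, assuming the restricted DV and IV from the exact example above are still defined): it returns the attainable values of the test statistic under the exact permutation distribution, and dperm() gives their permutation probabilities.
ot   <- oneway_test(DV ~ IV, alternative="greater", distribution="exact")
supp <- support(ot)      # attainable values of the standardized test statistic
length(supp)             # size of the support
dperm(ot, supp[1:5])     # permutation probabilities of the first few values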
EDIT: for two dependent samples (paired data), the syntax is
id <- factor(rep(1:length(x1), 2)) # factor for participant
pvalue(oneway_test(DV ~ IV | id, alternative="greater",
distribution=approximate(B=9999)))
[1] 0.00810081
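For comparison, a minimal manual sketch of the paired case: conditioning on the participant blocks means permuting only within each pair, which amounts to flipping the sign of each within-pair difference (complete enumeration, so it assumes the number of pairs is small and that x1 and y1 hold the paired measurements).
d    <- x1 - y1                                      # within-pair differences
sgn  <- expand.grid(rep(list(c(1, -1)), length(d)))  # all 2^n sign assignments
resM <- apply(sgn, 1, function(s) mean(s * d))       # mean difference per assignment
(pVal <- sum(resM >= mean(d)) / length(resM))        # one-sided p-value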
Non-parametric tests are likely to be less powerful than parametric tests and thus require a larger sample size. This is annoying because if you had a large sample size, sample means would be approximately normally distributed by the central limit theorem, and you thus wouldn't need non-parametric tests.
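A small simulation sketch of that power difference with truly normal data (the sample size and effect size here are arbitrary):
set.seed(1)
pT <- pW <- numeric(2000)
for (i in seq_along(pT)) {
    a <- rnorm(15, mean=0.8)    # group with a true shift
    b <- rnorm(15, mean=0)
    pT[i] <- t.test(a, b)$p.value
    pW[i] <- wilcox.test(a, b)$p.value
}
mean(pT < 0.05)   # estimated power of the t test
mean(pW < 0.05)   # estimated power of the Wilcoxon test (a bit lower)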
Look at generalized linear models, of which ordinary least-squares regression and Poisson regression are special cases. I've never found a text that explains this particularly well; try talking to someone about it.
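A sketch with made-up data showing how both special cases use the same glm() interface:
set.seed(1)
x      <- runif(50)
yGauss <- 2 + 3*x + rnorm(50)                  # continuous response
yCount <- rpois(50, lambda=exp(0.5 + 1.5*x))   # count response
glm(yGauss ~ x, family=gaussian)   # same fit as lm(yGauss ~ x), i.e. least squares
glm(yCount ~ x, family=poisson)    # Poisson regression with log link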
Look at non-parametric methods if you feel like it, but I have a hunch that they won't help you much in this case unless you're using ordinal data or a large set of very bizarrely distributed data.
It is true that precisely normal populations are rare in the real world.
However, some very useful procedures are 'robust' against mild non-normality. Perhaps the most important of them is the t test, which performs remarkably well with samples of moderate or large size that are not exactly normal.
Also, some tests that were derived for use with normal data have better power than nonparametric alternatives (that is, they are more likely to reject the null hypothesis when it is false), and this advantage persists to an extent when these tests are used with slightly non-normal data.
Nonparametric tests such as sign tests and the rank-based Wilcoxon, Kruskal-Wallis, and Friedman tests lose information when data are reduced to ranks (or to +'s and -'s), and the result can be failure to find a real effect when it is present in experimental data.
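A tiny illustration of what gets lost when data are reduced to ranks (toy numbers):
x <- c(1.01, 1.02, 1.03, 50)   # one value far away from the rest
rank(x)                        # 1 2 3 4 -- the size of that gap is gone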
You are correct that some ANOVA tests behave badly when data are not normal, but many tests using the chi-squared distribution are for categorical data and normality is not an issue.
Recently, new nonparametric methods of data analysis have been invented and come into common use because computation is cheaper and more convenient now than it was several years ago. Some examples are bootstrapping and permutation tests. Sometimes they require hundreds of thousands or millions of computations compared with dozens for traditional tests. But the extra computation may take only seconds or a few minutes with modern computers.
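As a sketch of the first idea, a percentile bootstrap confidence interval for a mean (the skewed sample x is made up; 10^4 resamples as an example):
set.seed(1)
x     <- rexp(30)                                         # a skewed sample
bootM <- replicate(10^4, mean(sample(x, replace=TRUE)))   # means of resamples
quantile(bootM, c(0.025, 0.975))                          # 95% percentile interval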
Admittedly, some statisticians are not familiar with these new methods and fail to take appropriate advantage of them. Also, part of the reluctance to change is that consumers of or clients for statistical analyses may not trust results from procedures they have never heard of. But that is changing over time.
Fortunately, modern software and computers also make it possible to visualize data in ways that were previously tedious to show. As a very simple example (not using very fancy graphics), here are two plots of some data that I know cannot possibly be normal (even though they do manage to pass a couple of tests of normality because of the small sample size).
These data are also pretty obviously not centered at $0.$ The optimum statistical procedure to confirm that would not be a t test or even a nonparametric Wilcoxon test. But both of these tests reject the null hypothesis that the data are centered at $0$: the t test with a P-value of 0.013, the Wilcoxon test with a P-value of 0.0099. Both P-values are less than 0.05, so both confirm the obvious at the 5% level.
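In R the two tests would be run as below (x stands in for the plotted sample, which is not reproduced here; the quoted P-values come from the author's data):
t.test(x, mu=0)$p.value       # one-sample t test against center 0
wilcox.test(x, mu=0)$p.value  # Wilcoxon signed-rank test against center 0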
It is hardly a loss to science if I don't get around to using the optimal test. And some of the people reading my findings might be a lot more comfortable having the results of a t test. Maybe the next generation of clients will be more demanding.