Hypothesis Testing – Wilcoxon Test vs Bootstrapping vs Alternative Methods

Tags: bootstrap, hypothesis-testing, nonparametric, r, treatment-effect

A colleague has developed a treatment to "prevent falls" in cognitively impaired psychiatric patients. Since this would be a very useful treatment in this population, we especially do not want to make a Type II error (i.e., fail to reject the null when we should reject it).

Since the data are not normally distributed, another colleague evaluated the full data set using (appropriately, I believe) the Wilcoxon test and did not find significance. There may be valid methodological reasons for this, which I may follow up on later in another question.

I was concerned about committing a Type II error, and obtained some preliminary data, which I have below. These data reflect pre/post scores (# of "falls") for the same patients (no control group), so should be considered "paired" and not independent:

pre <- c(9,8,37,12,8,3,4,4,3,5,4,8,4,8,9,11,2,4,0,0,5,12,10,2,8,3,0,22,1,0,0,5,0,3,1,5)

post <- c(10,8,6,4,5,2,4,4,2,2,1,7,2,1,3,9,2,2,0,0,6,16,4,3,4,7,0,10,3,0,0,4,0,1,1,5)

When I ran a bootstrapping procedure on this preliminary data (adapted from Crawley, The R Book, p. 385)

# bootstrap the mean of the pre scores: 10,000 resamples drawn with replacement
preBoot <- numeric(10000)
for (i in 1:10000) { preBoot[i] <- mean(sample(pre, replace=TRUE)) }
# 2.5% and 97.5% quantiles of the bootstrapped means
quantile(preBoot, c(0.025, 0.975))

and compared the post mean to the bootstrap estimate of the sampling distribution of the pre mean, I found that the treatment did have a significant beneficial effect. To evaluate significance, I simply took the 0.025 and 0.975 quantiles of the bootstrapped means; is this correct, or am I confusing what I would do with a normal distribution with what I should do with the distribution of the sample estimates of the mean?
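In code, the comparison I describe amounts to checking whether the observed post mean falls outside this bootstrap interval for the pre mean; a minimal sketch, reusing the preBoot vector from the loop above:

# 95% percentile interval for the bootstrapped pre mean
preCI <- quantile(preBoot, c(0.025, 0.975))
# the comparison I made: is the observed post mean below the lower limit?
mean(post)
mean(post) < preCI[1]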

Also, using wilcox.test in R on the preliminary data, i.e.,

wilcox.test(pre, post, paired=T, exact=F)

shows this to be significant.

Before I go further, I would like to know: did I use the bootstrapping procedure correctly, and is this a legitimate test for this type of data?

Are there other tests we should consider, and what would be the best way to report this? I am especially interested in methods that would allow us to obtain confidence intervals.

Also, I see that in this previous question, Wilcoxon one-tailed test, the response advised to "keep in mind that it's generally not advisable to use one-tailed tests", but if I'm interested specifically in fewer falls after the treatment intervention, wouldn't a one-tailed test be appropriate?
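If a one-tailed test is appropriate here, my understanding is that the direction would be specified through the alternative argument of wilcox.test; a minimal sketch (with paired data the test is on pre minus post, so alternative="greater" corresponds to fewer falls after treatment):

# one-sided paired Wilcoxon test: pre counts tend to exceed post counts
wilcox.test(pre, post, paired=TRUE, alternative="greater", exact=FALSE)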

Additional Info Update:
I just found a wonderful review of the analysis of count data by Neal Alexander, "Review: analysis of parasite and other skewed counts", accessed via PubMed (http://www.ncbi.nlm.nih.gov/pubmed/22943299), which discusses the issues I've been facing in a very accessible manner (and it is free online). Others reading this question may also find it quite helpful.

I'm still digesting this info. This probably belongs in a new question, but in essence, I believe that in my field (clinical psychology) the standard way of looking at these data would be a Wilcoxon test, with perhaps a square root transformation and a t-test running in second place. Most people currently don't use R, and so don't seem to be aware of or use bootstrapping, which I actually believe would be better than either of the above two methods. If anybody has further information, or information to the contrary, I would appreciate it.

Best Answer

  • You have a paired-data situation, as you mentioned; however, you treated the samples as if they were independent. You should run the bootstrap on the differences between the pre and post measurements of each patient, and then check whether the resulting interval contains zero (see the sketch after this list).
  • Although it's not generally advisable, applying a one-tailed specification is reasonable in the situation as you describe it.
  • The Wilcoxon signed-rank test is a correct method, as is the sign test for the median difference. Moreover, you can consider transforming your count data by taking square roots and then performing a t-test (also shown in the sketch below).
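A minimal sketch of the bootstrap on paired differences and of the square-root-transformed t-test, assuming the pre and post vectors from the question (object names here are only illustrative):

# bootstrap the mean of the paired differences: resample the per-patient
# differences with replacement, not the raw pre and post scores separately
diffs <- pre - post
diffBoot <- numeric(10000)
for (i in 1:10000) { diffBoot[i] <- mean(sample(diffs, replace=TRUE)) }

# 95% percentile interval for the mean difference; check whether it contains zero
quantile(diffBoot, c(0.025, 0.975))

# square-root transformation of the counts followed by a paired t-test
t.test(sqrt(pre), sqrt(post), paired=TRUE)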