How to analyze these data

hypothesis-testing, r

The biological data are as follows:

   V1    V2    V3    V4    V5    V6
0.064 0.014 0.016 0.012 0.013 0.023
0.056 0.000 0.000 0.008 0.010 0.000
0.042 0.014 0.024 0.008 0.017 0.023
0.031 0.014 0.016 0.008 0.013 0.023
0.068 0.000 0.008 0.004 0.020 0.000
0.081 0.000 0.000 0.004 0.010 0.000
0.060 0.014 0.016 0.006 0.010 0.023

The data can also be downloaded from http://www.mediafire.com/?6yp9l9m47jv433a.

```r
A <- dat[, 1]    # first column, V1
B <- dat[, 2:6]  # remaining columns, V2 to V6
```

I want to compare the first column with each of the other columns. Because dat[,2] and dat[,6] are the only columns that do not appear to be normally distributed, I used wilcox.test instead of t.test in R. However, warnings were raised, such as "In wilcox.test.default(A, B[, 1]) : cannot compute exact p-value with ties". Could you give me some suggestions? Thank you.

```r
wilcox.test(A, B[, 1])

	Wilcoxon rank sum test with continuity correction

data:  A and B[, 1]
W = 49, p-value = 0.00184
alternative hypothesis: true location shift is not equal to 0

Warning message:
In wilcox.test.default(A, B[, 1]) :
  cannot compute exact p-value with ties
```
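The warning comes from ties in the pooled sample: B[, 1] contains three 0.000 values and four 0.014 values, so R does not compute the exact null distribution and instead uses a normal approximation with a tie-corrected variance and a continuity correction. The following is a minimal Python sketch (standard library only, data transcribed from the table above) of that approximation; it should reproduce W = 49 and a p-value of about 0.00184. The function names are illustrative, not part of any library.

```python
import math
from collections import Counter
from statistics import NormalDist

# Transcribed from the table: A is column V1, B1 is column V2 (B[, 1] in R)
A = [0.064, 0.056, 0.042, 0.031, 0.068, 0.081, 0.060]
B1 = [0.014, 0.000, 0.014, 0.014, 0.000, 0.000, 0.014]


def midranks(values):
    """Ranks 1..n in input order; tied values share the average of their ranks."""
    first_rank = {}
    for rank, v in enumerate(sorted(values), start=1):
        first_rank.setdefault(v, rank)
    counts = Counter(values)
    return [first_rank[v] + (counts[v] - 1) / 2 for v in values]


def rank_sum_normal(x, y):
    """Two-sided rank-sum test via the normal approximation,
    with a tie-corrected variance and a continuity correction."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = midranks(x + y)
    w = sum(ranks[:n1]) - n1 * (n1 + 1) / 2        # Mann-Whitney form of W
    tie_sizes = Counter(x + y).values()             # sizes of groups of equal values
    tie_adj = sum(t**3 - t for t in tie_sizes) / (n * (n - 1))
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_adj))
    d = w - n1 * n2 / 2                             # deviation from the null mean
    z = (d - math.copysign(0.5, d)) / sigma         # continuity correction toward 0
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, p


w, p = rank_sum_normal(A, B1)
print(f"W = {w:g}, p-value = {p:.5f}")
```

The warning therefore does not mean the result is wrong; it only says the reported p-value is an approximation rather than an exact one.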

Best Answer

Sometimes a formal statistical test is overkill. In every row, the entry in the first column is the largest. Draw a picture to make this apparent: side-by-side boxplots or dotplots would work nicely.
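A quick way to confirm the row-by-row claim, plus a crude text stand-in for a dotplot, is sketched below in Python (standard library only). The 0 to 0.09 scale and the chart width are arbitrary choices for illustration; a real boxplot or dotplot from a plotting library would be the better picture.

```python
# Rows of the table in the question; columns are V1..V6
rows = [
    [0.064, 0.014, 0.016, 0.012, 0.013, 0.023],
    [0.056, 0.000, 0.000, 0.008, 0.010, 0.000],
    [0.042, 0.014, 0.024, 0.008, 0.017, 0.023],
    [0.031, 0.014, 0.016, 0.008, 0.013, 0.023],
    [0.068, 0.000, 0.008, 0.004, 0.020, 0.000],
    [0.081, 0.000, 0.000, 0.004, 0.010, 0.000],
    [0.060, 0.014, 0.016, 0.006, 0.010, 0.023],
]
names = ["V1", "V2", "V3", "V4", "V5", "V6"]

# Row by row: is the V1 entry strictly larger than every other entry?
first_is_max = [row[0] > max(row[1:]) for row in rows]
print("V1 is the row maximum in every row:", all(first_is_max))

# Crude text strip chart: one line per column, '*' marks a value on a 0-0.09 scale
WIDTH = 45
for j, name in enumerate(names):
    line = [" "] * (WIDTH + 1)
    for row in rows:
        line[round(row[j] / 0.09 * WIDTH)] = "*"
    print(f"{name} |{''.join(line)}|")
```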

Although this is a post-hoc comparison, suppose the initial intent had been to compare the first column against the rest for a shift in distribution. The most extreme outcomes would then be that either all row maxima or all row minima occur in the first column (a two-sided test). If all columns contained values drawn at random from a common distribution, the maximum of each row would fall in the first column with probability $\frac{1}{6}$, independently across the 7 rows, so the chance of either extreme would be $2 (\frac{1}{6})^7$, about 0.0007%.
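The arithmetic behind $2 (\frac{1}{6})^7$ can be checked exactly with Python's `fractions` module. This sketch assumes, as the argument does, that values are exchangeable across the 6 columns and that there are no ties within a row.

```python
from fractions import Fraction

n_cols, n_rows = 6, 7

# Under the null (values exchangeable across columns, no ties), the maximum of
# each row is equally likely to be in any column, independently across rows.
p_all_max = Fraction(1, n_cols) ** n_rows
# Two-sided: all maxima OR all minima in column 1 (disjoint events, so add them)
p_two_sided = 2 * p_all_max

print(p_two_sided)                   # 1/139968
print(f"{float(p_two_sided):.4%}")   # 0.0007%
```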

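In fact, the first column contains the 7 largest of the 42 values. Again ex post facto: there are ${42 \choose 7}$ equally likely sets of positions for the 7 largest values, and only one of them is the first column (likewise for the 7 smallest), so the chance of such an extreme ordering equals $\frac{2}{42 \choose 7}$, about 0.000007%.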
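Both claims in this paragraph, that the 7 largest values all sit in V1 and that $\frac{2}{42 \choose 7}$ is about 0.000007%, can be verified with a short Python sketch (standard library only, columns transcribed from the table; the counting argument ignores ties, as the text does).

```python
from fractions import Fraction
from math import comb

columns = {
    "V1": [0.064, 0.056, 0.042, 0.031, 0.068, 0.081, 0.060],
    "V2": [0.014, 0.000, 0.014, 0.014, 0.000, 0.000, 0.014],
    "V3": [0.016, 0.000, 0.024, 0.016, 0.008, 0.000, 0.016],
    "V4": [0.012, 0.008, 0.008, 0.008, 0.004, 0.004, 0.006],
    "V5": [0.013, 0.010, 0.017, 0.013, 0.020, 0.010, 0.010],
    "V6": [0.023, 0.000, 0.023, 0.023, 0.000, 0.000, 0.023],
}

# Label every value with its column, then take the 7 largest of all 42
labelled = [(v, name) for name, col in columns.items() for v in col]
top7 = sorted(labelled, reverse=True)[:7]
print("Columns holding the 7 largest values:", {name for _, name in top7})

# Each of the comb(42, 7) position sets is equally likely to hold the 7 largest
# values; one is "all of V1", and one more counts the mirror-image extreme.
n_total = len(labelled)
p = Fraction(2, comb(n_total, 7))
print(comb(n_total, 7), p, f"{float(p):.6%}")
```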

These results indicate that any reasonably powerful test you choose to conduct will conclude there's a highly significant difference.
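One test that sidesteps the ties warning altogether is an exact permutation version of the rank-sum test: rank the pooled data with midranks, then enumerate all $\binom{14}{7} = 3432$ ways to split the 14 values into two groups of 7. The Python sketch below (standard library only, data transcribed from the table, function names illustrative) applies it to V1 against each other column. It is a swapped-in technique for illustration, not what R's `wilcox.test` does by default.

```python
from collections import Counter
from itertools import combinations
from math import comb

V1 = [0.064, 0.056, 0.042, 0.031, 0.068, 0.081, 0.060]
others = {
    "V2": [0.014, 0.000, 0.014, 0.014, 0.000, 0.000, 0.014],
    "V3": [0.016, 0.000, 0.024, 0.016, 0.008, 0.000, 0.016],
    "V4": [0.012, 0.008, 0.008, 0.008, 0.004, 0.004, 0.006],
    "V5": [0.013, 0.010, 0.017, 0.013, 0.020, 0.010, 0.010],
    "V6": [0.023, 0.000, 0.023, 0.023, 0.000, 0.000, 0.023],
}


def midranks(values):
    """Ranks 1..n in input order; tied values share the average of their ranks."""
    first_rank = {}
    for rank, v in enumerate(sorted(values), start=1):
        first_rank.setdefault(v, rank)
    counts = Counter(values)
    return [first_rank[v] + (counts[v] - 1) / 2 for v in values]


def exact_rank_sum_p(x, y):
    """Two-sided exact permutation p-value for the rank-sum statistic W.
    Conditioning on the observed midranks keeps it exact even with ties."""
    pooled = x + y
    ranks = midranks(pooled)
    n1, n2 = len(x), len(y)
    offset = n1 * (n1 + 1) / 2      # converts a rank sum to Mann-Whitney W
    centre = n1 * n2 / 2            # null expectation of W
    observed = abs(sum(ranks[:n1]) - offset - centre)
    hits = 0
    for idx in combinations(range(len(pooled)), n1):
        w = sum(ranks[i] for i in idx) - offset
        if abs(w - centre) >= observed - 1e-9:   # at least as extreme, two-sided
            hits += 1
    return hits / comb(len(pooled), n1)


for name, col in others.items():
    print(name, round(exact_rank_sum_p(V1, col), 6))
```

Because V1 holds the 7 largest values, each comparison lands on 2/3432 (about 0.00058), which is as small as a two-sided permutation test on 7 + 7 observations can go.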

In any event, you don't need a p-value; you need to characterize how large the difference is (the right way to do this depends on what the data mean) and you need to seek an explanation for the difference.
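As one possible way to put a size on the difference (an assumption for illustration; the answer deliberately does not prescribe a measure), the Python sketch below reports the difference in medians and the Hodges-Lehmann shift estimate, i.e. the median of all pairwise differences, for V1 against each other column. Depending on what the measurements mean, a ratio or some other summary could be more appropriate.

```python
from statistics import median

V1 = [0.064, 0.056, 0.042, 0.031, 0.068, 0.081, 0.060]
others = {
    "V2": [0.014, 0.000, 0.014, 0.014, 0.000, 0.000, 0.014],
    "V3": [0.016, 0.000, 0.024, 0.016, 0.008, 0.000, 0.016],
    "V4": [0.012, 0.008, 0.008, 0.008, 0.004, 0.004, 0.006],
    "V5": [0.013, 0.010, 0.017, 0.013, 0.020, 0.010, 0.010],
    "V6": [0.023, 0.000, 0.023, 0.023, 0.000, 0.000, 0.023],
}


def hodges_lehmann(x, y):
    """Median of all pairwise differences x_i - y_j: a location-shift
    estimate that pairs naturally with the rank-sum test."""
    return median(a - b for a in x for b in y)


for name, col in others.items():
    diff_medians = median(V1) - median(col)
    print(f"{name}: median difference = {diff_medians:.3f}, "
          f"Hodges-Lehmann shift = {hodges_lehmann(V1, col):.3f}")
```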
