I have some data I'm analyzing that looks something like this:
exampleDF <- data.frame(s1 = rnorm(10), s2 = rnorm(10), s3 = rnorm(10), s4 = rnorm(10), s5 = rnorm(10))
> exampleDF
s1 s2 s3 s4 s5
1 -1.8167533 1.7702279 -0.71073850 1.4676012 0.11589850
2 -0.9283220 -0.6028665 -0.12541521 -0.8027177 1.01037162
3 -1.8452510 0.2370927 0.02927646 0.9991479 0.23636339
4 -1.1728676 0.8326918 1.50497972 -0.2297063 0.08417196
5 0.0526901 0.4537119 -0.95115552 -0.6845546 -0.83615065
6 1.3307005 -2.8312321 0.09037220 -2.1499721 -0.30079861
7 2.0669124 -0.3826490 0.03627688 -0.7294662 -0.13742160
8 0.4938200 0.1285557 -0.06457083 0.4582254 0.03022469
9 -1.5334708 -0.1054318 0.93166124 0.4045524 1.81366144
10 -0.1566743 0.9812490 -0.03122229 -2.1211528 0.65671646
Here, A = {s1,s2,s3} is one experimental group and B = {s4,s5} is another group.
What I'd like to do is go through each row and compare the means of A and B to see if they are significantly different, i.e. for each row, compare the mean of {s1,s2,s3} with the mean of {s4,s5}.
So for each row, I'd like a p-value so I can decide whether the two group means are significantly different. However, my actual data has ~20,000 rows, so I think I'm supposed to correct for multiple testing to reduce the Type I error rate, right? What test should I use, and how do I do it in R?
I'm not sure how to go about setting up the design matrix, or whether I should use the t.test() function in R, or if there's a more appropriate function?
Best Answer
First of all, with only 3 and 2 samples in the two groups, your standard-error estimates will be quite inaccurate, so running 20,000 separate t-tests is most likely not the best approach. If your data is gene expression or similar biological data, my suggestion is to check out the R Bioconductor package limma, and also this paper.
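As a rough sketch of what the limma route looks like (the 20,000 × 5 matrix `mat` below is random placeholder data standing in for your real values; the group factor matches your s1–s3 vs s4–s5 split):

```r
library(limma)  # Bioconductor package

set.seed(1)
# Placeholder for your real 20,000-row data matrix:
mat <- matrix(rnorm(20000 * 5), nrow = 20000,
              dimnames = list(NULL, c("s1", "s2", "s3", "s4", "s5")))

group  <- factor(c("A", "A", "A", "B", "B"))
design <- model.matrix(~ group)      # columns: (Intercept), groupB

fit <- lmFit(mat, design)            # one linear model per row
fit <- eBayes(fit)                   # moderated t-statistics (variance shrinkage)

# One row per feature, with BH-adjusted p-values for the A-vs-B difference:
results <- topTable(fit, coef = "groupB", number = Inf, adjust.method = "BH")
```

The variance shrinkage in eBayes() is exactly what helps with such small group sizes: information is pooled across all 20,000 rows instead of estimating each row's variance from 5 numbers.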
If you absolutely want to do a t-test, below is a quick method using a library; note that the p-values are approximate:
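One possible sketch, using rowttests() from the Bioconductor package genefilter (the example data frame here mirrors the one in the question; the group factor is the s1–s3 vs s4–s5 split):

```r
library(genefilter)  # Bioconductor package

set.seed(1)
exampleDF <- data.frame(s1 = rnorm(10), s2 = rnorm(10), s3 = rnorm(10),
                        s4 = rnorm(10), s5 = rnorm(10))

# rowttests() wants a matrix and a factor giving each column's group.
mat   <- as.matrix(exampleDF)
group <- factor(c("A", "A", "A", "B", "B"))

tt   <- rowttests(mat, group)            # equal-variance t-test per row
padj <- p.adjust(tt$p.value, "BH")       # Benjamini-Hochberg FDR correction
```

rowttests() is much faster than looping t.test() calls, which matters at 20,000 rows.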
The default in rowttests() assumes equal variance for the two groups. You have to check whether this is true for your data; if it is not, you can go back to using t.test(). Then it will be:
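A sketch of the plain t.test() version, applied row by row (t.test() defaults to Welch's test, which does not assume equal variances), followed by p.adjust() for the multiple-testing correction:

```r
set.seed(1)
exampleDF <- data.frame(s1 = rnorm(10), s2 = rnorm(10), s3 = rnorm(10),
                        s4 = rnorm(10), s5 = rnorm(10))

# Welch t-test per row: compare columns s1-s3 against s4-s5.
pvals <- apply(exampleDF, 1, function(row) {
  t.test(row[c("s1", "s2", "s3")], row[c("s4", "s5")])$p.value
})

# Benjamini-Hochberg correction across all rows:
padj <- p.adjust(pvals, method = "BH")
```

With ~20,000 rows this loop is slower than rowttests(), but it uses only base R and lets you control the test's assumptions via the usual t.test() arguments (e.g. var.equal).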