As Hans Engler suggested, a bootstrap should work well for these data. You can use the boot
or bootstrap
packages directly, but it’s much easier to use the simpleboot
package:
s = c(13,7,10,4,28,0,10,0,0,0,0,0,0,0,0,0,6,
0,0,0,0,0,0,0,4,0,0,0,4,0,0,0,1,2,2,0,
2,3,3,3,1,3,12,33,1,31,0,1,21,0,3,1,8,
0,1,1,6,0,2,0)
library(simpleboot)
b = one.boot(s, mean, R=10^4)
boot.ci(b, type="perc")
The output is:
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 10000 bootstrap replicates
CALL :
boot.ci(boot.out = b, type = "perc")
Intervals :
Level Percentile
95% ( 2.10, 5.85 )
Calculations and Intervals on Original Scale
As you can see (and as one might have expected), the confidence interval is not very different from the confidence interval from the t-test (1.87–5.69).
For the calculation of confidence intervals you'll need standard errors for the effects, but those are not available in the output. However, the standard errors can be estimated from the Wald statistics and odds ratios.
The calculation goes as follows:
- Take a natural logarithm from the odds ratio. This gives you the beta from the logistic model. For example for the first row of your table: beta=ln(4.23)=1.442
- The standard error for the beta is calculated by dividing the beta by the square root of the Walds statistic (STAT). Then take the absolute value of the result. Again, for the first row of your table: se=1.442/sqrt(61.5)=0.183.
- The 95% confidence interval for the beta is then beta+/-1.96*se. The constant 1.96 comes from the normal distribution. Again, for the first row of data: 1.442-1.96*0.183 ... 1.442+1.96*0.183 = 1.081...1.802.
- Last, you need to change the confidence interval of the beta to the confidence interval of the odds ratio. This happens simply by exponentiating the confidence interval of the beta. For the first line of data: 2.71828^1.081 = 2.949 and 2.71828^1.802 = 6.065.
So, your odds ratio for the first row of the table is 4.23 and it's 95% confidence interval is 2.949-6.065. Because the confidence interval does not include one, the results is statistically significant. The results are subject to error due to rounding of the output from PLINK.
This calculation can be achieved in, e.g., Excel, but below is also an R function that does the same thing (just in case you also use R).
# The data
or<-structure(list(SNP1 = structure(c(1L, 1L, 1L, 1L), .Label = "rs1", class = "factor"),
SNP2 = structure(c(1L, 1L, 1L, 1L), .Label = "rs2", class = "factor"),
HAPLOTYPE = c(22L, 12L, 21L, 11L), F = c(0.00992, 0.038,
0.00015, 0.952), OR = c(4.23, 1.02, 5.22e-10, 0.762), STAT = c(61.5,
0.217, 453, 22.9), P = c(4.43e-15, 0.642, 1.77e-100, 1.73e-06
)), .Names = c("SNP1", "SNP2", "HAPLOTYPE", "F", "OR", "STAT",
"P"), class = "data.frame", row.names = c(NA, -4L))
# The function
orci<-function(or) {
or$beta<-log(or$OR)
or$se<-abs(or$beta/sqrt(or$STAT))
or$lower<-or$beta-1.96*or$se
or$upper<-or$beta+1.96*or$se
or$LOWER<-exp(or$lower)
or$UPPER<-exp(or$upper)
or$res<-paste(or$OR, " (", round(or$LOWER, 3), "-", round(or$UPPER, 3), ")", sep="")
return(or)
}
# The calculation
orci(or)
# The result
#SNP1 SNP2 HAPLOTYPE F OR STAT P beta se lower upper LOWER UPPER res
#1 rs1 rs2 22 0.00992 4.23e+00 61.500 4.43e-15 1.44220199 0.18390288 1.08175235 1.8026516 2.949844e+00 6.065710e+00 4.23 (2.95-6.066)
#2 rs1 rs2 12 0.03800 1.02e+00 0.217 6.42e-01 0.01980263 0.04251018 -0.06351733 0.1031226 9.384579e-01 1.108627e+00 1.02 (0.938-1.109)
#3 rs1 rs2 21 0.00015 5.22e-10 453.000 1.77e-100 -21.37335353 1.00420775 -23.34160072 -19.4051063 7.292419e-11 3.736538e-09 0.000000000522 (0-0)
#4 rs1 rs2 11 0.95200 7.62e-01 22.900 1.73e-06 -0.27180872 0.05679965 -0.38313603 -0.1604814 6.817202e-01 8.517337e-01 0.762 (0.682-0.852)
Best Answer
As I've commented at Stack Overflow, you could bootstrap them:
I show stratified bootstrap here, but you get very similar results for non-stratified bootstrap.