Solved – Calculate odds ratio confidence intervals from plink output

confidence intervalgeneticsodds-ratio

I have output from plink haplotype analysis, however I do not have the raw data. Here, is the output for Haplotype-based association tests with GLMs:

SNP1    SNP2    HAPLOTYPE   F   OR  STAT    P
rs1 rs2 22  0.00992 4.23    61.5    4.43E-15
rs1 rs2 12  0.038   1.02    0.217   0.642
rs1 rs2 21  0.00015 5.22E-10    453 1.77E-100
rs1 rs2 11  0.952   0.762   22.9    1.73E-06

Here is the explanation for each column from plink:

        SNP1    SNP ID of left-most (5') SNP
        SNP2    SNP ID of left-most (3') SNP
   HAPLOTYPE    Haplotype 
           F    Frequency in sample
          OR    Estimated odds ratio
        STAT    Test statistic (T from Wald test)
           P    Asymptotic p-value

Question: based on above output, is it possible to calculate OR 95% confidence intervals?

Best Answer

For the calculation of confidence intervals you'll need standard errors for the effects, but those are not available in the output. However, the standard errors can be estimated from the Wald statistics and odds ratios.

The calculation goes as follows:

  1. Take a natural logarithm from the odds ratio. This gives you the beta from the logistic model. For example for the first row of your table: beta=ln(4.23)=1.442
  2. The standard error for the beta is calculated by dividing the beta by the square root of the Walds statistic (STAT). Then take the absolute value of the result. Again, for the first row of your table: se=1.442/sqrt(61.5)=0.183.
  3. The 95% confidence interval for the beta is then beta+/-1.96*se. The constant 1.96 comes from the normal distribution. Again, for the first row of data: 1.442-1.96*0.183 ... 1.442+1.96*0.183 = 1.081...1.802.
  4. Last, you need to change the confidence interval of the beta to the confidence interval of the odds ratio. This happens simply by exponentiating the confidence interval of the beta. For the first line of data: 2.71828^1.081 = 2.949 and 2.71828^1.802 = 6.065.

So, your odds ratio for the first row of the table is 4.23 and it's 95% confidence interval is 2.949-6.065. Because the confidence interval does not include one, the results is statistically significant. The results are subject to error due to rounding of the output from PLINK.

This calculation can be achieved in, e.g., Excel, but below is also an R function that does the same thing (just in case you also use R).

# The data
or<-structure(list(SNP1 = structure(c(1L, 1L, 1L, 1L), .Label = "rs1", class = "factor"), 
SNP2 = structure(c(1L, 1L, 1L, 1L), .Label = "rs2", class = "factor"), 
HAPLOTYPE = c(22L, 12L, 21L, 11L), F = c(0.00992, 0.038, 
0.00015, 0.952), OR = c(4.23, 1.02, 5.22e-10, 0.762), STAT = c(61.5, 
0.217, 453, 22.9), P = c(4.43e-15, 0.642, 1.77e-100, 1.73e-06
)), .Names = c("SNP1", "SNP2", "HAPLOTYPE", "F", "OR", "STAT", 
"P"), class = "data.frame", row.names = c(NA, -4L))

# The function
orci<-function(or) {
   or$beta<-log(or$OR)
   or$se<-abs(or$beta/sqrt(or$STAT))
   or$lower<-or$beta-1.96*or$se
   or$upper<-or$beta+1.96*or$se
   or$LOWER<-exp(or$lower)
   or$UPPER<-exp(or$upper)
   or$res<-paste(or$OR, " (", round(or$LOWER, 3), "-", round(or$UPPER, 3), ")", sep="")
   return(or)
}

# The calculation
orci(or)

# The result
#SNP1 SNP2 HAPLOTYPE       F       OR    STAT         P         beta         se        lower       upper        LOWER        UPPER                  res
#1  rs1  rs2        22 0.00992 4.23e+00  61.500  4.43e-15   1.44220199 0.18390288   1.08175235   1.8026516 2.949844e+00 6.065710e+00    4.23 (2.95-6.066)
#2  rs1  rs2        12 0.03800 1.02e+00   0.217  6.42e-01   0.01980263 0.04251018  -0.06351733   0.1031226 9.384579e-01 1.108627e+00   1.02 (0.938-1.109)
#3  rs1  rs2        21 0.00015 5.22e-10 453.000 1.77e-100 -21.37335353 1.00420775 -23.34160072 -19.4051063 7.292419e-11 3.736538e-09 0.000000000522 (0-0)
#4  rs1  rs2        11 0.95200 7.62e-01  22.900  1.73e-06  -0.27180872 0.05679965  -0.38313603  -0.1604814 6.817202e-01 8.517337e-01  0.762 (0.682-0.852)
Related Question