Solved – Odds ratio necessary to achieve a certain power

I have the data resulting from a genome-wide association study. This means that logistic regression has been used to determine the association of each genetic variant with the phenotype (which is case/control), with the desired statistic being an odds ratio. I believe the power will be calculated based on four components:

The effect size (in this case, the odds ratio),
The threshold of statistical significance,
The number of samples,
And the minor allele frequency in the sample (in other words, the prevalence of the less-common value of the independent variable).

So if I take that list and swap out the effect size for the desired power, I should be able to calculate the odds ratio I need.
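As a rough sketch of how the four components fit together, assuming a two-sided Wald test on the log odds ratio and ignoring the tiny probability of rejecting in the wrong direction, the power and the minimum detectable effect are approximately related by

$$1 - \beta \approx \Phi\left(\frac{|\ln \mathrm{OR}|}{\mathrm{SE}(\ln \widehat{\mathrm{OR}})} - z_{1-\alpha/2}\right), \qquad \ln \mathrm{OR}_{\min} = \left(z_{1-\alpha/2} + z_{1-\beta}\right)\mathrm{SE}(\ln \widehat{\mathrm{OR}}),$$

where $\Phi$ is the standard normal CDF, $\alpha$ is the significance threshold, $1-\beta$ is the power, and the sample size and minor allele frequency enter only through the standard error. The second expression is the first solved for the effect size, which is the "swap" described above.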

I imagine the process working like this:

I use the significance level to find the corresponding z-score,
I calculate the standard error of the odds ratio from the sample size and minor allele frequency,
I use the z-score and the standard error to determine the minimum odds ratio that will pass the test,
and I then determine how large the true odds ratio needs to be, given its standard error, for there to be an 80% chance (or whatever power level I choose) that the estimated odds ratio will exceed that minimum passing value.
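The four steps above can be sketched in a few lines, working on the log-odds scale. This is a minimal illustration under stated assumptions, not a vetted tool: it assumes a two-sided Wald test, treats the standard error from step 2 as an input, and the function name `min_detectable_log_or` and the SE of 0.05 in the usage line are hypothetical.

```python
import math
from statistics import NormalDist


def min_detectable_log_or(se_log_or: float, alpha: float, power: float) -> float:
    """Smallest |log OR| that a two-sided Wald test detects with the given power.

    Steps 1 and 3: z_crit is the critical z-score, so an estimate passes the
    test when |log OR| > z_crit * se_log_or.
    Step 4: the true log OR must lie z_power standard errors beyond that
    cutoff for its estimate to clear the cutoff with probability `power`.
    The negligible chance of crossing the opposite cutoff is ignored.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return (z_crit + z_power) * se_log_or


# Hypothetical SE of 0.05 on the log-odds scale (step 2 is not done here)
log_or_min = min_detectable_log_or(0.05, alpha=0.05, power=0.8)
print(f"minimum detectable OR ~ {math.exp(log_or_min):.2f}")
```

Exponentiating at the end converts the answer back to the odds-ratio scale; the calculation itself stays on the log scale because that is where the estimate is approximately normal.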

First of all, is this correct? If it is, then my question is: how do I determine the standard error of the odds ratio?

In case an example is helpful: I have one genetic variant with 61,000 samples, 28% of which are A and 72% of which are B. Given a significance threshold of $5 \times 10^{-8}$, how large would the odds ratio for this variant need to be for the power to reach 0.8?
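A rough worked version of this example is sketched below. Several assumptions are not in the question and are labelled as such: the 61,000 samples are individuals contributing two alleles each, 0.28 is the minor allele frequency, half the samples are cases, and the test is an allelic (2 × 2 allele-count) test whose log-OR standard error is approximated from the expected counts under the null hypothesis. The helper names are hypothetical.

```python
import math
from statistics import NormalDist


def allelic_se_log_or(n_cases: int, n_controls: int, maf: float) -> float:
    """Approximate SE of the allelic log OR under the null hypothesis.

    Assumes two alleles per individual and the same minor allele frequency
    in cases and controls, so each group contributes expected allele counts
    2n * maf and 2n * (1 - maf) to the 2 x 2 table.
    """
    cells = []
    for n in (n_cases, n_controls):
        alleles = 2 * n
        cells += [alleles * maf, alleles * (1 - maf)]
    # Woolf-type approximation: Var(log OR) ~ sum of reciprocal cell counts
    return math.sqrt(sum(1 / c for c in cells))


def required_odds_ratio(n_total, case_fraction, maf, alpha, power):
    """Odds ratio needed for the target power (two-sided Wald test)."""
    n_cases = round(n_total * case_fraction)
    n_controls = n_total - n_cases
    se = allelic_se_log_or(n_cases, n_controls, maf)
    z = NormalDist()
    log_or = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * se
    return math.exp(log_or)


# Question's numbers; the 50% case fraction is an ASSUMPTION, not given
or_needed = required_odds_ratio(61_000, 0.5, 0.28, 5e-8, 0.8)
print(f"required OR ~ {or_needed:.3f}")
```

Under these assumptions the answer comes out at roughly 1.08. Lowering `case_fraction` shows how an unbalanced design raises the required odds ratio, and a genotype-based logistic regression would give a similar but not identical figure.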

Edit: Additionally, should I be doing this with the log OR instead?

Best Answer

If you already have all the data (as you seem to), there isn't much point in calculating the power. Either you will find significant differences or you won't. Russ Lenth's web page explains why retrospective power calculations are inadvisable.

If you are still designing the study, or for future reference, then you are probably better off using a well-vetted program for power calculations rather than designing your own. The gap package in R seems to fit your needs: its pbsize2() function is written specifically for case-control association study design. That function's manual page specifies the information required to calculate power:

Essentially, for given sample size(s), a proportion of which (fc) being cases, the function calculates power estimate for a given type I error (alpha), genotype relative risk (gamma), frequency of the risk allele (p), the prevalence of disease in the population (kp) and optionally a disease model (model).

For "disease" you would substitute your phenotype. In practice, you would use this function iteratively for study design to find the sample size needed to meet your requirements.
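The iterative use described above amounts to a simple search over sample sizes. The sketch below only illustrates that loop; it does not reproduce pbsize2(), which also uses disease prevalence and a disease model. Instead it swaps in a crude allelic Wald-test approximation (equal minor allele frequency in both groups under the null, two alleles per individual), and the function names and the step size of 500 are hypothetical.

```python
import math
from statistics import NormalDist


def allelic_power(n_total, case_fraction, maf, odds_ratio, alpha):
    """Approximate power of a two-sided allelic Wald test (null-based SE)."""
    n_cases = round(n_total * case_fraction)
    n_controls = n_total - n_cases
    # Sum of reciprocal expected allele counts in the 2 x 2 table
    var = sum(1 / (2 * n * f) for n in (n_cases, n_controls) for f in (maf, 1 - maf))
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(abs(math.log(odds_ratio)) / math.sqrt(var) - z_crit)


def smallest_n(target_power, case_fraction, maf, odds_ratio, alpha, step=500):
    """Increase the total sample size in fixed steps until the target power is met."""
    if odds_ratio == 1:
        raise ValueError("an odds ratio of 1 has no detectable effect")
    n = step
    while allelic_power(n, case_fraction, maf, odds_ratio, alpha) < target_power:
        n += step
    return n


# Hypothetical design: balanced cases/controls, MAF 0.28, OR 1.10
n_needed = smallest_n(0.8, 0.5, 0.28, 1.10, 5e-8)
print(n_needed)
```

A dedicated package remains the better choice for a real design; the loop only shows the shape of the calculation.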

If you are doing multiple association tests, or looking genome wide for associations without prior hypotheses, then you have to take into account multiple testing. The very low significance threshold you specify suggests that you have already done so.
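For context, $5 \times 10^{-8}$ is the conventional genome-wide threshold, usually justified as a Bonferroni correction of a 0.05 family-wise error rate for roughly one million independent common-variant tests. The arithmetic, and the critical z-score it implies, can be sketched as follows:

```python
from statistics import NormalDist

# Bonferroni: divide the family-wise error rate by the number of tests
family_alpha = 0.05
n_independent_tests = 1_000_000
per_test_alpha = family_alpha / n_independent_tests  # about 5e-8

# Two-sided critical z-score a single variant must exceed at that threshold
z_crit = NormalDist().inv_cdf(1 - per_test_alpha / 2)
print(per_test_alpha, round(z_crit, 2))
```

The critical value of about 5.45, compared with 1.96 at $\alpha = 0.05$, is the main reason genome-wide studies need very large samples to detect modest odds ratios.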

In terms of the standard error of an odds ratio, you should use the log OR, as explained on this Cross Validated page. If your data are in the form of a 2 x 2 contingency table (cases/controls x major/minor allele) then you need to have all four values in the table. I think the apparent discrepancy between the sources you cite in a comment is that the power calculation is done pre-study with the phenotype/allele association not yet known, while the odds-ratio calculations are based on already observed phenotypes and alleles.
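With all four cells of a 2 × 2 table, the usual large-sample (Woolf) standard error of the log odds ratio is the square root of the sum of the reciprocal cell counts. The sketch below computes it and a 95% confidence interval. The table layout is an assumption, and the counts are made up purely for illustration; they are not data from the question.

```python
import math
from statistics import NormalDist


def log_or_with_se(a, b, c, d):
    """Log odds ratio and its Woolf standard error for a 2 x 2 table.

    Layout assumed here:
                   minor allele   major allele
        cases           a              b
        controls        c              d
    All four cells are needed; a zero cell leaves the SE undefined.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


# Illustrative counts only, not data from the question
log_or, se = log_or_with_se(18_000, 43_000, 16_200, 44_800)
z = NormalDist().inv_cdf(0.975)
lo, hi = math.exp(log_or - z * se), math.exp(log_or + z * se)
print(f"OR = {math.exp(log_or):.3f}, 95% CI {lo:.3f} to {hi:.3f}")
```

The interval is symmetric around the log OR but not around the OR itself, which is why the standard error, and any power calculation built on it, is worked out on the log scale.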
