Solved – How to interpret false positives from a QQ-plot in genome-wide association studies

genetics, qq-plot

I am doing a SNP association study, and the estimated p-value for each SNP is plotted in a QQ-plot. The question is: can one identify false positive hits from a QQ-plot in a GWAS (genome-wide association study)?

From this article: http://www.genomesunzipped.org/2010/07/how-to-read-a-genome-wide-association-study.php , it says:

"…on the other hand, should show a solid line matching X=Y until it sharply curves at the end (representing the small number of true associations among thousands of unassociated SNPs)"

Is this always true, or can false positives also cause a sharp deviation at the end of the curve? How can one tell from a QQ-plot whether a deviation is due to a true association or a false one?

Best Answer

Under the null hypothesis, your p-values should follow a uniform distribution (I'm ignoring some issues with dependencies here, as they are not really relevant to the discussion). This includes the false positives: just by coincidence, you should get a few extreme p-values. Since these are exactly what the expected distribution already contains, they will not distort the QQ-plot. False positives are already accounted for in this type of QQ-plot.
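A small simulation makes this concrete. The sketch below is Python using only the standard library; the number of SNPs, the seed, the 0.001 cut-off and the helper name `qq_points` are illustrative assumptions, not anything taken from the question. It draws a null p-value for every SNP and builds the QQ-plot coordinates (expected versus observed -log10 p).

```python
import math
import random


def qq_points(pvalues):
    """Return (expected, observed) -log10(p) pairs, smallest p-value first."""
    n = len(pvalues)
    points = []
    for rank, p in enumerate(sorted(pvalues), start=1):
        expected_p = (rank - 0.5) / n  # uniform plotting position (one common choice)
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points


rng = random.Random(1)  # fixed seed, chosen arbitrarily
n_snps = 100_000        # hypothetical number of tested SNPs, all truly null

# 1 - random() lies in (0, 1], so log10 never sees zero.
null_p = [1.0 - rng.random() for _ in range(n_snps)]

# Chance hits ("false positives") below 0.001: about n_snps * 0.001 are expected.
n_hits = sum(p < 0.001 for p in null_p)
print("chance hits below 0.001:", n_hits, "expected about", n_snps * 0.001)

# Largest gap between observed and expected over the bulk (expected p > 0.01).
points = qq_points(null_p)
bulk_gap = max(abs(observed - expected) for expected, observed in points if expected < 2)
print("largest bulk deviation in -log10 units:", round(bulk_gap, 3))
```

The chance hits are there, but because the expected distribution contains them too, the points should stay close to the line X=Y.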

On the other hand, if some of the tested SNPs are ones where the null hypothesis is not true, they should typically show up with lower p-values than expected, and these will indeed pull the plot away from the straight line.

What the author of the article you link seems to mean is: if you have a very strong distortion, there are far more extreme p-values than the uniform distribution predicts, so something is likely to be wrong.

So, in short: a uniform QQ-plot of p-values should show relatively little deviation from the straight line, and only at the extremely low p-values (showing you have more low p-values than if the null hypothesis were true everywhere).
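To see what that shape looks like in numbers, here is a minimal Python sketch (standard library only). Everything in it is assumed for illustration: 100,000 null SNPs, 20 truly associated SNPs, and made-up p-values between 1e-12 and 1e-8 for the associated ones. The helper `qq_points` is a hypothetical name.

```python
import math
import random


def qq_points(pvalues):
    """Return (expected, observed) -log10(p) pairs, smallest p-value first."""
    n = len(pvalues)
    points = []
    for rank, p in enumerate(sorted(pvalues), start=1):
        expected_p = (rank - 0.5) / n  # uniform plotting position
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points


rng = random.Random(2)        # fixed seed, chosen arbitrarily
n_null, n_true = 100_000, 20  # hypothetical counts

# Null SNPs: uniform p-values in (0, 1].
null_p = [1.0 - rng.random() for _ in range(n_null)]
# Associated SNPs: invented, very small p-values (1e-12 to 1e-8).
true_p = [10.0 ** -rng.uniform(8.0, 12.0) for _ in range(n_true)]

points = qq_points(null_p + true_p)

# Bulk of the plot (expected p > 0.01): should hug the diagonal.
bulk_gap = max(abs(observed - expected) for expected, observed in points if expected < 2)
# The n_true smallest p-values: how far they sit above the diagonal at least.
tail_lift = min(observed - expected for expected, observed in points[:n_true])

print("largest bulk deviation:", round(bulk_gap, 3))
print("smallest tail lift:", round(tail_lift, 3))
```

The bulk stays on the line and only the last few points lift off it. A plot that departs from the diagonal well before the tail is the pattern that suggests something else is wrong.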

Related Solutions

Solved – How to calculate confidence intervals for pooled odds ratios in meta-analysis

In most meta-analyses of odds ratios, the standard errors $se_i$ are based on the log odds ratios $\log(OR_i)$. So, do you happen to know how your $se_i$ were estimated, and which metric they reflect: $OR$ or $\log(OR)$?

Given that the $se_i$ are based on $\log(OR_i)$, the pooled standard error (under a fixed effect model) can be easily computed. First, compute the weight for each effect size: $w_i = \frac{1}{se_i^2}$. Second, the pooled standard error is $se_{FEM} = \sqrt{\frac{1}{\sum w_i}}$. Furthermore, let $\log(OR_{FEM}) = \frac{\sum w_i \log(OR_i)}{\sum w_i}$ be the common effect (fixed effect model), i.e. the inverse-variance weighted mean of the log odds ratios. Then the ("pooled") 95% confidence interval is $\log(OR_{FEM}) \pm 1.96 \cdot se_{FEM}$ on the log scale; exponentiate both limits to get the interval for the odds ratio itself.

Update

Since BIBB kindly provided the data, I am able to run the 'full' meta-analysis in R.

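library(meta)
# Odds ratios of the two studies and their standard errors
# (assumed to be on the log(OR) scale)
or <- c(0.75, 0.85)
se <- c(0.0937, 0.1029)
# metagen() expects the effects on the log scale when sm = "OR"
logor <- log(or)
# Inverse variance pooling; the outer parentheses print the result
(or.fem <- metagen(logor, se, sm = "OR"))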

> (or.fem <- metagen(logor, se, sm = "OR"))
    OR            95%-CI %W(fixed) %W(random)
1 0.75  [0.6242; 0.9012]     54.67      54.67
2 0.85  [0.6948; 1.0399]     45.33      45.33

Number of trials combined: 2 

                         OR           95%-CI       z  p.value
Fixed effect model   0.7938  [0.693; 0.9092] -3.3335   0.0009
Random effects model 0.7938  [0.693; 0.9092] -3.3335   0.0009

Quantifying heterogeneity:
tau^2 < 0.0001; H = 1; I^2 = 0%

Test of heterogeneity:
    Q d.f.  p.value
 0.81    1   0.3685

Method: Inverse variance method
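
The fixed effect row of this output can be checked by hand with the inverse variance formulas. The following is a sketch in Python (standard library only, so it runs without the `meta` package); the variable names are my own, and it uses 1.96 as the normal quantile, so the last digits may differ slightly from what `metagen` prints.

```python
import math

odds_ratios = [0.75, 0.85]    # the two studies from the R example
se_log_or = [0.0937, 0.1029]  # standard errors, taken to be on the log(OR) scale

log_or = [math.log(x) for x in odds_ratios]

# Inverse variance weights: w_i = 1 / se_i^2
weights = [1.0 / se ** 2 for se in se_log_or]
total_weight = sum(weights)
weight_pct = [100.0 * w / total_weight for w in weights]

# Fixed effect estimate and its standard error on the log scale
pooled_log_or = sum(w * y for w, y in zip(weights, log_or)) / total_weight
pooled_se = math.sqrt(1.0 / total_weight)

# 95% confidence interval, back-transformed to the odds ratio scale
z_crit = 1.96
pooled_or = math.exp(pooled_log_or)
ci_low = math.exp(pooled_log_or - z_crit * pooled_se)
ci_high = math.exp(pooled_log_or + z_crit * pooled_se)
z_stat = pooled_log_or / pooled_se

print("weights (%):", [round(w, 2) for w in weight_pct])
print("pooled OR:", round(pooled_or, 4))
print("95% CI:", round(ci_low, 4), round(ci_high, 4))
print("z:", round(z_stat, 4))
```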

References

See, e.g., Lipsey/Wilson (2001: 114)

Solved – Meta-analysis in R with multiple SNPs

The MetABEL part of GenABEL does this. For 150 SNPs, you might find that coding the loop yourself is quicker than verifying that MetABEL does exactly what you want. (Neither should take very long.)
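
For a sense of how little the hand-coded loop involves, here is a minimal sketch of a fixed effect (inverse variance) meta-analysis run once per SNP. It is written in Python with the standard library only, not in R, and it is not how MetABEL works internally. The rsIDs, odds ratios and standard errors are invented, and `pool_fixed_effect` is a hypothetical helper name.

```python
import math


def pool_fixed_effect(log_ors, ses):
    """Inverse variance pooling: return (pooled log OR, pooled standard error)."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / total
    return pooled, math.sqrt(1.0 / total)


# Invented per-study results: SNP id -> list of (odds ratio, SE of log OR).
studies = {
    "rs0000001": [(1.20, 0.10), (1.10, 0.08), (1.25, 0.12)],
    "rs0000002": [(0.95, 0.09), (1.02, 0.07), (0.99, 0.11)],
}

results = {}
for snp, estimates in studies.items():
    log_ors = [math.log(odds_ratio) for odds_ratio, _ in estimates]
    ses = [se for _, se in estimates]
    pooled, se_pooled = pool_fixed_effect(log_ors, ses)
    z = pooled / se_pooled
    results[snp] = {
        "or": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.96 * se_pooled),
        "ci_high": math.exp(pooled + 1.96 * se_pooled),
        # Two-sided p-value from the standard normal distribution
        "p": math.erfc(abs(z) / math.sqrt(2.0)),
    }

for snp, row in results.items():
    print(snp, {key: round(value, 4) for key, value in row.items()})
```

The same loop carries over to R more or less line for line; with 150 SNPs the only change is a longer input table.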
