Solved – How to interpret false positive from QQ-plot in genome-wide association studies

genetics, qq-plot

I am doing a SNP association study, and the estimated p-value for each SNP is plotted in a QQ-plot. My question: can one identify false positive hits from a QQ-plot in a GWAS (Genome-Wide Association Study)?

From this article: http://www.genomesunzipped.org/2010/07/how-to-read-a-genome-wide-association-study.php , it says:

"…on the other hand, should show a solid line matching X=Y until it sharply curves at the end (representing the small number of true associations among thousands of unassociated SNPs)"

Is this always true, or can false positives also produce a sharp deviation at the end of the curve? How can one tell from a QQ-plot whether a deviation is due to a true association or a false one?

Best Answer

Under the null hypothesis, your p-values should follow a uniform distribution (I'm ignoring some issues with dependencies here, as they are not really relevant to the discussion). This includes the false positives: just by coincidence, you should get a few extreme p-values. Since those are exactly what the expected distribution predicts, they will not distort the QQ-plot. False positives are already accounted for in this type of QQ-plot.
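To make this concrete, here is a minimal simulation sketch using only the Python standard library. The number of SNPs, the seed, and the helper name `qq_points` are assumptions chosen for illustration, not anything from the article. It draws p-values for SNPs that are all null and compares the observed quantiles with the uniform expectation on the usual -log10 scale.

```python
import math
import random

def qq_points(pvalues):
    """Pair each sorted p-value with its expected uniform quantile (-log10 scale)."""
    n = len(pvalues)
    ordered = sorted(pvalues)  # smallest p-value first
    # Expected quantile for the i-th smallest of n uniform draws: (i + 0.5) / n
    expected = [-math.log10((i + 0.5) / n) for i in range(n)]
    observed = [-math.log10(p) for p in ordered]
    return expected, observed

random.seed(1)       # arbitrary seed, only for reproducibility
n_snps = 100_000     # assumed number of tested SNPs

# Under the null, p-values are uniform; 1 - random() avoids an exact 0.
null_p = [1.0 - random.random() for _ in range(n_snps)]
expected, observed = qq_points(null_p)

# Roughly 5% of null SNPs fall below 0.05: these are the false positives.
false_positive_rate = sum(p < 0.05 for p in null_p) / n_snps

# Largest vertical distance from the X=Y line where the expected p-value is >= 0.01.
bulk_gap = max(abs(o - e) for e, o in zip(expected, observed) if e <= 2.0)

print(f"fraction with p < 0.05: {false_positive_rate:.4f}")
print(f"largest gap from X=Y (expected p >= 0.01): {bulk_gap:.3f}")
```

The false positives are there (about one SNP in twenty at the 0.05 level), yet the curve stays close to the X=Y line, because the line itself is drawn from the same uniform expectation.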

On the other hand, if the null hypothesis is false for some of the tested SNPs, those SNPs will typically have lower p-values than expected, and these will indeed distort the plot.
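A rough sketch of that effect, again with the standard library only: most SNPs get a null z-statistic, and a handful get a shifted one. The counts, the effect size of 6, and the helper `two_sided_p` are illustrative assumptions, and a two-sided z-test stands in for whatever association test is actually used.

```python
import math
import random

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

random.seed(2)                  # arbitrary seed, only for reproducibility
n_null, n_true = 100_000, 50    # assumed numbers of null and truly associated SNPs
effect = 6.0                    # assumed mean shift of z for true associations

p_null = [two_sided_p(random.gauss(0.0, 1.0)) for _ in range(n_null)]
p_true = [two_sided_p(random.gauss(effect, 1.0)) for _ in range(n_true)]

pvals = sorted(p_null + p_true)  # smallest p-value first
n = len(pvals)

# Observed minus expected -log10(p) at each rank; 0 means "on the X=Y line".
gaps = [-math.log10(p) + math.log10((i + 0.5) / n) for i, p in enumerate(pvals)]

tail_gap = gaps[0]          # the most significant SNP
median_gap = gaps[n // 2]   # a SNP in the middle of the distribution

print(f"gap at the extreme tail: {tail_gap:.2f}")
print(f"gap at the median:       {median_gap:.3f}")
```

In this setup the middle of the curve sits on the line while the extreme tail lifts well above it, which is the "sharp curve at the end" pattern from the quoted article.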

What the author of the article you link seems to mean is: if you have very strong distortion, there are far more extreme p-values than the uniform distribution predicts, so something is likely to be wrong.

So, in short: a QQ-plot of p-values against the uniform distribution should show relatively little deviation from the straight line, and only at the extremely low p-values (showing you have more low p-values than if the null hypothesis were true everywhere).