Solved – Interpretation of P values in Genome Wide Association Studies

genetics, gwas, p-value

Genome-wide association studies (GWAS) are a common method for associating single nucleotide polymorphisms (SNPs) with a disease or trait under study.

I don't work in this field, and I've always wondered how the p-value for each SNP is calculated. How do I interpret, say, a p-value of $1 \cdot 10^{-8}$ for a SNP identified in a GWAS? Is it significant?

Best Answer

P values in GWAS are calculated from a variety of tests, so there is no single answer. For a good overview, the common GWAS software plink describes some of its methodologies in its documentation.

The most common tests are linear regression, logistic regression, and the Cochran-Armitage trend test.

The test of association depends on whether the phenotype is case/control (heart attack / no heart attack) or continuous (height).

In such studies, the genotype at each locus is coded as 0, 1, or 2, depending on the number of risk-increasing alleles the individual carries at that locus.
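As a small illustration of the 0/1/2 coding, here is a sketch that counts risk alleles in two-letter genotype strings. The function name `count_risk_alleles`, the string representation of genotypes, and the choice of `G` as the risk allele are all assumptions made for the example; real GWAS software works from its own file formats.

```python
def count_risk_alleles(genotype: str, risk_allele: str) -> int:
    """Return how many copies of risk_allele appear in a two-letter genotype."""
    if len(genotype) != 2:
        raise ValueError("expected a diploid genotype such as 'AG'")
    return genotype.count(risk_allele)


# Hypothetical genotypes at one SNP, with G assumed to be the risk allele.
genotypes = ["AA", "AG", "GG", "GA"]
coded = [count_risk_alleles(g, "G") for g in genotypes]
print(coded)  # [0, 1, 2, 1]
```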

For a case/control phenotype, you're basically looking to see whether there are more risk alleles in the case population than in the control population. For a continuous phenotype, you're looking to see whether each additional risk allele is associated with an increase in the phenotype.
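To make the case/control comparison concrete, here is a minimal sketch of the Cochran-Armitage trend test written with only the standard library. The function name `trend_test` and the genotype counts are made up for illustration, and the weights 0, 1, 2 assume an additive coding. This is not how plink implements it, just the textbook statistic.

```python
import math


def trend_test(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x 3 genotype table.

    cases[i] and controls[i] are the numbers of individuals carrying
    i risk alleles. Returns (chi-square statistic with 1 df, p-value).
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls

    sum_tr = sum(t * r for t, r in zip(weights, cases))
    sum_tn = sum(t * m for t, m in zip(weights, totals))
    sum_t2n = sum(t * t * m for t, m in zip(weights, totals))

    numerator = n * sum_tr - n_cases * sum_tn
    spread = n * sum_t2n - sum_tn ** 2
    if n_cases == 0 or n_controls == 0 or spread == 0:
        raise ValueError("table has no variation to test")

    chi2 = n * numerator ** 2 / (n_cases * n_controls * spread)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts: cases carry more risk alleles than controls.
chi2, p_value = trend_test(cases=[10, 20, 30], controls=[30, 20, 10])
print(chi2, p_value)
```

With these counts the statistic is 20 on one degree of freedom, so the p-value is small; if cases and controls had the same genotype proportions, the statistic would be 0 and the p-value 1.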

A P value of $1 \times 10^{-8}$ would meet the genome-wide significance level, which is approximately $5 \times 10^{-8}$. This threshold is meant to control the family-wise error rate (the chance of even one false discovery) at 5%. A decent background is given in this paper: Estimation of significance thresholds for genomewide association scans, by Frank Dudbridge and Arief Gusnanto.
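The arithmetic behind the threshold can be sketched as a Bonferroni correction. The figure of one million independent tests is an assumption here (it is the commonly quoted approximation, not a number taken from the paper above), and the function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Assumption: roughly one million effectively independent tests genome-wide.
ASSUMED_INDEPENDENT_TESTS = 1_000_000
threshold = bonferroni_threshold(0.05, ASSUMED_INDEPENDENT_TESTS)


def is_genome_wide_significant(p_value: float, cutoff: float = threshold) -> bool:
    """True if p_value falls below the genome-wide cutoff."""
    return p_value < cutoff


print(threshold)                         # about 5e-08
print(is_genome_wide_significant(1e-8))  # True
print(is_genome_wide_significant(1e-6))  # False
```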

Edit: I forgot to mention that the tests differ depending on whether you are assuming an additive model (0 vs 1 vs 2) or a dominant model (0 vs 1 or 2). There are more complicated models as well. If you have read a specific paper, you should link it in the question, and then we can talk about it in the chat system.
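A quick sketch of how the genetic model changes the coding that goes into the test. The `recode` function is hypothetical, and the recessive branch is an extra not mentioned above, included only as one example of another model.

```python
def recode(dosages, model="additive"):
    """Recode 0/1/2 risk-allele counts under a chosen genetic model."""
    if model == "additive":
        # 0 vs 1 vs 2: each extra allele counts.
        return list(dosages)
    if model == "dominant":
        # 0 vs (1 or 2): one copy is enough.
        return [1 if d >= 1 else 0 for d in dosages]
    if model == "recessive":
        # (0 or 1) vs 2: both copies are needed (assumed extra model).
        return [1 if d == 2 else 0 for d in dosages]
    raise ValueError(f"unknown model: {model}")


dosages = [0, 1, 2, 1, 0]
print(recode(dosages, "additive"))   # [0, 1, 2, 1, 0]
print(recode(dosages, "dominant"))   # [0, 1, 1, 1, 0]
print(recode(dosages, "recessive"))  # [0, 0, 1, 0, 0]
```

The recoded values are then used as the predictor in the same regression or trend test, so the choice of model changes the p-value without changing the underlying genotypes.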