Multiple Tests – Correcting p Values for Correlated Tests in Genetics

correlation, genetics, multiple-comparisons, statistical significance

I have p values from a lot of tests and would like to know whether there is actually something significant after correcting for multiple testing. The complication: my tests are not independent. The method I am thinking about (a variant of Fisher's Product Method, Zaykin et al., Genet Epidemiol, 2002) needs the correlation between the p values.

In order to estimate this correlation, I am currently thinking about bootstrapping cases, running the analyses and correlating the resulting vectors of p values. Does anyone have a better idea? Or even a better idea for my original problem (correcting for multiple testing in correlated tests)?

Background: I am logistically regressing whether or not my subjects are suffering from a particular disease on the interaction between their genotype (AA, Aa or aa) and a covariate. However, the genotype is actually a lot (30-250) of Single Nucleotide Polymorphisms (SNPs), which are certainly not independent but in Linkage Disequilibrium.

Best Answer

This is actually a hot topic in genome-wide association studies (GWAS)! I am not sure the method you are thinking of is the most appropriate in this context. Pooling of p-values has been described by some authors, but in a different context (replication studies or meta-analysis; see e.g. (1) for a recent review). Combining SNP p-values by Fisher's method is generally used when one wants to derive a unique p-value for a given gene; this makes it possible to work at the gene level and reduces the dimensionality of subsequent testing, but as you said, the non-independence between markers (arising from spatial colocation or linkage disequilibrium, LD) introduces a bias. More powerful alternatives rely on resampling procedures, for example the use of maxT statistics for combining p-values and working at the gene level, or when one is interested in pathway-based approaches; see e.g. (2) (§2.4, p. 93, provides details on their approach).
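To make the Fisher combination concrete, here is a minimal sketch of the plain (independence-assuming) version: the statistic is -2 times the sum of the log p-values, referred to a chi-square distribution with 2k degrees of freedom. The function name is mine, and this is not the truncated-product variant of Zaykin et al.; with SNPs in LD the independence assumption fails, so the result would be biased as described above.

```python
import math

def fisher_combined_pvalue(pvalues):
    """Combine k p-values with Fisher's product method (assumes independence)."""
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    # Fisher's statistic is X = -2 * sum(log p_i); we keep X / 2 for convenience.
    half_x = -sum(math.log(p) for p in pvalues)
    # For even degrees of freedom 2k the chi-square tail has a closed form:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total
```

A single p-value is returned unchanged, which is a quick sanity check on the formula.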

My main concern with bootstrapping (with replacement) would be that you are introducing an artificial form of relatedness; in other words, you create virtual twins, hence altering Hardy-Weinberg equilibrium (but also minor allele frequency and call rate). This would not be the case with a permutation approach, where you permute individual labels and keep the genotyping data as is. The plink software can usually give you raw and permuted p-values, although it uses (by default) an adaptive testing strategy with a sliding window that stops running permutations early (rather than all of, say, 1000 per SNP) if it appears that the SNP under consideration is not "interesting"; it also has an option for computing maxT, see the online help.
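The label-permutation idea with a maxT adjustment can be sketched as follows. This is only an illustration under assumptions of mine: genotypes are coded 0/1/2 (one row per subject, one column per SNP), the per-SNP statistic is the absolute correlation with case/control status rather than the logistic interaction term from the question, and the function names are hypothetical. It is not what plink or multtest run internally, and permuting labels when a covariate interaction is being tested needs more care than shown here.

```python
import math
import random

def abs_correlations(genotypes, phenotype):
    """Absolute Pearson correlation of each SNP column with the phenotype."""
    n = len(phenotype)
    mean_y = sum(phenotype) / n
    yc = [y - mean_y for y in phenotype]
    norm_y = math.sqrt(sum(v * v for v in yc))
    stats = []
    for column in zip(*genotypes):  # iterate over SNPs
        mean_g = sum(column) / n
        gc = [g - mean_g for g in column]
        norm_g = math.sqrt(sum(v * v for v in gc))
        if norm_g == 0 or norm_y == 0:  # monomorphic SNP or constant phenotype
            stats.append(0.0)
        else:
            cov = sum(a * b for a, b in zip(gc, yc))
            stats.append(abs(cov) / (norm_g * norm_y))
    return stats

def maxt_adjusted_pvalues(genotypes, phenotype, n_perm=1000, seed=0):
    """Single-step maxT adjusted p-values from phenotype-label permutations."""
    rng = random.Random(seed)
    observed = abs_correlations(genotypes, phenotype)
    exceed = [0] * len(observed)
    labels = list(phenotype)
    for _ in range(n_perm):
        # Shuffle labels only: genotype rows stay intact, so LD is preserved.
        rng.shuffle(labels)
        max_stat = max(abs_correlations(genotypes, labels))
        for j, t in enumerate(observed):
            if max_stat >= t:
                exceed[j] += 1
    # The +1 counts the observed data as one of the permutations.
    return [(1 + c) / (1 + n_perm) for c in exceed]
```

Because every permutation keeps each subject's full genotype vector, the null distribution of the maximum statistic reflects the correlation between SNPs without having to estimate it explicitly.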

But given the low number of SNPs you are considering, I would suggest relying on FDR-based or maxT tests as implemented in the multtest R package (see mt.maxT). The definitive guide to resampling strategies for genomic applications is Multiple Testing Procedures with Applications to Genomics, by Dudoit & van der Laan (Springer, 2008). See also Andrea Foulkes's book on genetics with R, which is reviewed in the JSS. She has great material on multiple testing procedures.
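For the FDR-based route, a minimal sketch of the Benjamini-Hochberg step-up adjustment looks like this. It is written in Python for illustration, with a function name of my choosing, and is not the multtest code. Note that Benjamini-Hochberg is only guaranteed to control the FDR under independence or positive dependence between tests; the more conservative Benjamini-Yekutieli variant covers arbitrary dependence.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A SNP would then be declared significant at FDR level q when its adjusted p-value is at most q.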

Further Notes

Many authors have pointed out that simple multiple testing corrections such as Bonferroni or Sidak are too stringent for adjusting the results for individual SNPs. Moreover, neither of these methods takes into account the correlation that exists between SNPs due to LD, which tags the genetic variation across gene regions. Other alternatives have been proposed, like a derivative of Holm's method for multiple comparisons (3), Hidden Markov Models (4), conditional or positive FDR (5) or derivatives thereof (6), to name a few. So-called gap statistics or sliding windows have proved successful in some cases; you'll find a good review in (7) and (8).
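For reference, the two simple corrections mentioned above can be written in a few lines (function names are mine). Both depend only on the number of tests m, which is exactly why they ignore LD: 250 SNPs in near-perfect LD are penalized as heavily as 250 independent ones.

```python
def bonferroni_adjust(pvalues):
    """Bonferroni adjusted p-values: min(1, m * p)."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def sidak_adjust(pvalues):
    """Sidak adjusted p-values: 1 - (1 - p)^m, exact for independent tests."""
    m = len(pvalues)
    return [1.0 - (1.0 - p) ** m for p in pvalues]
```

With m = 250 and a family-wise level of 0.05, the Bonferroni per-SNP threshold is 0.05 / 250 = 0.0002.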

I've also heard of methods that make effective use of the haplotype structure or LD, e.g. (9), but I have never used them. They seem, however, more related to estimating the correlation between markers, not between p-values as you meant. But in fact, you might do better to think in terms of the dependency structure between successive test statistics rather than between correlated p-values.
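As I understand the approach in (9), the idea is to estimate an "effective number of independent tests" from the variance of the eigenvalues of the SNP correlation (LD) matrix, Meff = 1 + (M - 1)(1 - Var(eigenvalues) / M), and to plug Meff into a Sidak-type correction. The sketch below is my reading of that formula and not a vetted implementation; the function names and the input format (a full M x M correlation matrix as nested lists) are assumptions. It uses the fact that the eigenvalues of a correlation matrix sum to M and their squares sum to the sum of squared entries, so no eigendecomposition is needed.

```python
def effective_number_of_tests(ld_corr):
    """Meff from the eigenvalue variance of an M x M SNP correlation matrix."""
    m = len(ld_corr)
    if m < 2:
        return float(m)
    # Sum of squared eigenvalues equals the sum of squared matrix entries.
    sum_sq = sum(r * r for row in ld_corr for r in row)
    # Sample variance of the eigenvalues (their mean is 1 for a correlation matrix).
    var_eig = (sum_sq - m) / (m - 1)
    return 1.0 + (m - 1) * (1.0 - var_eig / m)

def sidak_threshold(alpha, m_eff):
    """Per-test significance threshold for m_eff effective tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m_eff)
```

Uncorrelated SNPs give Meff = M, while SNPs in complete LD give Meff = 1, so the per-test threshold moves between the usual Sidak value and the unadjusted alpha.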

References

  1. Cantor, RM, Lange, K and Sinsheimer, JS. Prioritizing GWAS Results: A Review of Statistical Methods and Recommendations for Their Application. Am J Hum Genet. 2010 86(1): 6–22.
  2. Corley, RP, Zeiger, JS, Crowley, T et al. Association of candidate genes with antisocial drug dependence in adolescents. Drug and Alcohol Dependence 2008 96: 90–98.
  3. Dalmasso, C, Génin, E and Trégouet DA. A Weighted-Holm Procedure Accounting for Allele Frequencies in Genomewide Association Studies. Genetics 2008 180(1): 697–702.
  4. Wei, Z, Sun, W, Wang, K, and Hakonarson, H. Multiple Testing in Genome-Wide Association Studies via Hidden Markov Models. Bioinformatics 2009 25(21): 2802–2808.
  5. Broberg, P. A comparative review of estimates of the proportion unchanged genes and the false discovery rate. BMC Bioinformatics 2005 6: 199.
  6. Need, AC, Ge, D, Weale, ME, et al. A Genome-Wide Investigation of SNPs and CNVs in Schizophrenia. PLoS Genet. 2009 5(2): e1000373.
  7. Han, B, Kang, HM, and Eskin, E. Rapid and Accurate Multiple Testing Correction and Power Estimation for Millions of Correlated Markers. PLoS Genetics 2009
  8. Liang, Y and Kelemen, A. Statistical advances and challenges for analyzing correlated high dimensional SNP data in genomic study for complex diseases. Statistics Surveys 2008 2: 43–60. -- the best recent review ever
  9. Nyholt, DR. A Simple Correction for Multiple Testing for Single-Nucleotide Polymorphisms in Linkage Disequilibrium with Each Other. Am J Hum Genet. 2004 74(4): 765–769.
  10. Nicodemus, KK, Liu, W, Chase, GA, Tsai, Y-Y, and Fallin, MD. Comparison of type I error for multiple test corrections in large single-nucleotide polymorphism studies using principal components versus haplotype blocking algorithms. BMC Genetics 2005 6(Suppl 1): S78.
  11. Peng, Q, Zhao, J, and Xue, F. PCA-based bootstrap confidence interval tests for gene-disease association involving multiple SNPs. BMC Genetics 2010 11: 6.
  12. Li, M, Romero, R, Fu, WJ, and Cui, Y. Mapping Haplotype-haplotype Interactions with Adaptive LASSO. BMC Genetics 2010 11: 79. -- although not directly related to the question, it covers haplotype-based analysis/epistatic effect