I have p values from a lot of tests and would like to know whether there is actually something significant after correcting for multiple testing. The complication: my tests are not independent. The method I am thinking about (a variant of Fisher's Product Method, Zaykin et al., Genet Epidemiol, 2002) needs the correlation between the p values.
In order to estimate this correlation, I am currently thinking about bootstrapping cases, running the analyses and correlating the resulting vectors of p values. Does anyone have a better idea? Or even a better idea for my original problem (correcting for multiple testing in correlated tests)?
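In pseudo-code terms, the sketch I have in mind looks like this. Here `run_tests` is a placeholder for rerunning all my analyses on a resampled data set and returning the vector of p values; the names and the pure-Python implementation are only for illustration, not my actual pipeline:

```python
import math
import random

def bootstrap_pvalue_correlation(cases, run_tests, n_boot=200, seed=42):
    """Resample cases with replacement, rerun all tests on each bootstrap
    sample, and estimate pairwise Pearson correlations between the
    resulting vectors of p-values."""
    rng = random.Random(seed)
    p_matrix = []  # rows: bootstrap replicates, columns: tests
    for _ in range(n_boot):
        sample = [cases[rng.randrange(len(cases))] for _ in range(len(cases))]
        p_matrix.append(run_tests(sample))
    m = len(p_matrix[0])

    def corr(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sx = math.sqrt(sum((v - mx) ** 2 for v in x))
        sy = math.sqrt(sum((v - my) ** 2 for v in y))
        return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

    return [[corr([row[i] for row in p_matrix],
                  [row[j] for row in p_matrix]) for j in range(m)]
            for i in range(m)]
```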
Background: I am logistically regressing whether or not my subjects suffer from a particular disease on the interaction between their genotype (AA, Aa or aa) and a covariate. However, the genotype is actually many (30-250) single nucleotide polymorphisms (SNPs), which are certainly not independent but in linkage disequilibrium.
Best Answer
This is actually a hot topic in genome-wide association studies (GWAS)! I am not sure the method you are thinking of is the most appropriate in this context. Pooling of p-values has been described by some authors, but in a different context (replication studies or meta-analysis; see e.g. (1) for a recent review). Combining SNP p-values by Fisher's method is generally used when one wants to derive a unique p-value for a given gene; this allows one to work at the gene level and reduces the dimensionality of subsequent testing, but as you said, the non-independence between markers (arising from spatial colocation or linkage disequilibrium, LD) introduces a bias. More powerful alternatives rely on resampling procedures, for example the use of maxT statistics for combining p-values and working at the gene level, or pathway-based approaches; see e.g. (2) (§2.4, p. 93 provides details on their approach).
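For reference, the plain Fisher combination assumes independent tests (the Zaykin et al. variant you mention modifies the reference distribution to account for correlation, which this sketch does not do): under H0, X² = -2 Σ log pᵢ follows a chi-square with 2k degrees of freedom. A minimal sketch, with `fisher_combine` being an illustrative name:

```python
import math

def fisher_combine(pvalues):
    """Fisher's product method for k independent tests: under H0,
    X2 = -2 * sum(log p_i) ~ chi-square with 2k degrees of freedom."""
    k = len(pvalues)
    x2 = -2.0 * sum(math.log(p) for p in pvalues)
    # Chi-square survival function for even df = 2k has a closed form:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = x2 / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

With a single test the combined value reduces to the original p-value, which is a quick sanity check.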
My main concern with bootstrapping (with replacement) would be that you are introducing an artificial form of relatedness, or in other words creating virtual twins, hence altering Hardy-Weinberg equilibrium (but also minor allele frequency and call rate). This would not be the case with a permutation approach where you permute individual labels and keep the genotyping data as is. Usually, the plink software can give you raw and permuted p-values, although it uses (by default) an adaptive testing strategy that allows it to stop running all permutations (say 1,000 per SNP) if it appears that the SNP under consideration is not "interesting"; it also has an option for computing maxT, see the online help.
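A bare-bones illustration of the permutation idea, in the spirit of single-step maxT (not plink's actual implementation; `statistic` is any per-SNP association statistic you supply, and all names are illustrative):

```python
import random

def maxt_adjusted_pvalues(genotypes, labels, statistic, n_perm=1000, seed=1):
    """Single-step maxT via label permutation: the genotype data stay
    intact (so LD, HWE and allele frequencies are preserved); only the
    phenotype labels are shuffled on each permutation."""
    rng = random.Random(seed)
    observed = [statistic(snp, labels) for snp in genotypes]
    exceed = [0] * len(genotypes)
    perm_labels = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm_labels)
        max_stat = max(statistic(snp, perm_labels) for snp in genotypes)
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    # Adjusted p-value for SNP j: share of permutations whose maximum
    # statistic (over all SNPs) reaches SNP j's observed statistic.
    return [(c + 1) / (n_perm + 1) for c in exceed]
```

Because the maximum is taken over all SNPs, the adjustment automatically respects whatever correlation structure the markers have.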
But given the low number of SNPs you are considering, I would suggest relying on FDR-based or maxT tests as implemented in the multtest R package (see mt.maxT). The definitive guide to resampling strategies for genomic applications is Multiple Testing Procedures with Applications to Genomics, by Dudoit & van der Laan (Springer, 2008). See also Andrea Foulkes's book on genetics with R, which is reviewed in the JSS; she has great material on multiple testing procedures.

Further Notes
Many authors have pointed out that simple multiple testing correction methods such as Bonferroni or Sidak are too stringent for adjusting the results of individual SNPs. Moreover, neither of these methods takes into account the correlation that exists between SNPs due to LD, which tags the genetic variation across gene regions. Other alternatives have been proposed, like a derivative of Holm's method for multiple comparisons (3), hidden Markov models (4), conditional or positive FDR (5) or derivatives thereof (6), to name a few. So-called gap statistics or sliding-window approaches have proved successful in some cases, but you will find a good review in (7) and (8).
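To make the contrast concrete, here is a minimal sketch of the three classical adjustments (illustrative pure-Python implementations, not the multtest code). Bonferroni and Sidak control the family-wise error rate and ignore the LD structure entirely; Benjamini-Hochberg controls the FDR and is typically far less stringent:

```python
def bonferroni(pvalues):
    """Bonferroni: multiply each p-value by the number of tests m."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def sidak(pvalues):
    """Sidak: 1 - (1 - p)^m; exact under independence, hence still
    conservative when the SNPs are correlated through LD."""
    m = len(pvalues)
    return [1.0 - (1.0 - p) ** m for p in pvalues]

def bh_fdr(pvalues):
    """Benjamini-Hochberg step-up: the adjusted value for the i-th
    smallest p-value is min over j >= i of p_(j) * m / j, capped at 1."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [1.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):  # walk from largest p down
        i = order[rank]
        running_min = min(running_min, pvalues[i] * m / (rank + 1))
        adjusted[i] = running_min
    return adjusted
```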
I've also heard of methods that make effective use of the haplotype structure or LD, e.g. (9), but I have never used them. They seem, however, more related to estimating the correlation between markers than between p-values, which is what you meant. In fact, you might better think in terms of the dependency structure between successive test statistics than between correlated p-values.
References