Solved – Bonferroni bound and FDR: compute p-values

bonferronip-valuer

Suppose I have measured the activity of $N=20$ genes for 102 subjects (50 with cancer and 52 normal). Based on the available data, I compute the following table:

| 1    | 2    | ... | 20
|-0.66 | 1.02 | ... | 0.33

The first line reports just the index of the gene and the second one is the computed t-value, where:

$$
t_i = \frac{\bar{x}_{i, cancer} – \bar{x}_{i,normal}}{s_i \sqrt{1/50+1/52}}.
$$

I know that $t_i \sim t(100)$ and, based on my notes:

$$
p_i = P(|t(100)| > |t_i|).
$$

Suppose also that $\alpha = 0.05$, then we reject the null hypothesis if $p_i < \frac{\alpha}{N}$.

I have some doubts about how to compute the p-values given the t-statistics. I guess this is a two-sided test and therefore to compute the p-value I have to use something like (in R):

2 * pt(abs(p1), df = 100, lower.tail = FALSE),

where p1 is for instance -0.66. Is this correct? These p-values I can use then for the Benjamini-Hochberg rule (just sorted), correct?

Best Answer

Your formula for the p-values appears correct, assuming p1 is the t-value.

I don't see any reason why you would use both the Benjamini-Hochberg correction to control false discovery rate (FDR) and the Bonferroni correction to control the familywise error rate (FWER). You would choose one approach or the other.

Corrections for multiple p-values can be handled in R with the p.adjust function.

When using this function, the decision rule remains p < alpha [not p < alpha / n]. That is, R adjusts the p-values for you so that you don't need to adjust the decision rule.

The following code in R calculates the p-value for 7 genes, then uses either BH or Bonferroni correction. The S columns in the data frame indicate whether the p-value is < 0.05.

You'll note that Bonferroni is more conservative than BH. I think that Bonferroni is too conservative for most situations. It is helpful to read up on the various FDR and FWER control methods.

Gene = 1:7
t.values = c(-0.66, 1.02, 3.2, 2.7, 1.1, 2.5, 0.33)
p.values = 2 * pt(abs(t.values), df = 100, lower.tail = FALSE)

p.BH = p.adjust(p.values, method="BH")
p.B = p.adjust(p.values, method="bonferroni")

### Make things pretty ###

p.values = round(p.values, 3)
p.BH = round(p.BH, 3)
p.B = round(p.B, 3)

S.BH = p.BH < 0.05
S.B = p.B < 0.05

Data = data.frame(Gene, t.values, p.values, p.BH, S.BH, p.B, S.B)

Data

###  Gene t.values p.values  p.BH  S.BH   p.B   S.B
###     1    -0.66    0.511 0.596 FALSE 1.000 FALSE
###     2     1.02    0.310 0.434 FALSE 1.000 FALSE
###     3     3.20    0.002 0.013  TRUE 0.013  TRUE
###     4     2.70    0.008 0.029  TRUE 0.057 FALSE
###     5     1.10    0.274 0.434 FALSE 1.000 FALSE
###     6     2.50    0.014 0.033  TRUE 0.098 FALSE
###     7     0.33    0.742 0.742 FALSE 1.000 FALSE
Related Question