I would like to calculate Tukey-adjusted p-values for emmeans pairwise comparisons. I know that these can be obtained directly with functions like pairs() and CLD(). However, when there are three leading zeroes in the p-value, only one digit is displayed. I recognize that in this case the significance of the test statistic is not in question, but I like to consistently report p-values with two digits in papers.
In attempting to calculate a more precise p-value based on the output of pairs(), I have not been able to figure out how to do this when there are multiple comparisons. It's straightforward when there is just one comparison:
> pairs(emmeans(model1, "harvest"), details = T)
contrast estimate SE df t.ratio p.value
Spring - Spring/Fall 0.4521333 0.1006861 15 4.491 0.0004
> 2*pt(4.491, 15, lower=FALSE)
[1] 0.0004309609
However, when there are multiple comparisons, I can't figure out how to calculate the appropriate Tukey-adjusted p-value. An unadjusted p-value is too low and an adjusted p-value is too high (using the contrast between factor levels 15 and 61 as an example).
> pairs(emmeans(model2, "row.space"))
contrast estimate SE df t.ratio p.value
15 - 30 0.1979111 0.1034653 62 1.913 0.1436
15 - 61 0.4199143 0.1034653 62 4.059 0.0004
30 - 61 0.2220032 0.1034653 62 2.146 0.0890
P value adjustment: tukey method for comparing a family of 3 estimates
> 2*pt(4.059, 62, lower=FALSE) # too low
[1] 0.0001405038
> 2*ptukey(4.059, nmeans = 3, df = 62, lower=FALSE) # too high
[1] 0.03053126
How should I calculate the Tukey-adjusted p-value so that I can obtain more digits?
Best Answer
Tukey-adjusted P values are computed using the
ptukey()
function in R (Studentized range distribution). This is one of the toughest distributions to compute, among those in common use. The help page forptukey
states:This means that the probabilities are computed using numerical integration. I really doubt that a P value way out in the tail can be computed to many digits' accuracy. I suggest that if you have a habit of reporting P values to 2 digits accuracy, it is a habit worth breaking. Nobody needs to distinguish between P = .00022 and P = .00024, even if 2 digits accuracy is believable.
However, as is suggested in a comment, if you do
summary(pairs(...))
oras.data.frame(pairs(...))
, you get something that inherits fromdata.frame
, and every P value can be extracted to 9 or so digits. But it is false precision.Also, here is the correct calculation of that P value:
Actually, the actual result is somewhere in a range depending on the rounding of $t$:
The $\sqrt{2}$ factor is needed because the standardization is based on the SE of one mean rather than the difference of two means; and the Studentized range test is a one-sided test based on the maximum minus the minimum.