Solved – How is the standard error used in calculating a Tukey HSD defined and calculated

standard errorstatistical significancetreatment-effecttukey-hsd-test

I am looking at someone else’s published data where they report treatment means and then a single standard error value across treatment means.

Sanderson et al. (2018)

I want to know if certain treatments are significantly different from each other. For example, I’ve highlighted a comparison I am interested in in green.

I am thinking that I might be able to use the Tukey HSD test, but I am not sure because I am not sure how standard error is defined and used in the Tukey HSD.

I am looking at an example from the following textbook by Robert O. Kuehl: “Design of Experiments: Statistical Principles of Research Design and Analysis. 2nd Edition.”

In the example (p. 107-109), you calculate the HSD by multiplying the Studentized range statistic (q) by the standard error.

Here’s my confusion. On page 107 it says the standard error is “the standard error of a treatment mean” which, to me, evokes the idea of a unique standard error for each treatment mean calculated from that treatments individual replications.

That understanding of it doesn’t reconcile well with the idea that the HSD is going to be the same across treatments comparisons. On page 109 there’s an actual numeric example, and the standard error used is constant no matter what means are being compared. That suggests that the standard error is calculated from all the treatments, but I am not sure how.

What does the standard error used in the Tukey HSD calculation refer to and how is it calculated?

Also, is it safe to assume that the standard error reported in the table above (highlighted in blue) is the standard error you would use in a Tukey HSD calculation? There's no additional information in the text of the article further defining what is meant by SE in the table.

Best Answer

Here’s my confusion. On page 107 it says the standard error is “the standard error of a treatment mean” which, to me, evokes the idea of a unique standard error for each treatment mean calculated from that treatments individual replications.

Statistical analysis often involves pooling information among observations to get more reliable estimates of parameter values. In an analysis of variance like what you show in your question, you do not use calculations "from that treatment's individual replications" to get standard errors. Rather, based on an assumption that the underlying variance of observations is the same among all treatments, you use all the replications on all treatments to get an overall estimate of the variance. For each treatment you estimate the variance around that treatment's mean value, and then pool those estimates from all treatments.

If all treatments had the same number of replicates, then the standard errors of the mean values are the same for all treatments. The use of "SE" by the authors of that table suggests that the values reported are for those standard errors of the mean values, taking the number of replicates into account.

For t-tests in general when testing the difference of a value from 0, the statistic you calculate is the ratio of that value to the standard error of that value. For Tukey's range test, instead of evaluating the statistic against a t distribution with the appropriate number of degrees of freedom you evaluate it against a studentized range distribution that also takes into account the number of values that are under consideration for testing. This provides a correction for multiple comparisons.

So Tukey's test uses a t-test type of statistic that is the ratio of the difference between two mean values divided by the standard error of the difference of those two means. Assuming independence and the same individual standard errors, that would be $\sqrt 2$ times the standard error of an individual mean.* You would have to check the software you are using to see whether it expects from you the standard error of an individual mean or the standard error of a difference of two means.

That said, the Tukey test might not be the best way to accomplish what you want. It's appropriate if you have several treatments (with a common estimated standard error) that you want to compare in a way that corrects for multiple comparisons. If you have a single pre-specified comparison in mind, not developed based on looking at the data, then you don't have to correct for multiple comparisons. Note in particular that if a difference isn't significant without that correction, then it certainly won't be significant after correction. In the particular comparison you highlight, the difference of 567 in treatment means is only 1.55 standard errors and thus would not pass the standard test of statistical significance at p < 0.05.


*The Wikipedia web page I linked says it's the standard error of the sum of the values, but for uncorrelated variables that's the same as the standard error of their difference. One situation in which you do have to deal with treatments individually is if there are different numbers of replicates for the two treatments being compared, not addressed directly in the Wikipedia page.