Reporting degrees of freedom for Welch t-test

degrees of freedom, reporting, t-test

The Welch t-test for unequal variances (also known as Welch–Satterthwaite or Welch-Aspin) generally has a non-integer degrees of freedom. How should these degrees of freedom be quoted when reporting the results of the test?

"It is conventional to round down to the nearest integer before consulting standard t tables" according to various sources* – which makes sense as this direction of rounding is conservative.** Some older statistical software would do this too (e.g. Graphpad Prism before version 6) and some online calculators still do. If this procedure had been used, reporting the rounded-down degrees of freedom seems appropriate. (Though using some better software might be even more appropriate!)

But the vast majority of modern packages make use of the fractional part, so in this case it seems the fractional part should be quoted. I can't see it being appropriate to quote more than two decimal places, as a thousandth of a degree of freedom would have only a negligible impact on the p-value.

Looking around Google Scholar, I can see papers quoting the df as a whole number, with one decimal place, or with two decimal places. Are there any guidelines on how much accuracy to use? Also, if the software used the full fractional part, should the quoted df be rounded down to the desired number of figures (e.g. $7.5845… \rightarrow 7.5$ to 1 d.p. or $\rightarrow 7$ as a whole number), as was appropriate with the conservative calculation, or, as seems more sensible to me, rounded conventionally (to the nearest) so that $7.5845… \rightarrow 7.6$ to 1 d.p. or $\rightarrow 8$ to the nearest whole?
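To make the two rounding options concrete, here is a minimal Python sketch. It uses the standard Welch–Satterthwaite formula for the degrees of freedom; the function names `welch_df` and `round_down` and the summary statistics are hypothetical, chosen only for illustration.

```python
import math

def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite degrees of freedom from sample SDs and sample sizes."""
    v1 = s1 ** 2 / n1  # squared standard error of the first mean
    v2 = s2 ** 2 / n2  # squared standard error of the second mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

def round_down(x, places):
    """Truncate a positive number to the given number of decimal places."""
    factor = 10 ** places
    return math.floor(x * factor) / factor

# Hypothetical summary statistics, not taken from any real data set.
nu = welch_df(s1=2.1, n1=6, s2=5.3, n2=8)

print("full df:          ", nu)
print("rounded down, 1dp:", round_down(nu, 1))
print("nearest, 1dp:     ", round(nu, 1))
print("rounded down, int:", math.floor(nu))
print("nearest, int:     ", round(nu))
```

With equal standard deviations and equal sample sizes the formula reduces to the usual $n_1+n_2-2$, which is a quick sanity check on the implementation.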

Edit: aside from knowing the most theoretically sound way of reporting non-integer df, it would also be good to know what people do in practice. Presumably journals and style guides have their own requirements. I would be curious what influential style guides like the APA require. From what I can discern (their manual is not freely available online), the APA have a general preference that almost everything should appear to two decimal places, except p-values (which may be two or three d.p.) and percentages (rounded to the nearest percent) – that covers regression slopes, t statistics, F statistics, $\chi^2$ statistics and so on. This is quite illogical, bearing in mind that the second decimal place occupies a very different significant figure, and suggests quite different precision, in 2.47 than in 982.47, but might explain the number of Welch df with two decimal places I saw in my unscientific sample.

$*$ e.g. Ruxton, G.D. The unequal variance t-test is an underused alternative to Student's t-test and the Mann–Whitney U test, Behavioral Ecology (July/August 2006) 17 (4): 688-690 doi:10.1093/beheco/ark016

$**$ Though the Welch-Satterthwaite approximation itself may or may not be conservative, and in a case where it is not conservative, rounding down the degrees of freedom is no guarantee of compensating overall.

Best Answer

I have not studied actual practice, so this reply cannot address that aspect of the question. As a general principle I would expect the treatment of significant digits in reporting the degrees of freedom (df) to be based on judgment related to significant figures.

The principle is to be consistent: use the precision in one quantity that is appropriate for the precision used in another one that is related to it. Specifically, when reporting values $x$ and $y=f(x)$, where $x$ is given to within $\pm h$ for a small value $h$ (such as $h=\frac{1}{2}\times 10^{-6}$ when rounding to six places after the decimal point), the resulting precision in $y$ as mediated by the function $f$ is

$$\sup_{-h \le k \le h} |f(x+k) - f(x)| \approx h | \frac{d}{dx} f(x) |.$$

The approximation applies when $f$ is continuously differentiable on the interval $[x-h, x+h]$.
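As a sketch of the step behind this approximation (the remainder notation $o(\cdot)$ is introduced here for illustration), a first-order Taylor expansion gives

$$f(x+k) - f(x) = k\,f'(x) + o(k) \quad\Longrightarrow\quad \sup_{-h \le k \le h} |f(x+k) - f(x)| = h\,|f'(x)| + o(h).$$

For example, with purely illustrative numbers: if $x$ is rounded to one decimal place, then $h = 0.05$, and if $|f'(x)| = 0.01$, the induced uncertainty in $y$ is about $0.05 \times 0.01 = 0.0005$, that is, half a unit in the third decimal place.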

In the present application, $y$ is the $p$-value, $x$ is the degrees of freedom $\nu$, and

$$y = f(x) = f(\nu) = F_\nu(t)$$

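where $t$ is the Welch-Satterthwaite statistic and $F_\nu$ is the CDF of the Student $t$ distribution with $\nu$ degrees of freedom. (Strictly, the one-sided p-value is $1-F_\nu(|t|)$ and the two-sided p-value is twice that, but the magnitude of the sensitivity to $\nu$ is the same up to a factor of $2$.)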

For relatively high df $\nu$, a change in the first decimal place often would not change the p-value at all (to the level of precision reported), so rounding to an integer is fine ($h=1/2$, but $h|\frac{d}{dx}f(x)|$ is very small). For very low df and extreme values of the statistic $t$, the magnitude of the derivative $|\frac{\partial}{\partial\nu}F_\nu(t)|$ can exceed $0.01$, suggesting that in such cases $\nu$ should be reported to only one fewer decimal place than $p$ itself.
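The derivative can also be estimated numerically. The following is a minimal sketch using only the Python standard library: it integrates the Student $t$ density with Simpson's rule and takes a central difference in $\nu$. The function names, the number of integration steps, and the step size `eps` are all assumptions made for illustration; a statistics library's $t$ CDF would serve equally well.

```python
import math

def t_pdf(x, nu):
    """Density of the Student t distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def t_cdf(t, nu, n=2000):
    """CDF F_nu(t), via Simpson's rule on [0, |t|] with n (even) subintervals."""
    a = abs(t)
    h = a / n
    total = t_pdf(0.0, nu) + t_pdf(a, nu)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area = total * h / 3
    # By symmetry, F(t) = 1/2 + area for t >= 0 and 1/2 - area for t < 0.
    return 0.5 + math.copysign(area, t)

def dcdf_dnu(t, nu, eps=1e-4):
    """Central-difference estimate of the derivative of F_nu(t) w.r.t. nu."""
    return (t_cdf(t, nu + eps) - t_cdf(t, nu - eps)) / (2 * eps)

# Compare a low-df, extreme-t case with a higher-df, moderate-t case.
for t, nu in [(8.0, 2.5), (2.0, 30.0)]:
    d = abs(dcdf_dnu(t, nu))
    print(f"t={t}, nu={nu}: |dF/dnu| = {d:.3g}, log10 = {math.log10(d):.2f}")
```

The printed base-10 logarithm shows how many orders of magnitude smaller a change in the p-value is than the change in $\nu$ that caused it.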

See for yourself with this labeled contour plot of the magnitude of the derivative for the lowest (reasonable) df and ranges of $|t|$ that would be of interest (because they can lead to low p-values).

[Figure 1: labeled contour plot of the magnitude of $\frac{\partial}{\partial\nu}F_\nu(t)$ for low df]

The labels show the base-10 logarithm of the derivative. Thus, at points between $-k$ and $-(k+1)$ on this plot, changing the reported df in the $j^\text{th}$ place after the decimal point will likely change the reported p-value only in the $(j+k)^\text{th}$ and later places. For example, suppose you are rounding the p-value to $10^{-6}$ (six decimal places). Consider the statistics $\nu=2.5$ and $t=8$. These are located near the $-3$ log contour. Therefore, $\nu$ should be reported to $6+(-3)=3$ decimal places.
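One way to turn this rule into arithmetic: rounding $\nu$ to $j$ decimal places perturbs it by at most $\frac{1}{2}\times 10^{-j}$, which changes $p$ by roughly that amount times the derivative. Requiring the change in $p$ to stay within its own rounding error gives $j \ge P + \log_{10}|\partial F_\nu(t)/\partial\nu|$ for a p-value reported to $P$ places. The sketch below encodes that reading; the function name `df_decimals` is hypothetical, and the contour value is supplied by the caller rather than computed.

```python
import math

def df_decimals(p_decimals, log10_derivative):
    """Decimal places to report for the df, given the p-value's decimal places
    and the base-10 log of |dF/dnu| (as read from a contour plot or computed)."""
    return max(0, p_decimals + math.ceil(log10_derivative))

# The example from the text: p-value to six places, near the -3 contour.
print(df_decimals(6, -3.0))  # 6 + (-3) = 3

# A p-value reported to two places, far from any sensitive region.
print(df_decimals(2, -4.0))  # integer df is enough
```

The `max(0, ...)` simply stops the rule from asking for a negative number of decimal places when the p-value is insensitive to $\nu$.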

The light blue areas, where the contour labels $-k$ are largest (that is, where $k$ is smallest), are the ones of concern, because they show where small changes in $\nu$ have the greatest effects on the p-value.

Contrast this with the situation for higher df (from $4$ to $30$ shown):

[Figure 2: the same contour plot for df from $4$ to $30$]

The influence of $\nu$ on the precision of $p$ quickly wanes as $\nu$ increases.
