What do you expect a 2-sample t-test for the same two samples to return?

hypothesis testing, p-value, python, scipy

I am using SciPy in Python, and the following calls return nan for some reason:

>>> from scipy import stats
>>> stats.ttest_ind([1, 1], [1, 1])
Ttest_indResult(statistic=nan, pvalue=nan)

>>> stats.ttest_ind([1, 1], [1, 1, 1])
Ttest_indResult(statistic=nan, pvalue=nan)

But whenever I use samples that have different summary statistics, I actually get a reasonable value:

>>> stats.ttest_ind([1, 1], [1, 1, 1, 2])
Ttest_indResult(statistic=-0.66666666666666663, pvalue=0.54146973927558495)

Is it reasonable to interpret a p-value of nan as 0 instead? Is there a statistical reason why it doesn't make sense to run a 2-sample t-test on samples with the same summary statistics?

Best Answer

The problem with trying to compare two constant samples with a t-test is that the calculation of t involves an estimate of within-group SD in its denominator. From Wikipedia:

$$t = \frac{\bar {X}_1 - \bar{X}_2}{s_{X_1X_2} \cdot \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}$$

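When both samples are constant, $s_{X_1X_2} = 0$, so the denominator is 0. In the examples from the question the two sample means are also equal, so the numerator is 0 as well, and $t$ becomes the undefined form $0/0$, which is reported as nan.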
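To make the division by zero concrete, here is a minimal sketch that computes the numerator and denominator of the pooled-variance formula by hand, using only the standard library. The helper names `pooled_t_parts` and `t_statistic` are made up for illustration and are not part of SciPy. The handling of a zero denominator follows the usual IEEE floating-point convention, which is an assumption about how one might mirror the nan result, not a description of SciPy's internals.

```python
import math
import statistics


def pooled_t_parts(a, b):
    """Return (numerator, denominator) of the pooled two-sample t statistic."""
    n1, n2 = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances (ddof=1).
    pooled_var = (
        (n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)
    ) / (n1 + n2 - 2)
    numerator = statistics.mean(a) - statistics.mean(b)
    denominator = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    return numerator, denominator


def t_statistic(a, b):
    """Hypothetical helper: t statistic with IEEE-style handling of a zero denominator."""
    numerator, denominator = pooled_t_parts(a, b)
    if denominator == 0:
        # Plain Python would raise ZeroDivisionError here; IEEE arithmetic
        # gives nan for 0/0 and a signed infinity for nonzero/0.
        if numerator == 0:
            return math.nan
        return math.copysign(math.inf, numerator)
    return numerator / denominator


print(pooled_t_parts([1, 1], [1, 1]))     # both parts are zero
print(t_statistic([1, 1], [1, 1]))        # nan
print(t_statistic([1, 1], [1, 1, 1, 2]))  # finite, since the second sample varies
```

With two constant samples both sample variances are zero, so the pooled SD and the whole denominator vanish. As soon as one sample has any spread, as in `[1, 1, 1, 2]`, the denominator is positive and the statistic is finite.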