I am using scipy in Python, and the following calls return a nan value for whatever reason:
>>> stats.ttest_ind([1, 1], [1, 1])
Ttest_indResult(statistic=nan, pvalue=nan)
>>> stats.ttest_ind([1, 1], [1, 1, 1])
Ttest_indResult(statistic=nan, pvalue=nan)
But whenever I use samples that have different summary statistics, I actually get a reasonable value:
>>> stats.ttest_ind([1, 1], [1, 1, 1, 2])
Ttest_indResult(statistic=-0.66666666666666663, pvalue=0.54146973927558495)
Is it reasonable to interpret a p-value of nan as 0 instead? Is there any reason from statistics that it doesn't make sense to run a 2-sample t-test on samples with the same summary statistics?
Best Answer
The problem with trying to compare two constant samples with a t-test is that the calculation of t involves an estimate of within-group SD in its denominator. From Wikipedia:
$$t = \frac{\bar {X}_1 - \bar{X}_2}{s_{X_1X_2} \cdot \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}$$
When both samples are constant, $s_{X_1X_2} = 0$, so the denominator is 0. The numerator $\bar{X}_1 - \bar{X}_2$ is also 0 in your examples, making the statistic the indeterminate form 0/0, which scipy reports as nan. So a nan p-value should not be read as 0: with no within-group variability, the test statistic is simply undefined.
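To see where the nan comes from, here is a minimal sketch of the equal-variance t statistic following the formula above (the helper name `pooled_t` is mine, not scipy's); when both samples are constant, the pooled SD is 0 and the result is undefined:

```python
import math
from statistics import mean, variance


def pooled_t(x1, x2):
    """Two-sample t statistic with pooled SD, per the formula above."""
    n1, n2 = len(x1), len(x2)
    # Pooled within-group standard deviation s_{X1 X2}
    sp = math.sqrt(((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2))
                   / (n1 + n2 - 2))
    denom = sp * math.sqrt(1 / n1 + 1 / n2)
    if denom == 0:
        # Both samples constant: sp = 0, so t is the 0/0 indeterminate
        # form that scipy reports as nan.
        return float('nan')
    return (mean(x1) - mean(x2)) / denom
```

With the examples from the question, `pooled_t([1, 1], [1, 1])` is nan, while `pooled_t([1, 1], [1, 1, 1, 2])` gives the same -2/3 that `stats.ttest_ind` returned.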