Solved – Tracking down the assumptions made by SciPy’s ttest_ind() function

python, statistical-significance, t-test

I'm trying to write my own Python code to compute t-statistics and p-values for one- and two-tailed independent t-tests. I can use the normal approximation, but for the moment I'm trying to use just the t-distribution. I've been unsuccessful at matching the results of SciPy's stats library on my test data, and could use a fresh pair of eyes to check whether I'm just making a dumb mistake somewhere.

Note: this isn't so much a coding question as a "why isn't this computation yielding the right t-stat?" question. I give the code for completeness, but I'm not looking for software advice, just help understanding why the computation isn't right.

My code:

import numpy as np
import scipy.stats as st

def compute_t_stat(pop1, pop2):

    num1 = pop1.shape[0]
    num2 = pop2.shape[0]

    # Unbiased sample variances (ddof=1). Note that np.var defaults to the
    # biased population variance (ddof=0), but Welch's test uses the
    # unbiased sample variance.
    var1 = np.var(pop1, ddof=1)
    var2 = np.var(pop2, ddof=1)

    # The formula for the t-stat when the population variances differ.
    t_stat = (np.mean(pop1) - np.mean(pop2)) / np.sqrt(var1/num1 + var2/num2)

    # ADDED: The Welch-Satterthwaite degrees of freedom.
    df = (var1/num1 + var2/num2)**2 / ((var1/num1)**2/(num1 - 1) + (var2/num2)**2/(num2 - 1))

    # Am I computing this wrong?
    # It should just come from the CDF like this, right?
    # The second argument is the degrees of freedom.
    one_tailed_p_value = 1.0 - st.t.cdf(t_stat, df)
    two_tailed_p_value = 1.0 - (st.t.cdf(np.abs(t_stat), df) - st.t.cdf(-np.abs(t_stat), df))

    # Computing with SciPy's built-in; my results don't match theirs.
    t_ind, p_ind = st.ttest_ind(pop1, pop2)

    return t_stat, one_tailed_p_value, two_tailed_p_value, t_ind, p_ind
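
A quick way to see the mismatch (simulated samples with unequal variances; the seed, means, and sample sizes below are arbitrary, just to exercise the function):

rng = np.random.RandomState(0)
pop1 = rng.normal(loc=0.0, scale=1.0, size=35)
pop2 = rng.normal(loc=0.5, scale=2.0, size=50)

t_stat, p_one, p_two, t_ind, p_ind = compute_t_stat(pop1, pop2)
print("mine : t = %.6f, two-sided p = %.6f" % (t_stat, p_two))
print("scipy: t = %.6f, two-sided p = %.6f" % (t_ind, p_ind))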

Update:

After reading a bit more about Welch's t-test, I saw that I should be using the Welch-Satterthwaite formula to calculate the degrees of freedom. I have updated the code above to reflect this.
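
For reference, writing $s_i^2$ for the sample variance and $n_i$ for the size of sample $i$, the Welch-Satterthwaite approximation is

$$\nu \approx \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}},$$

which is what the df line in the code computes.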

With the new degrees of freedom, I get a closer result: my two-sided p-value is off from SciPy's by about 0.008. But that is still much too big an error, so I must still be doing something incorrect (either that, or SciPy's distribution functions are very bad, and it's hard to believe they are accurate to only two decimal places).

Second update:

While continuing to try things, I wondered whether SciPy's version automatically switches to the normal approximation of the t-distribution when the degrees of freedom are high enough (roughly > 30). So I re-ran my code using the normal distribution instead, and the computed results came out even further from SciPy's than when I use the t-distribution.
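
Concretely, that variant just swaps the t CDF for the standard normal CDF, with t_stat computed as above (a sketch):

# Normal approximation: replace the t CDF with the standard normal CDF
# (no degrees-of-freedom parameter needed).
one_tailed_p_value = 1.0 - st.norm.cdf(t_stat)
two_tailed_p_value = 2.0 * (1.0 - st.norm.cdf(np.abs(t_stat)))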

Best Answer

By using SciPy's built-in source() function, I could see a printout of the source code for ttest_ind(). Based on that source, the SciPy built-in performs the t-test assuming that the variances of the two samples are equal; it does not use the Welch-Satterthwaite degrees of freedom.
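
If source() isn't available in your version, the standard library's inspect module prints the same listing:

import inspect
import scipy.stats as st

# Print the implementation of ttest_ind() to check its assumptions.
print(inspect.getsource(st.ttest_ind))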

I just want to point out that, crucially, this is why you should not blindly trust library functions. In my case, I really do need the t-test for populations with unequal variances, and the degrees-of-freedom adjustment may matter for some of the smaller data sets I will run this on. SciPy assumes equal variances but nowhere states this assumption.

As I mentioned in some comments, the discrepancy between my code and SciPy's is about 0.008 for sample sizes between 30 and 400, and then slowly goes to zero for larger sample sizes. This is an effect of the extra (1/n1 + 1/n2) term in the equal-variances t-statistic denominator. Accuracy-wise, this is pretty important, especially for small sample sizes. It definitely confirms to me that I need to write my own function. (Possibly there are other, better Python libraries, but this at least should be known. Frankly, it's surprising this isn't anywhere up front and center in the SciPy documentation for ttest_ind()).