Solved – Performing a t-test with Python

Tags: python, t-test

How do I perform a t-test in Python?

# -*- coding: utf-8 -*-
from scipy import stats
import numpy as np

x = np.array([0.01082045, 0.00225922, 0.00022592, 0.00011891, 0.00525565, 0.00156956])
y = np.array([0.0096333, 0.0019453, 0.0038384, 0.0058286, 0.00078786])

# independent two-sample t-test (equal variances assumed by default)
ttest = stats.ttest_ind(x, y)
print('t-test independent', ttest)

I get two numbers out: t-test independent (array(1.5061708454111165), 0.1662878376677496)

I don't know what they mean. Could you maybe help me?

I have one row of samples, and this row of samples had to be analyzed with two different measurement methods. Afterwards I had to run a t-test on the two resulting data sets, and I chose an independent t-test.

But I have trouble figuring out what the two numbers I got actually mean. According to Wikipedia I should just get a t-value out, but Python apparently also gives me a p-value. Is it the first number I should use, and what does it tell me?

Thank you very much for help!

Best Answer

According to the manual page you point to, the two numbers it returns are the t-value (t-statistic) and the p-value.

The t-statistic is how many standard errors of the difference the two means are apart. The p-value is the probability of seeing a t-statistic at least that far from 0 if the null hypothesis were true. Low p-values lead to rejection of the null hypothesis. Search our site for numerous discussions of the meaning of p-values and how to interpret them.
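To make the decision concrete, here is a minimal sketch using the two numbers you reported; the 5% significance level is a conventional choice I am assuming, not something from your question:

```python
# The two numbers printed by scipy.stats.ttest_ind, as reported in the question
t_stat, p_value = 1.5061708454111165, 0.1662878376677496

alpha = 0.05  # conventional significance level (my assumption)
# p = 0.166 > 0.05, so we fail to reject the null hypothesis of equal means
reject = p_value < alpha
print(reject)  # False
```

In other words, at the 5% level this result would not be treated as evidence of a difference between the two means.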

I am concerned, however, that your description sounds like you have a paired design (applying two different measurement methods to the same 'row of samples'). How many samples did you have in your row, and why do you have 5 in one set and 6 in the other?
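If the design really is paired (each method applied to the same samples), the matching test would be a paired t-test, which in scipy is `stats.ttest_rel`. A sketch with invented, equal-length data (a paired test requires the two arrays to have the same length, one value per sample per method):

```python
from scipy import stats

# Hypothetical data for illustration only: the SAME five samples,
# each measured once by method A and once by method B
method_a = [0.0108, 0.0023, 0.0002, 0.0001, 0.0053]
method_b = [0.0096, 0.0019, 0.0038, 0.0058, 0.0008]

t_stat, p_value = stats.ttest_rel(method_a, method_b)
print(t_stat, p_value)
```

The paired test works on the within-sample differences, which is usually more powerful than an independent test when the measurements really are paired.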

However, just looking at the results you got in Python, I can't reproduce its answers in R. Aside from the sign of the t-statistic (which depends on which order it uses in its numerator), what you have should agree with this:

> x=scan()
1: 0.01082045  0.00225922  0.00022592  0.00011891  0.00525565  0.00156956
7: 
Read 6 items
> y=scan()
1: 0.0096333  0.0019453  0.0038384  0.0058286  0.00078786
6: 
Read 5 items
> t.test(x,y,var.equal=TRUE)

        Two Sample t-test

data:  x and y
t = -0.44355, df = 9, p-value = 0.6678
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.006293793  0.004230313
sample estimates:
  mean of x   mean of y 
0.003374952 0.004406692

It's not clear to me why they would differ. I doubt that the Python library is wrong; it would have several tests in place that would detect simple coding errors, and there's nothing remotely weird about your data that might trip it up. That suggests it's possibly not seeing the data the way you think it is.
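For what it's worth, entering the values exactly as listed in the question (with explicit commas between them) does reproduce the R output in scipy, which supports the idea that Python was being fed different data than you intended:

```python
from scipy import stats

# Data copied from the question, with commas added
x = [0.01082045, 0.00225922, 0.00022592, 0.00011891, 0.00525565, 0.00156956]
y = [0.0096333, 0.0019453, 0.0038384, 0.0058286, 0.00078786]

# equal_var=True is the default, matching R's t.test(x, y, var.equal=TRUE)
t_stat, p_value = stats.ttest_ind(x, y)
print(t_stat, p_value)  # t ≈ -0.4435, p ≈ 0.6678, agreeing with R
```

So the discrepancy is almost certainly in how the data reached `ttest_ind`, not in the test itself.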