Solved – Using SciPy T.ppf to get p-value

hypothesis testing, python, scipy

I'm trying to check my manual p-value calculation against SciPy. The SciPy documentation isn't the best, which makes it tough to know for sure what to do.

I am getting the correct t-statistic and p-value with SciPy, but I can't reproduce the p-value manually. A friend steered me to scipy.stats.t.ppf, but I'm not getting a p-value from it.

What is the correct way to use scipy.stats.t.ppf()?

my version:

def t_test(sample, mu):
    mean = np.mean(sample)
    var = np.var(sample)
    sem = (var / len(sample)) ** .5
    t = abs(mu - mean)/sem
    df = len(sample) - 1
    p = scs.t.ppf(.95, df)
    return (t, p)

returns (0.081500599630942958, 1.7291328115213671)

scipy version:
scs.ttest_1samp(sample, 4.123)
returns (statistic=0.079436958358141435, pvalue=0.93751577779749051)

for testing, I'm using the following sample set and sample mu.

sample = [4.15848606,  3.86146363,  4.31545726,  3.3748772,
          4.67023082,  4.45950272,  3.85894915,  4.41089417,
          3.82360986,  3.79889443,  4.75884172,  3.27100914,
          4.08939402,  4.08904694,  5.62589842,  3.71445656,
          3.58463792,  4.42426443,  3.9671448 ,  4.39339124]

mu = 4.123

Best Answer

To get the same results, change two things:

1. Change the estimate of the variance so that the divisor is N-1 (ddof=1 in np.var).
2. Calculate the p-value using the cdf rather than the ppf. The p-value is the probability of getting a value at least as extreme as the observed t; because the t-distribution is symmetric around zero, the two-sided p-value is 2*(1 - cdf(t)). Note that the function you're comparing with does a two-sided test, so I do the same.
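As a side note on why ppf did not return a p-value: t.ppf is the inverse of t.cdf, so it maps a probability to a critical value rather than a t-statistic to a probability. A minimal sketch, assuming df = 19 (the 20-point sample from the question):

```python
import scipy.stats as scs

df = 19  # degrees of freedom for the 20-point sample in the question

# ppf maps a probability to a quantile (critical value) of the t-distribution.
crit = scs.t.ppf(0.95, df)

# cdf goes the other way: from a t value back to a cumulative probability.
prob = scs.t.cdf(crit, df)

print(crit)  # about 1.729, the number the question's function returned as "p"
print(prob)  # about 0.95
```

So ppf(.95, df) is the one-sided 5% critical value for that df, not a p-value for the observed statistic.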

I've marked the relevant lines with ###. The result is now the same as from the ttest_1samp function.

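import numpy as np
import scipy.stats as scs

def t_test(sample, mu):
    mean = np.mean(sample)
    var = np.var(sample, ddof=1)  ### unbiased variance (divisor N-1)
    sem = (var / len(sample)) ** .5
    t = abs(mu - mean) / sem
    df = len(sample) - 1
    p = 2 * (1 - scs.t.cdf(t, df))  ### two-sided p-value from the cdf
    return (t, p)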
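One way to check this against the question's data is sketched below. It recomputes the statistic by hand and compares it with ttest_1samp. Note one substitution that is mine, not part of the answer above: it uses the survival function t.sf(t, df), which equals 1 - t.cdf(t, df) but tends to be more accurate for very small tail probabilities.

```python
import numpy as np
import scipy.stats as scs

# Sample and hypothesized mean copied from the question.
sample = [4.15848606, 3.86146363, 4.31545726, 3.3748772,
          4.67023082, 4.45950272, 3.85894915, 4.41089417,
          3.82360986, 3.79889443, 4.75884172, 3.27100914,
          4.08939402, 4.08904694, 5.62589842, 3.71445656,
          3.58463792, 4.42426443, 3.9671448, 4.39339124]
mu = 4.123

n = len(sample)
sem = np.std(sample, ddof=1) / np.sqrt(n)    # standard error, divisor N-1
t_manual = abs(np.mean(sample) - mu) / sem   # absolute t-statistic
p_manual = 2 * scs.t.sf(t_manual, n - 1)     # two-sided p-value

result = scs.ttest_1samp(sample, mu)
t_scipy, p_scipy = result.statistic, result.pvalue

print(t_manual, p_manual)
print(t_scipy, p_scipy)
```

The manual statistic is an absolute value, so compare it with abs(t_scipy) if the sample mean falls below mu.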

Related Solutions

Solved – Beta distribution fitting in Scipy

Despite an apparent lack of documentation on the output of beta.fit, it does output in the following order:

$\alpha$, $\beta$, loc (lower limit), scale (upper limit - lower limit)
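A small sketch of unpacking that return value follows. The synthetic data, the seed, and the choice to fix loc and scale with floc and fscale are my assumptions for illustration; fixing them is only there to keep the fit stable, and the tuple order is the same when they are left free.

```python
import numpy as np
from scipy import stats

# Synthetic beta data on the interval (1, 4): loc = 1, scale = 4 - 1 = 3.
rng = np.random.default_rng(0)
data = stats.beta.rvs(2.0, 5.0, loc=1.0, scale=3.0, size=2000, random_state=rng)

# fit returns the shape parameters first, then loc, then scale.
# floc and fscale hold loc and scale fixed (an assumption made here for stability).
a_hat, b_hat, loc_hat, scale_hat = stats.beta.fit(data, floc=1.0, fscale=3.0)

print(a_hat, b_hat)        # estimates of alpha and beta
print(loc_hat, scale_hat)  # lower limit, and (upper limit - lower limit)
```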

Solved – Skew of log-normal distribution using sciPy

> According to the skew of the resulting distribution, I would like to make a 0-1 decision, i.e. if skew is positive, give a "0" value, as most of the data is on the left, while, if skew is negative, give a "1" value, as most of the data is on the right.

As a general statement, this is not true. It's often the case, but it's trivial to find counterexamples (and unfortunately many elementary books insist on making statements almost exactly like yours, to eventual substantial confusion when a real-world counterexample appears).

In your question, you're using third-moment skewness as your skewness measure, but statements that 'most of the data is on the left'/'most of the data is on the right' relate to a different kind of skewness, the second Pearson skewness coefficient.

The two measures can disagree about the sign of skewness in a population.

Consider, for example, a Poisson with mean 0.7. It has third-moment skewness of about 1.195, but it has second Pearson skewness (mean-median skewness) of $3(0.7-1)/\sqrt{0.7}$, or about -1.076 (that is, in spite of having positive third-moment skewness, more of its probability is above the mean than below it). [That's a simple example I only came up with a couple of days ago in response to another question; it only took a few minutes to come up with several such examples, but that one's my favourite of the counterexamples, not least because "Poisson(0.7)" is so simple to picture, so easy to remember and unambiguously convey.]
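The Poisson(0.7) numbers can be reproduced with a short sketch like the one below. The variable names are mine; the second Pearson skewness is computed as 3(mean - median)/sd.

```python
from scipy import stats

lam = 0.7  # Poisson mean

mean = stats.poisson.mean(lam)
median = stats.poisson.median(lam)  # the median of Poisson(0.7) is 1
std = stats.poisson.std(lam)

# Third-moment skewness (for a Poisson this is 1/sqrt(lam)).
moment_skew = float(stats.poisson.stats(lam, moments="s"))

# Second Pearson (mean-median) skewness.
pearson2_skew = 3 * (mean - median) / std

# Probability strictly above the mean: P(X > 0.7) = P(X >= 1).
p_above_mean = stats.poisson.sf(mean, lam)

print(moment_skew)    # about 1.195
print(pearson2_skew)  # about -1.076
print(p_above_mean)   # about 0.503, so just over half the mass is above the mean
```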

> From Wikipedia, the skew is equal to: $(e^{\sigma^2}+2)\sqrt{e^{\sigma^2}-1}$, but this can never be a negative number. Using SciPy's moment method like this: scipy.stats.moment(data,3)/(std**3)

You're confusing two different things! The first is a population quantity and the second is a sample quantity.

You can have one positive while the other is negative, just by chance.

Here's an example. This is a sample I just generated from a lognormal distribution ($n=30$, with parameters $\mu=0$ and $\sigma=0.1$), which happens to have negative sample skewness:

 0.139  0.086  0.046 -0.084  0.020  0.050 -0.041  0.080 -0.048  0.076 -0.050  0.023  
 0.063 -0.210  0.011 -0.343 -0.016 -0.005  0.123  0.044 -0.026  0.048  0.107  0.066  
 0.089 -0.047  0.175 -0.092 -0.095  0.020

In fact with those parameters and that sample size, the sample third moment is negative almost half the time.
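A rough way to check how often this happens is to simulate it. The sketch below is only an illustration: the seed and the number of simulated samples are arbitrary choices of mine, and the fraction it prints is a simulation estimate rather than an exact figure.

```python
import numpy as np
from scipy import stats

sigma, n, n_sims = 0.1, 30, 20000  # n_sims is an arbitrary choice

# Population skewness of the lognormal: always positive.
pop_skew = (np.exp(sigma**2) + 2) * np.sqrt(np.exp(sigma**2) - 1)

# Draw many samples of size n and compute the sample skewness of each.
rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
samples = rng.lognormal(mean=0.0, sigma=sigma, size=(n_sims, n))
sample_skews = stats.skew(samples, axis=1)  # m3 / m2**1.5 per sample

# Fraction of simulated samples whose sample skewness is negative.
frac_negative = np.mean(sample_skews < 0)

print(pop_skew)
print(frac_negative)
```

Here stats.skew with its default settings computes the same ratio as scipy.stats.moment(data, 3)/(std**3) when std is the divisor-N standard deviation.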
