P-Value – How to Express a P-Value in Terms of the T-Statistic in a One-Sample T-Test

Tags: p-value, probability, r, t-test

I am writing out a t-test by hand and I am confused about how to mathematically express the step of calculating / looking up the $p$-value of the t-statistic. This is what I have so far from the following example:

Suppose I take a random sample of size $n=36$ (assume CLT holds) from a population with mean and standard deviation $\mu=67$ and $\sigma = 7$. Also suppose the mean of the sample is $\bar{x} = 71$. Is there sufficient statistical evidence to prove that the mean of the population is actually greater than 67?

My solution thus far is:
$$
H_0: \mu = 67 \\
H_A: \mu > 67
$$

$$t = \frac{71 - 67}{\frac{7}{\sqrt{36}}} = \frac{4}{7/6} = \frac{24}{7} \approx 3.428571$$

I understand that the $p$-value can be calculated in R with 1 - pt(3.428571, df=35) resulting in 0.0007848787, but how would I express this mathematically? My best guess is something like:

$$
P(\bar{X} > t_{\text{df}=35}) \approx 0.0007848787
$$

…but I'm hesitant about this because I'm pretty sure $\bar{X}$ is on a different distribution than the t-statistic.

I suppose another way of asking this question is how to mathematically express pt(3.428571, df=35).
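For reference, here is the full R computation I mean (pt(..., lower.tail = FALSE) is just the upper-tail equivalent of 1 - pt(...)):

```r
t_stat <- (71 - 67) / (7 / sqrt(36))       # 3.428571
1 - pt(t_stat, df = 35)                    # 0.0007848787
pt(t_stat, df = 35, lower.tail = FALSE)    # same value, upper tail directly
```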

I've searched several tutorials for answers but because they tend to be geared more towards beginners they largely gloss over the mathematical expressions (especially for the $p$-value calculations).

This question is related in the sense that it addresses how to calculate the $p$-value for left-/right-tailed tests but it doesn't really answer my specific question of how to mathematically express the $p$-value in terms of the t-statistic.

Best Answer

Depending on the assumptions one is willing to make, the problem can be stated mathematically as follows.

Case I

Let $X_1,\ldots,X_n$ be an i.i.d. sample with $X_i\sim N(\mu, \sigma^2)$, with $\mu, \sigma^2$ both unknown finite parameters. Furthermore, let $\bar X = n^{-1}\sum_{i=1}^n X_i$ and $S^2 = (n-1)^{-1}\sum_{i=1}^n (X_i-\bar X)^2$ be the sample mean and the sample variance, respectively, and let us be interested in testing $H_0:\mu=\mu_0$ against $H_1:\mu>\mu_0$.

It can be proved that

$$ \frac{\sqrt{n}(\bar {X}-\mu)}{\sigma}\sim N(0,1), $$

$$ \frac{(n-1)S^2}{\sigma^2}\sim \chi_{n-1}^2, $$ and that $\bar X$ and $S^2$ are independent. Thus

$$ T_n = \frac{\frac{\sqrt{n}(\bar {X}-\mu)}{\sigma}}{\sqrt{\frac{\frac{(n-1)S^2}{\sigma^2}}{n-1}}} = \frac{\sqrt{n}(\bar {X}-\mu)}{\sqrt{S^2}}\sim t_{n-1}, $$ is the usual $t$ statistic, which follows a Student's $t$ distribution with $n-1$ degrees of freedom.

A test of size $\alpha$ has rejection region $$R_{\alpha} = \{X_1,\ldots,X_n: T_n \geq t_{n-1,1-\alpha}\}.$$

If we denote by $T_{n}^{obs}$ the observed $t$ statistic, the $p$-value is given by $$ \sup_{\mu\leq \mu_{0}} P_\mu(T_n \geq T_{n}^{obs}) = P_{\mu_0}(t_{n-1}\geq T_{n}^{obs}). $$
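For concreteness, a minimal R sketch of Case I (the data vector x below is made up purely for illustration; it is not from the question):

```r
# Case I: sigma^2 unknown; H0: mu = mu0 vs H1: mu > mu0
x     <- c(65, 72, 68, 74, 70, 69)          # hypothetical sample, illustration only
mu0   <- 67
n     <- length(x)
t_obs <- sqrt(n) * (mean(x) - mu0) / sd(x)  # observed T_n
pt(t_obs, df = n - 1, lower.tail = FALSE)   # p-value: P(t_{n-1} >= T_n^obs)
# t.test(x, mu = mu0, alternative = "greater") reports the same statistic and p-value
```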

Case II

If $\sigma^2$ is known then there is no need to estimate it and the statistic to be used is

$$ Z_n = \frac{\sqrt{n}(\bar X-\mu)}{\sigma}\sim N(0,1). $$

The test of size $\alpha$ is to reject $H_0$ if $Z_n\geq z_{1-\alpha}$ and the $p$-value is

$$ \sup_{\mu\leq \mu_{0}} P_\mu(Z_n \geq Z_{n}^{obs}) = P_{\mu_0}(Z\geq Z_{n}^{obs}). $$
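A corresponding R sketch for Case II, again with made-up inputs (xbar, sigma, n, mu0 are placeholders, not the question's data):

```r
# Case II: sigma known; H0: mu = mu0 vs H1: mu > mu0
xbar  <- 70; mu0 <- 67; sigma <- 5; n <- 25   # hypothetical values
z_obs <- sqrt(n) * (xbar - mu0) / sigma       # observed Z_n
pnorm(z_obs, lower.tail = FALSE)              # p-value: P(Z >= Z_n^obs)
```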

Case III

Lastly, if we don't wish to assume the normality of the sample but only that the sample is iid from a population with mean $\mu$ and variance $\sigma^2$, both unknown, then we can use the statistic

$$ Z_n^{s} = \frac{\sqrt{n}(\bar X-\mu)}{S}, $$ which by the CLT (together with Slutsky's theorem, since $S$ is consistent for $\sigma$) converges in distribution to $N(0,1)$; the test and the $p$-value are computed as in Case II.
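In R, Case III differs from Case I only in the reference distribution used for the $p$-value (same hypothetical sample as above):

```r
# Case III: no normality assumed, sigma unknown; CLT-based normal approximation
x   <- c(65, 72, 68, 74, 70, 69)              # hypothetical sample, illustration only
mu0 <- 67
n   <- length(x)
z_s <- sqrt(n) * (mean(x) - mu0) / sd(x)      # observed Z_n^s
pnorm(z_s, lower.tail = FALSE)                # approximate p-value
```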

Case IV

Again, we do not make the normality assumption but only that the sample is iid from a population with mean $\mu$ and variance $\sigma^2$, with $\sigma^2$ known. The test statistic is as in Case II, and its distribution $N(0,1)$ holds only in the limit, for large $n$.

Example. Your example falls under Case IV, with $n=36$, $\bar x = 71$, $\mu_0=67$ and $\sigma^2 = 7^2$. Thus the observed test statistic is

$$ \frac{\sqrt{36}(71-67)}{7}=3.428571 $$

and $p$-value = $P(Z\geq 3.428571) = 0.000303$.
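In R, this Case IV computation is:

```r
z_obs <- sqrt(36) * (71 - 67) / 7     # 3.428571
pnorm(z_obs, lower.tail = FALSE)      # approximately 0.000303
```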

Nevertheless, in Case III and Case IV some statisticians prefer to use the $t_{n-1}$ distribution in place of the limiting $N(0,1)$ distribution, which gives a more conservative (larger) $p$-value.
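With the numbers above, that choice reproduces the value computed in the question:

```r
pt(3.428571, df = 35, lower.tail = FALSE)   # 0.0007848787, using t_{35} instead of N(0,1)
```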
