Why are the degrees of freedom for a matched-pairs $t$-test the number of pairs minus 1?

degrees of freedom, t-test

I usually think of "degrees of freedom" as $n - r$ in the linear model $$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$$
with $\mathbf{y} \in \mathbb{R}^n$, $\mathbf{X} \in M_{n \times p}(\mathbb{R})$ the design matrix with rank $r$, $\boldsymbol{\beta} \in \mathbb{R}^p$, $\boldsymbol{\epsilon} \in \mathbb{R}^n$ with $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}_n)$, $\sigma^2 > 0$.

From what I recall of elementary statistics (i.e., pre-linear models with linear algebra), the degrees of freedom for the matched-pairs $t$-test is the number of differences minus $1$. So this would entail $\mathbf{X}$ having rank 1, perhaps. Is this correct? If not, why is $n-1$ the degrees of freedom for the matched-pairs $t$-test?

To understand the context, suppose I have a mixed-effects model
$$y_{ijk} = \mu_i + \text{ some random effects} + e_{ijk}$$
where $i = 1, 2$, $j = 1, \dots, 8$, and $k = 1, 2$. There is nothing special about $\mu_i$ other than that it's a fixed effect, and $e_{ijk} \overset{iid}{\sim}\mathcal{N}(0, \sigma^2_e)$. I'm assuming that the random effects are irrelevant to this problem, since we only care about the fixed effects in this case.

I would like to provide a confidence interval for $\mu_1 - \mu_2$.

I have already shown that $\bar{d}_\cdot = \dfrac{1}{8}\sum_{j} d_j$ is an unbiased estimator of $\mu_1 - \mu_2$, where $d_j = \bar{y}_{1j\cdot} - \bar{y}_{2j\cdot}$, $\bar{y}_{1j\cdot} = \dfrac{1}{2}\sum_{k}y_{1jk}$, and $\bar{y}_{2j\cdot}$ is defined similarly. The point estimate $\bar{d}_{\cdot}$ has been computed.

I have already shown that $$s^2_d = \dfrac{\sum_{j}(d_j - \bar{d}_{\cdot})^2}{8-1}$$
is an unbiased estimator of the variance of $d_j$, and thus,
$$\sqrt{\dfrac{s^2_d}{8}}$$
is the standard error of $\bar{d}_{\cdot}$. This has been computed.
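As a rough illustration of these quantities, here is a minimal sketch in Python. The data are simulated, since the actual observations are not shown here. The means in `mu`, the variances, and the shared `subject` effect (standing in for the unspecified random effects) are all assumptions made only for this example.

```python
import math
import random
import statistics

random.seed(0)

# Hypothetical data y[i][j][k]: i = 2 treatments, j = 8 subjects, k = 2 replicates.
# A shared subject effect stands in for the unspecified random effects (an assumption).
mu = [10.0, 9.0]
subject = [random.gauss(0.0, 2.0) for _ in range(8)]
y = [[[mu[i] + subject[j] + random.gauss(0.0, 1.0) for k in range(2)]
      for j in range(8)]
     for i in range(2)]

# d_j = ybar_{1j.} - ybar_{2j.}: average over k first, then difference the treatments.
d = [statistics.fmean(y[0][j]) - statistics.fmean(y[1][j]) for j in range(8)]

d_bar = statistics.fmean(d)        # point estimate of mu_1 - mu_2
s2_d = statistics.variance(d)      # sample variance, divides by 8 - 1
se = math.sqrt(s2_d / len(d))      # standard error of d_bar

print(d_bar, s2_d, se)
```

Note that the subject effect cancels in each $d_j$, so only the eight differences enter the variance estimate.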

Now the last part is figuring out the degrees of freedom. For this step, I usually try to find the design matrix, which here obviously has rank 2, but I have the solution to this problem, and it says that the degrees of freedom are $8-1$.

In the context of finding the rank of a design matrix, why are the degrees of freedom $8-1$?

Edited to add: Perhaps helpful in this discussion is how the test statistic is defined. Suppose I have a parameter vector $\boldsymbol{\beta}$. In this case, $$\boldsymbol{\beta} = \begin{bmatrix}
\mu_1 \\
\mu_2
\end{bmatrix}$$
(unless I'm missing something entirely). We are essentially performing the hypothesis test $$\mathbf{c}^{\prime}\boldsymbol{\beta} = 0$$
where $\mathbf{c}^{\prime} = \begin{bmatrix}
1 & -1
\end{bmatrix}$. Then, the test statistic is given by
$$t = \dfrac{\mathbf{c}^{\prime}\hat{\boldsymbol{\beta}}}{\sqrt{\hat{\sigma}^2\mathbf{c}^{\prime}(\mathbf{X}^{\prime}\mathbf{X})^{-1}\mathbf{c}}}$$
which would be tested against a central $t$-distribution with $n - r$ degrees of freedom, where $\mathbf{X}$ is the design matrix as above, and
$$\hat{\sigma}^2 = \dfrac{\mathbf{y}^{\prime}(\mathbf{I}-\mathbf{P}_{\mathbf{X}})\mathbf{y}}{n-r}$$
where $\mathbf{P}_{\mathbf{X}} = \mathbf{X}(\mathbf{X}^{\prime}\mathbf{X})^{-1}\mathbf{X}^{\prime}$.

Best Answer

The matched-pairs $t$-test with $n$ pairs is actually just a one-sample $t$-test with a sample of size $n$. You have $n$ differences $d_1,\ldots,d_n$, and these are i.i.d. and normally distributed.

$$
\begin{array}{ccccc}
\begin{bmatrix} d_1 \\ \vdots \\ d_n \end{bmatrix} & = & \begin{bmatrix} \bar d \\ \vdots \\ \bar d \end{bmatrix} & + & \begin{bmatrix} d_1 - \bar d \\ \vdots \\ d_n - \bar d \end{bmatrix} \\[10pt]
n \text{ d.f.} & & 1 \text{ d.f.} & & (n-1) \text{ d.f.}
\end{array}
$$

The first column after the "$=$" sign has $1$ degree of freedom because its entries are constrained to be equal, so it lies in a one-dimensional subspace; the second has $n-1$ degrees of freedom because of the linear constraint that says the sum of its entries is $0$.
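One way to connect this to the $n - r$ rule in the question (an interpretation added here, not part of the original answer) is to treat the differences themselves as the response in a one-parameter model, $\mathbf{d} = \mathbf{1}_n\delta + \boldsymbol{\epsilon}$, where $\delta = \mu_1 - \mu_2$ is a symbol introduced only for this sketch. The design matrix is then the single column $\mathbf{1}_n$, which has rank $r = 1$, so $n - r = 8 - 1$. With $\mathbf{c} = 1$ the general statistic reduces to
$$t = \dfrac{\bar d}{\sqrt{s^2_d / n}}.$$
The short Python sketch below checks the decomposition numerically, using made-up differences rather than data from the question.

```python
import math
import statistics

# Hypothetical differences d_1, ..., d_8 (made-up numbers, not from the question).
d = [1.2, -0.4, 0.9, 2.1, 0.3, 1.5, -0.2, 0.8]
n = len(d)

d_bar = statistics.fmean(d)
mean_part = [d_bar] * n                # projection onto the ones vector: 1 d.f.
resid_part = [x - d_bar for x in d]    # residual component: n - 1 d.f.

# The residuals satisfy one linear constraint: they sum to zero.
resid_sum = sum(resid_part)
# The two components are orthogonal, so their sums of squares add up.
dot = sum(m * r for m, r in zip(mean_part, resid_part))

df = n - 1                             # n minus the rank of the ones column
s2_d = sum(r * r for r in resid_part) / df
t_stat = d_bar / math.sqrt(s2_d / n)   # compare against a t distribution with df d.f.

print(df, resid_sum, dot, t_stat)
```

Because the residual vector is fixed once any $n-1$ of its entries are known, only $n-1$ independent pieces of information go into $s^2_d$, which is why the reference distribution has $n-1$ degrees of freedom.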
