Solved – Common statistical tests as linear models

anova, correlation, linear model, regression, t-test

(UPDATE: I dived deeper into this and posted the results here)

The list of named statistical tests is huge. Many of the common tests rely on inference from simple linear models, e.g. a one-sample t-test is just $y = \beta + \varepsilon$, which is tested against the null model $y = \mu + \varepsilon$, i.e. that $\beta = \mu$, where $\mu$ is some null value – typically $\mu = 0$.

I find this to be quite a bit more instructive for teaching purposes than rote learning named models, when to use them, and their assumptions as if they had nothing to do with each other. That approach does not promote understanding. However, I cannot find a good resource collecting this. I am more interested in equivalences between the underlying models than in the method of inference from them. Although, as far as I can see, likelihood ratio tests on all these linear models yield the same results as the "classical" inference.

Here are the equivalences I've learned about so far, ignoring the error term $\varepsilon \sim \mathcal N(0, \sigma^2)$ and assuming that all null hypotheses are the absence of an effect:

One-sample t-test:
$y = \beta_0 \qquad \mathcal{H}_0: \beta_0 = 0$.

Paired-sample t-test:
$y_2-y_1 = \beta_0 \qquad \mathcal{H}_0: \beta_0 = 0$

This is identical to a one-sample t-test on pairwise differences.

Two-sample t-test:
$y = \beta_1 * x + \beta_0 \qquad \mathcal{H}_0: \beta_1 = 0$

where x is an indicator (0 or 1).

Pearson correlation:
$y = \beta_1 * x + \beta_0 \qquad \mathcal{H}_0: \beta_1 = 0$

Notice the similarity to a two-sample t-test which is just regression on a binary x-axis.

Spearman correlation:
$rank(y) = \beta_1 * rank(x) + \beta_0 \qquad \mathcal{H}_0: \beta_1 = 0$

This is identical to a Pearson correlation on rank-transformed x and y.
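A sketch of that identity on simulated monotone-but-nonlinear data (scipy's `spearmanr` and a Pearson correlation on `rankdata`-transformed values; with no ties the correlation coefficients agree exactly, and the default p-values agree because both reduce to the same t-statistic with $n-2$ degrees of freedom):

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(4)
x = rng.normal(size=50)
y = x**3 + rng.normal(size=50)   # monotone but nonlinear relation

# Classical Spearman correlation
rho, p_spear = st.spearmanr(x, y)

# Pearson correlation on the rank-transformed data
r_ranks, p_ranks = st.pearsonr(st.rankdata(x), st.rankdata(y))
```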

One-way ANOVA:
$y = \beta_1*x_1 + \beta_2*x_2 + \beta_3*x_3 +… \qquad \mathcal{H}_0: \beta_1, \beta_2, \beta_3, … = \beta$

where $x_i$ are indicators selecting the relevant $\beta$ (one $x$ is 1; the others are 0). In matrix form this is $y = X\beta$.
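Numerically, the overall F-test of a dummy-coded regression reproduces the one-way ANOVA exactly. A sketch with three simulated groups (illustrative data; statsmodels' formula interface builds the indicator columns via `C(g)`):

```python
import numpy as np
import pandas as pd
import scipy.stats as st
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
groups = {"a": rng.normal(0.0, 1, 20),
          "b": rng.normal(0.5, 1, 25),
          "c": rng.normal(1.0, 1, 22)}

# Classical one-way ANOVA
F_classic, p_classic = st.f_oneway(*groups.values())

# Linear model with dummy-coded group indicators; the overall F-test
# (all group coefficients equal) is the same test
df = pd.DataFrame({
    "y": np.concatenate(list(groups.values())),
    "g": np.repeat(list(groups.keys()), [20, 25, 22]),
})
fit = smf.ols("y ~ C(g)", data=df).fit()
```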

Two-way ANOVA:
$y = \beta_1 * X_1 + \beta_2 * X_2 + \beta_3 * X_1 * X_2 \qquad \mathcal{H}_0: \beta_3 = 0$

for two two-level factors. Here $\beta_i$ are vectors of betas where one is selected by the indicator vector $X_i$. The $\mathcal{H}_0$ shown here is the interaction effect.
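For the two-level case this can be sketched as follows (simulated balanced data of my own; since the interaction has a single degree of freedom, its ANOVA F equals the squared t of the interaction coefficient in the linear model):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(6)
n = 40  # observations per cell
a = np.repeat([0, 1], 2 * n)
b = np.tile(np.repeat([0, 1], n), 2)
y = 0.5 * a + 0.3 * b + 0.7 * a * b + rng.normal(size=4 * n)
df = pd.DataFrame({"y": y, "a": a, "b": b})

# Linear model with main effects and interaction
fit = smf.ols("y ~ C(a) * C(b)", data=df).fit()
table = anova_lm(fit, typ=2)

# The interaction's ANOVA F vs. the interaction coefficient's t
F_inter = table.loc["C(a):C(b)", "F"]
t_inter = fit.tvalues["C(a)[T.1]:C(b)[T.1]"]
```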

Could we add more "named tests" to this list of linear models? E.g., multivariate regression, other "non-parametric" tests, binomial tests, or RM-ANOVAs?

UPDATE: questions have been asked and answered about ANOVA and t-tests as linear models here on SO. See this question and tagged related questions.

Best Answer

This is not an exhaustive list, but if you include generalized linear models, the scope of this problem becomes substantially larger.

For instance:

The Cochran-Armitage test of trend can be formulated by: $$E[\mbox{logit} (p) | t] = \beta_0 + \beta_1 t \qquad \mathcal{H}_0: \beta_1 = 0$$

The Pearson Chi-Square test of independence for a $p \times k$ contingency table is a log-linear model for the cell frequencies given by:

$$E[\log (\mu)] = \beta_0 + \beta_{i.} + \beta_{.j} + \gamma_{ij} \quad i,j > 1 \qquad\mathcal{H}_0: \gamma_{ij} = 0, \quad i,j > 1$$

Also, the t-test for unequal variances (Welch's t-test) is well approximated by OLS with Huber-White (heteroskedasticity-robust) standard error estimation.
