Solved – Difference between t-test and ANOVA in linear regression

anova, regression, t-test

What are the differences between the t-test and ANOVA in linear regression?

Does a t-test test whether any single slope or the intercept equals zero, while ANOVA tests whether all slopes equal zero simultaneously? Is this the only difference between them?
In simple linear regression, i.e., where there is only one predictor variable, there is only one slope to estimate. So are the t-test and ANOVA equivalent, and if so, how, given that they use different statistics (the t-test uses the t-statistic and ANOVA uses the F-statistic)?

Best Answer

The general linear model lets us write an ANOVA model as a regression model. Let's assume we have two groups with two observations each, i.e., four observations in a vector $y$. Then the original, overparametrized model is $E(y) = X^{\star} \beta^{\star}$, where $X^{\star}$ is the matrix of predictors, i.e., dummy-coded indicator variables: $$ \left(\begin{array}{c}\mu_{1} \\ \mu_{1} \\ \mu_{2} \\ \mu_{2}\end{array}\right) = \left(\begin{array}{ccc}1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1\end{array}\right) \left(\begin{array}{c}\beta_{0}^{\star} \\ \beta_{1}^{\star} \\ \beta_{2}^{\star}\end{array}\right) $$

The parameters are not identifiable: the OLS expression $((X^{\star})' X^{\star})^{-1} (X^{\star})' E(y)$ does not exist because $X^{\star}$ has three columns but only rank 2, so $(X^{\star})'X^{\star}$ is not invertible. To change that, we introduce the constraint $\beta_{1}^{\star} = 0$ (treatment contrasts), which gives us the new model $E(y) = X \beta$: $$ \left(\begin{array}{c}\mu_{1} \\ \mu_{1} \\ \mu_{2} \\ \mu_{2}\end{array}\right) = \left(\begin{array}{cc}1 & 0 \\ 1 & 0 \\ 1 & 1 \\ 1 & 1\end{array}\right) \left(\begin{array}{c}\beta_{0} \\ \beta_{2}\end{array}\right) $$
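
The rank argument can be checked numerically. The following is only a minimal sketch using NumPy; the variable names (`X_star`, `gram_det_star`, and so on) are my own and not part of the answer.

```python
import numpy as np

# Overparametrized design: intercept plus one indicator column per group.
X_star = np.array([[1, 1, 0],
                   [1, 1, 0],
                   [1, 0, 1],
                   [1, 0, 1]], dtype=float)

# Treatment contrasts: drop the group-1 indicator (beta_1* = 0).
X = np.delete(X_star, 1, axis=1)

rank_star = np.linalg.matrix_rank(X_star)  # three columns, but only rank 2
rank_x = np.linalg.matrix_rank(X)          # two columns, full column rank

# Determinants of the cross-product matrices: a (numerically) zero
# determinant means the matrix cannot be inverted.
gram_det_star = np.linalg.det(X_star.T @ X_star)
gram_det = np.linalg.det(X.T @ X)

print(rank_star, rank_x, gram_det_star, gram_det)
```

The intercept column of `X_star` is the sum of the two indicator columns, which is why one column has to go before $(X'X)^{-1}$ exists.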

So $\mu_{1} = \beta_{0}$, i.e., $\beta_{0}$ takes on the meaning of the expected value from our reference category (group 1). $\mu_{2} = \beta_{0} + \beta_{2}$, i.e., $\beta_{2}$ takes on the meaning of the difference $\mu_{2} - \mu_{1}$ to the reference category. Since with two groups, there is just one parameter associated with the group effect, the ANOVA null hypothesis (all group effect parameters are 0) is the same as the regression weight null hypothesis (the slope parameter is 0).

A $t$-test in the general linear model tests a linear combination $\psi = \sum c_{j} \beta_{j}$ of the parameters against a hypothesized value $\psi_{0}$ under the null hypothesis. Choosing $c = (0, 1)'$, we can thus test the hypothesis that $\beta_{2} = 0$ (the usual test for the slope parameter), i.e. here, $\mu_{2} - \mu_{1} = 0$. The estimator is $\hat{\psi} = \sum c_{j} \hat{\beta}_{j}$, where $\hat{\beta} = (X'X)^{-1} X' y$ are the OLS estimates for the parameters. The general test statistic for such $\psi$ is: $$ t = \frac{\hat{\psi} - \psi_{0}}{\hat{\sigma} \sqrt{c' (X'X)^{-1} c}} $$

$\hat{\sigma}^{2} = \|e\|^{2} / (n-\mathrm{Rank}(X))$ is an unbiased estimator for the error variance, where $\|e\|^{2}$ is the sum of the squared residuals. In the case of two groups $\mathrm{Rank}(X) = 2$, $(X'X)^{-1} X' = \left(\begin{smallmatrix}.5 & .5 & 0 & 0 \\-.5 & -.5 & .5 & .5\end{smallmatrix}\right)$, and the estimators thus are $\hat{\beta}_{0} = 0.5 y_{1} + 0.5 y_{2} = M_{1}$ and $\hat{\beta}_{2} = -0.5 y_{1} - 0.5 y_{2} + 0.5 y_{3} + 0.5 y_{4} = M_{2} - M_{1}$. With $c' (X'X)^{-1} c$ being 1 in our case, the test statistic becomes: $$ t = \frac{M_{2} - M_{1} - 0}{\hat{\sigma}} = \frac{M_{2} - M_{1}}{\sqrt{\|e\|^{2} / (n-2)}} $$
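
As an illustration, the estimators and the $t$ statistic can be computed directly from the matrix formulas. This is a sketch with NumPy; the data vector `y` is made up purely for the example, and the variable names are my own.

```python
import numpy as np

# Hypothetical data: two observations in group 1, then two in group 2.
y = np.array([1.0, 3.0, 6.0, 8.0])

# Design matrix under treatment contrasts (group 1 is the reference).
X = np.array([[1, 0],
              [1, 0],
              [1, 1],
              [1, 1]], dtype=float)

c = np.array([0.0, 1.0])  # picks out beta_2, i.e. mu_2 - mu_1
psi_0 = 0.0               # hypothesized value under the null

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y            # OLS estimates (M_1, M_2 - M_1)

resid = y - X @ beta_hat
df = len(y) - np.linalg.matrix_rank(X)  # n - Rank(X)
sigma_hat = np.sqrt(resid @ resid / df)

psi_hat = c @ beta_hat
scale = c @ XtX_inv @ c                 # equals 1 for this design
t = (psi_hat - psi_0) / (sigma_hat * np.sqrt(scale))

print(beta_hat, scale, t)
```

For this made-up `y`, `beta_hat` comes out as the mean of group 1 and the difference between the two group means, matching the expressions above.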

$t$ is $t$-distributed with $n - \mathrm{Rank}(X)$ df (here $n-2$). With two observations per group and grand mean $M$, the between-groups sum of squares is $SS_{b} = \sum_{j} n_{j} (M_{j} - M)^{2} = (M_{2} - M_{1})^{2}$. So when you square $t$, you get $\frac{(M_{2} - M_{1})^{2} / 1}{\|e\|^{2} / (n-2)} = \frac{SS_{b} / df_{b}}{SS_{w} / df_{w}} = F$, the test statistic from the ANOVA $F$-test for two groups ($b$ for between, $w$ for within groups), which follows an $F$-distribution with 1 and $n - \mathrm{Rank}(X)$ df.
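
The identity $t^{2} = F$ is easy to verify by hand or in a few lines of code. Below is a sketch in plain Python with made-up data; `pooled_t` and `one_way_f` are hypothetical helper names, and `pooled_t` uses the standard pooled two-sample formula, whose factor $\sqrt{1/n_{1} + 1/n_{2}}$ equals 1 in the two-observations-per-group case discussed here.

```python
from statistics import mean

# Hypothetical data for the two groups.
group1 = [1.0, 3.0]
group2 = [6.0, 8.0]


def one_way_f(groups):
    """One-way ANOVA F statistic: (SS_b / df_b) / (SS_w / df_w)."""
    all_obs = [v for g in groups for v in g]
    grand = mean(all_obs)
    ss_b = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_w = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_b = len(groups) - 1
    df_w = len(all_obs) - len(groups)
    return (ss_b / df_b) / (ss_w / df_w)


def pooled_t(g1, g2):
    """Two-sample t statistic with pooled error variance."""
    n1, n2 = len(g1), len(g2)
    ss_w = (sum((v - mean(g1)) ** 2 for v in g1)
            + sum((v - mean(g2)) ** 2 for v in g2))
    sigma_hat = (ss_w / (n1 + n2 - 2)) ** 0.5
    return (mean(g2) - mean(g1)) / (sigma_hat * (1 / n1 + 1 / n2) ** 0.5)


t_stat = pooled_t(group1, group2)
f_stat = one_way_f([group1, group2])
print(t_stat ** 2, f_stat)  # the two values agree up to rounding
```

Because the $F$ statistic only sees $t^{2}$, the two-group ANOVA corresponds to the two-sided $t$-test; the sign of $M_{2} - M_{1}$ is lost.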

With more than two groups, the ANOVA hypothesis (all $\beta_{j}$ with $j \geq 1$ are simultaneously 0) refers to more than one parameter and cannot be expressed as a single linear combination $\psi$, so the $t$-test and the $F$-test are then no longer equivalent.
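
To make this concrete, here is a sketch of the three-group case under treatment contrasts with group 1 as reference. The matrix $C$ and the symbol $q$ are notation introduced only for this illustration; the statistic is the standard one for a general linear hypothesis. The ANOVA null hypothesis now needs two rows, one per non-reference group: $$ H_{0}: C \beta = 0, \quad C = \left(\begin{array}{ccc}0 & 1 & 0 \\ 0 & 0 & 1\end{array}\right), \quad \beta = (\beta_{0}, \beta_{2}, \beta_{3})' $$

The corresponding test statistic is $$ F = \frac{(C \hat{\beta})' \left[C (X'X)^{-1} C'\right]^{-1} (C \hat{\beta}) / q}{\hat{\sigma}^{2}} $$ with $q = \mathrm{Rank}(C) = 2$, which follows an $F$-distribution with $q$ and $n - \mathrm{Rank}(X)$ df under the null hypothesis. When $C$ has a single row $c'$ (so $q = 1$) and $\psi_{0} = 0$, this reduces to $\hat{\psi}^{2} / (\hat{\sigma}^{2} \, c' (X'X)^{-1} c) = t^{2}$, which is the two-group case above.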
