As an economist, the analysis of variance (ANOVA) is taught and usually understood in relation to linear regression (e.g. in Arthur Goldberger's A Course in Econometrics). Economists/Econometricians typically view ANOVA as uninteresting and prefer to move straight to regression models. From the perspective of linear (or even generalised linear) models, ANOVA assigns coefficients into batches, with each batch corresponding to a "source of variation" in ANOVA terminology.
Generally you can replicate the inferences you would obtain from ANOVA using regression but not always OLS regression. Multilevel models are needed for analysing hierarchical data structures such as "split-plot designs," where between-group effects are compared to group-level errors, and within-group effects are compared to data-level errors. Gelman's paper [1] goes into great detail about this problem and effectively argues that ANOVA is an important statistical tool that should still be taught for it's own sake.
In particular Gelman argues that ANOVA is a way of understanding and structuring multilevel models. Therefore ANOVA is not an alternative to regression but as a tool for summarizing complex high-dimensional inferences and for exploratory data analysis.
Gelman is a well-respected statistician and some credence should be given to his view. However, almost all of the empirical work that I do would be equally well served by linear regression and so I firmly fall into the camp of viewing it as a little bit pointless. Some disciplines with complex study designs (e.g. psychology) may find ANOVA useful.
[1] Gelman, A. (2005). Analysis of variance: why it is more important than ever (with discussion). Annals of Statistics 33, 1–53. doi:10.1214/009053604000001048
(edited)
The slope and intercept t tests are distributed as central random variables when the 'true' slope and intercept are zero and the error terms are normal, independent and homoskedastic (there's probably a few assumptions I missed). They are distributed as non-central t random variables when the true values are non-zero (and the errors are normal, independent, etc). As pointed out in the comments, the regression coefficients are normally distributed under these assumptions.
Generally for large sample size, the (non-central) t is a very good approximation for the distribution of the regression test statistics. You can test the robustness of the approximation yourself by Monte Carlo simulation. Simply create random data (according to some model), perform linear regression, record the t-test statistics (as well as the slope and intercept), and repeat. Then look at the distribution of regression test statistics. If your generating model violates the assumptions of independence, homoskedasticity, normality, you will see how robust the approximation by t distribution is; you can also look at the robustness of the normal approximation for the regression coefficients.
Best Answer
The general linear model lets us write an ANOVA model as a regression model. Let's assume we have two groups with two observations each, i.e., four observations in a vector $y$. Then the original, overparametrized model is $E(y) = X^{\star} \beta^{\star}$, where $X^{\star}$ is the matrix of predictors, i.e., dummy-coded indicator variables: $$ \left(\begin{array}{c}\mu_{1} \\ \mu_{1} \\ \mu_{2} \\ \mu_{2}\end{array}\right) = \left(\begin{array}{ccc}1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1\end{array}\right) \left(\begin{array}{c}\beta_{0}^{\star} \\ \beta_{1}^{\star} \\ \beta_{2}^{\star}\end{array}\right) $$
The parameters are not identifiable as $((X^{\star})' X^{\star})^{-1} (X^{\star})' E(y)$ because $X^{\star}$ has rank 2 ($(X^{\star})'X^{\star}$ is not invertible). To change that, we introduce the constraint $\beta_{1}^{\star} = 0$ (treatment contrasts), which gives us the new model $E(y) = X \beta$: $$ \left(\begin{array}{c}\mu_{1} \\ \mu_{1} \\ \mu_{2} \\ \mu_{2}\end{array}\right) = \left(\begin{array}{cc}1 & 0 \\ 1 & 0 \\ 1 & 1 \\ 1 & 1\end{array}\right) \left(\begin{array}{c}\beta_{0} \\ \beta_{2}\end{array}\right) $$
So $\mu_{1} = \beta_{0}$, i.e., $\beta_{0}$ takes on the meaning of the expected value from our reference category (group 1). $\mu_{2} = \beta_{0} + \beta_{2}$, i.e., $\beta_{2}$ takes on the meaning of the difference $\mu_{2} - \mu_{1}$ to the reference category. Since with two groups, there is just one parameter associated with the group effect, the ANOVA null hypothesis (all group effect parameters are 0) is the same as the regression weight null hypothesis (the slope parameter is 0).
A $t$-test in the general linear model tests a linear combination $\psi = \sum c_{j} \beta_{j}$ of the parameters against a hypothesized value $\psi_{0}$ under the null hypothesis. Choosing $c = (0, 1)'$, we can thus test the hypothesis that $\beta_{2} = 0$ (the usual test for the slope parameter), i.e. here, $\mu_{2} - \mu_{1} = 0$. The estimator is $\hat{\psi} = \sum c_{j} \hat{\beta}_{j}$, where $\hat{\beta} = (X'X)^{-1} X' y$ are the OLS estimates for the parameters. The general test statistic for such $\psi$ is: $$ t = \frac{\hat{\psi} - \psi_{0}}{\hat{\sigma} \sqrt{c' (X'X)^{-1} c}} $$
$\hat{\sigma}^{2} = \|e\|^{2} / (n-\mathrm{Rank}(X))$ is an unbiased estimator for the error variance, where $\|e\|^{2}$ is the sum of the squared residuals. In the case of two groups $\mathrm{Rank}(X) = 2$, $(X'X)^{-1} X' = \left(\begin{smallmatrix}.5 & .5 & 0 & 0 \\-.5 & -.5 & .5 & .5\end{smallmatrix}\right)$, and the estimators thus are $\hat{\beta}_{0} = 0.5 y_{1} + 0.5 y_{2} = M_{1}$ and $\hat{\beta}_{2} = -0.5 y_{1} - 0.5 y_{2} + 0.5 y_{3} + 0.5 y_{4} = M_{2} - M_{1}$. With $c' (X'X)^{-1} c$ being 1 in our case, the test statistic becomes: $$ t = \frac{M_{2} - M_{1} - 0}{\hat{\sigma}} = \frac{M_{2} - M_{1}}{\sqrt{\|e\|^{2} / (n-2)}} $$
$t$ is $t$-distributed with $n - \mathrm{Rank}(X)$ df (here $n-2$). When you square $t$, you get $\frac{(M_{2} - M_{1})^{2} / 1}{\|e\|^{2} / (n-2)} = \frac{SS_{b} / df_{b}}{SS_{w} / df_{w}} = F$, the test statistic from the ANOVA $F$-test for two groups ($b$ for between, $w$ for within groups) which follows an $F$-distribution with 1 and $n - \mathrm{Rank}(X)$ df.
With more than two groups, the ANOVA hypothesis (all $\beta_{j}$ are simultaneously 0, with $1 \leq j$) refers to more than one parameter and cannot be expressed as a linear combination $\psi$, so then the tests are not equivalent.