In my understanding, the abstract intuition behind ANOVA is the following: one decomposes the variance of the observed variable along various directions and investigates their respective contributions. More precisely, one decomposes the identity map into a sum of projections and investigates which projections/directions make an important contribution to explaining the variance and which do not. The theoretical basis is Cochran's theorem.

To be less abstract, I cast the **second form mentioned by the OP** into the framework just described. Subsequently, I interpret the **first** form as a special case of the second one.

Let us consider a regression model with $K$ explanatory variables (the full model) and compare it to the restricted model with $K-J$ variables. WLOG, the last $J$ variables of the full model are not included in the restricted model. The question answered by ANOVA is

**"Can we explain significantly more variance in the observed variable if we include $J$ additional variables"**?

This question is answered by comparing the variance contributions of the first $K-J$ variables, the next $J$ variables, and the remainder/unexplained part (the residual sum of squares). This decomposition (obtained, e.g., from Cochran's theorem) is used to construct the F-test. One analyses the reduction in the residual sum of squares achieved by moving from the restricted model (corresponding to $H_0:$ *all coefficients pertaining to the last $J$ variables are zero*) to the full model, and obtains the F-statistic
$$
\frac{
\frac{RSS_{restr} - RSS_{full}}{J}
}{
\frac{RSS_{full}}{N-K}
}$$
If the value is large enough, then the variance explained by the additional $J$ variables is significant.
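As a minimal numerical sketch of this F-test (using simulated data; the sample size, the choice of $K$ and $J$, and the coefficient values are all assumptions for illustration, with $K$ counting every estimated coefficient including the intercept, as in the formula above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
N, K, J = 200, 4, 2                   # K columns incl. the intercept; last J are tested
X_full = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
# True model uses only the first K-J columns, so the last J coefficients are zero.
y = X_full[:, :K - J] @ np.array([1.0, -0.5]) + rng.normal(size=N)

def rss(y, X):
    """Residual sum of squares of an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

rss_full = rss(y, X_full)             # full model with all K columns
rss_restr = rss(y, X_full[:, :K - J]) # restricted model without the last J columns

F = ((rss_restr - rss_full) / J) / (rss_full / (N - K))
p = stats.f.sf(F, J, N - K)           # survival function of the F(J, N-K) distribution
```

Since the restricted model is nested in the full model, `rss_restr >= rss_full` always holds; the test asks whether the difference is larger than chance alone would produce.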

Now, the **first form mentioned by the OP** is interpreted as a special case of the **second form**. Consider three different groups A, B, and C with means $\mu_A$, $\mu_B$, and $\mu_C$. The null hypothesis $H_0: \mu_A = \mu_B = \mu_C$ is tested by comparing the variance explained by the regression on an intercept (the restricted model) with the variance explained by the full model containing an intercept, a dummy for group A, and a dummy for group B. The resulting F-statistic
$$
\frac{
\frac{RSS_{intercept} - RSS_{dummies}}{2}
}{
\frac{RSS_{dummies}}{N-3}
}$$ is equivalent to the ANOVA-test on Wikipedia. The denominator equals the variation within the groups; the numerator equals the variation between the groups. If the variation between the groups is sufficiently large relative to the variation within the groups, one rejects the hypothesis that all means are equal.
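This equivalence can be checked numerically: the dummy-variable regression F-statistic matches the classical one-way ANOVA F-statistic. A sketch with simulated data (group sizes and means are assumptions for illustration; group C serves as the baseline):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=30)   # group A
b = rng.normal(0.5, 1.0, size=30)   # group B
c = rng.normal(1.0, 1.0, size=30)   # group C (baseline)
y = np.concatenate([a, b, c])
N = len(y)

# Full model: intercept + dummies for groups A and B.
d_a = np.r_[np.ones(30), np.zeros(60)]
d_b = np.r_[np.zeros(30), np.ones(30), np.zeros(30)]
X_dummies = np.column_stack([np.ones(N), d_a, d_b])
X_intercept = np.ones((N, 1))       # restricted model: intercept only

def rss(y, X):
    """Residual sum of squares of an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

F_reg = ((rss(y, X_intercept) - rss(y, X_dummies)) / 2) / (rss(y, X_dummies) / (N - 3))
F_anova, p_anova = stats.f_oneway(a, b, c)
# F_reg and F_anova agree up to floating-point precision.
```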

## Best Answer

Two models are nested if we can simplify the bigger model into the smaller model by imposing constraints on the parameters.

Often this means setting a parameter to 0. For example, regression M1

$$ g(E\{Y\}) = \beta_0 + \beta_1 X_1 \color{white}{+ \beta_2 X_2} $$

is nested in regression M2

$$ g(E\{Y\}) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 $$

because M2 simplifies into M1 if we set $\beta_2 = 0$.

However, the principle is more general than fixing a parameter to equal a constant. In your case the constraint is $\beta = \alpha$. So your model 1 is indeed nested in your model 2.
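To illustrate the $\beta = \alpha$ constraint concretely (a hypothetical sketch with an identity link and simulated data; the specific models here are assumptions, not the OP's actual models): imposing $\beta = \alpha$ on $E\{Y\} = \beta_0 + \alpha X_1 + \beta X_2$ collapses it to $E\{Y\} = \beta_0 + \alpha (X_1 + X_2)$, which can be fitted by regressing on the single combined column $X_1 + X_2$.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100
X1, X2 = rng.normal(size=N), rng.normal(size=N)
y = 0.3 + 0.8 * X1 + 0.8 * X2 + rng.normal(size=N)  # data actually satisfy beta = alpha

X_big = np.column_stack([np.ones(N), X1, X2])        # unconstrained design
X_small = np.column_stack([np.ones(N), X1 + X2])     # constrained design (beta = alpha)

def rss(y, X):
    """Residual sum of squares of an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

# Nesting guarantees the smaller model can never fit better than the bigger one.
rss_small, rss_big = rss(y, X_small), rss(y, X_big)
```

The F-test from the first part of the answer then applies directly, with $J = 1$ restriction.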