Remember that the difference between significant and non-significant is not (always) statistically significant
Now, more to the point of your question, model 1 is called pooled regression, and model 2 unpooled regression. As you noted, in pooled regression, you assume that the groups aren't relevant, which means that the variance between groups is set to zero.
In the unpooled regression, with an intercept per group, you set the variance to infinity.
In general, I'd favor an intermediate solution, which is a hierarchical model or partial pooled regression (or shrinkage estimator). You can fit this model in R with the lmer4 package.
Finally, take a look at this paper by Gelman, in which he argues why hierarchical models helps with the multiple comparisons problems (in your case, are the coefficients per group different? How do we correct a p-value for multiple comparisons).
For instance, in your case,
library(lme4)
summary(lmer( leg ~ head + (1 | site)) # varying intercept model
If you want to fit a varying-intercept, varying slope (the third model), just run
summary(lmer( leg ~ head + (1 | site) + (0+head|site) )) # varying intercept, varying-slope model
Then you can take a look at the group variance and see if it's different from zero (the pooled regression isn't the better model) and far from infinity (unpooled regression).
update:
After the comments (see below), I decided to expand my answer.
The purpose of a hierarchical model, specially in cases like this, is to model the variation by groups (in this case, Sites). So, instead of running an ANOVA to test if a model is different from another, I'd take a look at the predictions of my model and see if the predictions by group is better in the hierarchical models vs the pooled regression (classical regression).
Now, I ran my sugestions above and foudn that
ranef(lmer( leg ~ head + (1 | site) + (0+head|site) )
Would return zero as estimates of varying slope (varying effect of head by site).
then I ran
ranef(lmer( leg ~ head + (head| site))
And I got a non-zero estimates for the varying effect of head. I don't know yet why this happened, since it's the first time I found this. I'm really sorry for this problem, but, in my defense, I just followed the specification outlined in the help of the lmer function.
(See the example with the data sleepstudy). I'll try to understand what's happening and I'll report here when (if) I understand what's happening.
Best Answer
For a male person, the equation is $$y=\beta_1x + (\beta_0+\beta_3)+\epsilon$$ and for a female person, the equation is $$y=\beta_1x + (\beta_0+\beta_2)+\epsilon$$
These two equations have the same education slope and potentially different intercepts as described in the text.