Everything that you have written is correct. You can always test out things like that with a toy example. Here is an example with R:
library(MASS)
rho <- .5 ### the true correlation in both groups
S1 <- matrix(c( 1, rho, rho, 1), nrow=2)
S2 <- matrix(c(16, 4*rho, 4*rho, 1), nrow=2)
cov2cor(S1)
cov2cor(S2)
xy1 <- mvrnorm(1000, mu=c(0,0), Sigma=S1)
xy2 <- mvrnorm(1000, mu=c(0,0), Sigma=S2)
x <- c(xy1[,1], xy2[,1])
y <- c(xy1[,2], xy2[,2])
group <- c(rep(0, 1000), rep(1, 1000))
summary(lm(y ~ x + group + x:group))
What you will find is that the interaction is highly significant, even though the true correlation is the same in both groups. Why does that happen? Because the raw regression coefficients in the two groups reflect not only the strength of the correlation, but also the scaling of X (and Y) in the two groups. Since those scalings differ, the interaction is significant. This is an important point, since it is often believed that, to test the difference in the correlations, you just need to test the interaction in the model above. Let's continue:
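The dependence of the slope on scaling follows from the identity b = r * sd(Y)/sd(X): the OLS slope is the correlation rescaled by the ratio of the standard deviations. A quick standalone check (using new toy variables u and v, so as not to overwrite the x and y defined above):

```r
# The OLS slope equals the correlation times sd(v)/sd(u); rescaling a
# variable changes the slope without changing the correlation.
set.seed(42)
u <- rnorm(200)
v <- 0.5 * u + rnorm(200)
coef(lm(v ~ u))[2] - cor(u, v) * sd(v) / sd(u)  # essentially zero (exact identity)
```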
summary(lm(xy2[,2] ~ xy2[,1]))$coef[2] - summary(lm(xy1[,2] ~ xy1[,1]))$coef[2]
This will show you that the difference between the regression coefficients from the models fitted separately in the two groups is exactly equal to the interaction term above.
What we are really interested in though is the difference in the correlations:
cor(xy1)[1,2]
cor(xy2)[1,2]
cor(xy2)[1,2] - cor(xy1)[1,2]
You will find that this difference is essentially zero. Let's standardize X and Y within the two groups and refit the full model:
x <- c(scale(xy1[,1]), scale(xy2[,1]))
y <- c(scale(xy1[,2]), scale(xy2[,2]))
summary(lm(y ~ x + x:group - 1))
Note that I am not including the intercept or the group main effect here, because they are zero by definition. You will find that the coefficient for x is equal to the correlation for group 1 and the coefficient for the interaction is equal to the difference in the correlations for the two groups.
Now, to your question of whether it would be better to use this approach or the test based on Fisher's r-to-z transformation.
EDIT
The standard errors of the regression coefficients that are calculated when you standardize the X and Y values within the groups do not take this standardization into consideration. Therefore, they are not correct. Accordingly, the t-test for the interaction does not control the Type I error rate adequately. I conducted a simulation study to examine this. When $\rho_1 = \rho_2 = 0$, then the Type I error is controlled. However, when $\rho_1 = \rho_2 \ne 0$, then the Type I error of the t-test tends to be overly conservative (i.e., it does not reject often enough for a given $\alpha$ value). On the other hand, the test that makes use of Fisher's r-to-z transformation does perform adequately, regardless of the size of the true correlations in both groups (except when the group sizes get very small and the true correlations in the two groups get very close to $\pm1$).
Conclusion: If you want to test for a difference in correlations, use Fisher's r-to-z transformation and test the difference between those values.
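In R, that test can be sketched in a few lines (r_to_z_test is a hypothetical helper name, not a built-in or packaged function):

```r
# Two-sided test of H0: rho1 = rho2 via Fisher's r-to-z transformation;
# atanh(r) = 0.5 * log((1 + r) / (1 - r)), with SE sqrt(1/(n1-3) + 1/(n2-3)).
r_to_z_test <- function(r1, n1, r2, n2) {
  z <- (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
  2 * pnorm(-abs(z))  # two-sided p-value
}
r_to_z_test(cor(xy1)[1, 2], 1000, cor(xy2)[1, 2], 1000)  # p-value for the two simulated groups above
```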
Remember that the difference between significant and non-significant is not (always) statistically significant
Now, more to the point of your question, model 1 is called pooled regression, and model 2 unpooled regression. As you noted, in pooled regression, you assume that the groups aren't relevant, which means that the variance between groups is set to zero.
In the unpooled regression, with an intercept per group, you set the variance to infinity.
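The two extremes can be written as ordinary regressions (a sketch using the leg, head, and site variable names that appear in the lmer calls later in this answer):

```r
# The two extremes of pooling, as plain lm() fits:
summary(lm(leg ~ head))                 # pooled: sites ignored entirely
summary(lm(leg ~ head + factor(site)))  # unpooled: a free intercept for each site
```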
In general, I'd favor an intermediate solution, which is a hierarchical model or partially pooled regression (or shrinkage estimator). You can fit this model in R with the lme4 package.
Finally, take a look at this paper by Gelman, in which he argues that hierarchical models help with the multiple comparisons problem (in your case: are the coefficients per group different? How do we correct a p-value for multiple comparisons?).
For instance, in your case,
library(lme4)
summary(lmer(leg ~ head + (1 | site))) # varying-intercept model
If you want to fit a varying-intercept, varying-slope model (the third model), just run
summary(lmer(leg ~ head + (1 | site) + (0 + head | site))) # varying-intercept, varying-slope model
Then you can take a look at the group variance and see if it's different from zero (the pooled regression isn't the better model) and far from infinity (unpooled regression).
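One way to inspect that variance is through lme4's VarCorr (a sketch; it assumes the varying-intercept model above has been stored in an object):

```r
# Store the fit, then extract the estimated variance components; a site-
# intercept variance near zero points toward the pooled model, a very
# large one toward the unpooled model.
m <- lmer(leg ~ head + (1 | site))
VarCorr(m)  # estimated SD/variance of the site intercepts
```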
update:
After the comments (see below), I decided to expand my answer.
The purpose of a hierarchical model, especially in cases like this, is to model the variation by groups (in this case, sites). So, instead of running an ANOVA to test whether one model differs from another, I'd take a look at the predictions of my model and see whether the predictions by group are better in the hierarchical model than in the pooled regression (classical regression).
Now, I ran my suggestions above and found that
ranef(lmer(leg ~ head + (1 | site) + (0 + head | site)))
would return zero estimates for the varying slope (the varying effect of head by site).
Then I ran
ranef(lmer(leg ~ head + (head | site)))
and I got non-zero estimates for the varying effect of head. I don't know yet why this happened, since it's the first time I've encountered it. I'm sorry for the confusion, but, in my defense, I just followed the specification outlined in the help page of the lmer function (see the example with the sleepstudy data). I'll try to understand what's happening and report back here when (if) I figure it out.
Best Answer
I will answer the technical question, then try to talk you out of doing this.
The intercept is the predicted value when the abscissa is equal to zero. Hence, the intercepts in the example are obtained via:
... and the comparisons thereof can be tested this way:
That said, it is an unusual instance that the intercept is an interesting or meaningful quantity to want to do inferences about. In many datasets, the intercept is a severe extrapolation because a predictor value of zero is distant from its observed values. Models are only approximations to the truth, and it is highly questionable that one should believe that the straight line you have fitted actually represents the trend at a distant point.
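If a baseline prediction is genuinely of interest, one common remedy is to center the predictor, so that the intercept is evaluated at the mean of X rather than at zero (a generic sketch with made-up data, not the original example's variables):

```r
# With a predictor far from zero, the raw intercept is a severe
# extrapolation; centering moves it to the middle of the data.
set.seed(1)
x0 <- rnorm(50, mean = 100, sd = 5)   # predictor values far from zero
y0 <- 2 * x0 + rnorm(50)
coef(lm(y0 ~ x0))[1]                  # intercept: extrapolation to x0 = 0
coef(lm(y0 ~ I(x0 - mean(x0))))[1]    # intercept: predicted y0 at mean(x0), i.e. mean(y0)
```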
Thus, I urge you to re-think what you are trying to do and decide on what meaningful question you are really trying to answer.