Confusion of “slope” and “intercept” in linear regression

regression

I am reading John Fox's book "Regression Diagnostics: An Introduction". In Chapter 2, he gives an example, which I quote below:

… if $y$ is income, $x$ is education in years, and $g$ is the factor
gender, with levels male, female, and nonbinary, coding the dummy
regressors $d_f = 1$ for females and 0 otherwise, and $d_m = 1$ for
males and 0 otherwise,

the model $y = \beta_0 + \beta_1x + \beta_2d_f + \beta_3d_m + \epsilon$ assumes the same education slope for all three genders but
potentially different intercepts.

Below is what I do not understand:

Why does this model have the "same slope" but "different intercepts"? I thought the slopes were different, since there are different betas ($\beta_1, \beta_2, \beta_3$), while the intercept is the same ($\beta_0$). And how can the intercept differ when there is only one intercept parameter ($\beta_0$)?

Best Answer

For a male person ($d_m = 1$, $d_f = 0$), the equation is $$y=\beta_1x + (\beta_0+\beta_3)+\epsilon$$ and for a female person ($d_f = 1$, $d_m = 0$), the equation is $$y=\beta_1x + (\beta_0+\beta_2)+\epsilon$$

These two equations have the same education slope and potentially different intercepts as described in the text.
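
Spelling out all three levels may make this clearer. The quoted passage defines no dummy for the nonbinary level, so that level acts as the baseline with $d_f = d_m = 0$. Substituting each coding into the model gives the following (a worked sketch that uses only the symbols of the quoted model):

$$\begin{aligned}
\text{nonbinary } (d_f = 0,\ d_m = 0):\quad & y = \beta_0 + \beta_1 x + \epsilon \\
\text{female } (d_f = 1,\ d_m = 0):\quad & y = (\beta_0 + \beta_2) + \beta_1 x + \epsilon \\
\text{male } (d_f = 0,\ d_m = 1):\quad & y = (\beta_0 + \beta_3) + \beta_1 x + \epsilon
\end{aligned}$$

In every line the coefficient on $x$ is $\beta_1$, so the three lines are parallel. $\beta_2$ and $\beta_3$ multiply dummies that are constant within a group, so they only shift the line up or down relative to the baseline group. Letting the slopes differ as well would require interaction terms such as $x\,d_f$ and $x\,d_m$, which this model does not include.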

Related Solutions

Solved – ANCOVA in R suggests different intercepts, but the 95% CIs overlap… how is this possible

Remember that the difference between significant and non-significant is not (always) itself statistically significant.

Now, more to the point of your question, model 1 is called pooled regression, and model 2 unpooled regression. As you noted, in pooled regression, you assume that the groups aren't relevant, which means that the variance between groups is set to zero.

In the unpooled regression, with a separate intercept per group, you implicitly set the between-group variance to infinity.

In general, I'd favor an intermediate solution, which is a hierarchical model, also called partially pooled regression (a shrinkage estimator). You can fit this model in R with the lme4 package.
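
In symbols, the three options can be written as one model. This is a sketch in the usual multilevel-regression notation, not taken from the original answer: $j[i]$ denotes the site of observation $i$, and $\mu_\alpha$, $\sigma_\alpha$ and $\sigma_y$ are names assumed here for illustration.

$$\text{leg}_i = \alpha_{j[i]} + \beta\,\text{head}_i + \epsilon_i, \qquad \alpha_j \sim N(\mu_\alpha, \sigma_\alpha^2), \qquad \epsilon_i \sim N(0, \sigma_y^2)$$

Pooled regression corresponds to $\sigma_\alpha \to 0$ (every site shares the intercept $\mu_\alpha$), unpooled regression to $\sigma_\alpha \to \infty$ (the site intercepts are left unconstrained), and partial pooling estimates $\sigma_\alpha$ from the data.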

Finally, take a look at this paper by Gelman, in which he argues that hierarchical models help with the multiple comparisons problem (in your case: are the coefficients per group different, and how do we correct a p-value for multiple comparisons?).

For instance, in your case,

library(lme4)
summary(lmer(leg ~ head + (1 | site)))  # varying-intercept model

If you want to fit a varying-intercept, varying-slope model (the third model), just run

summary(lmer(leg ~ head + (1 | site) + (0 + head | site)))  # varying-intercept, varying-slope model

Then you can take a look at the group variance and see whether it's different from zero (in which case pooled regression isn't the better model) and far from infinity (in which case unpooled regression isn't either).

update: After the comments (see below), I decided to expand my answer.

The purpose of a hierarchical model, especially in cases like this, is to model the variation by group (in this case, by site). So, instead of running an ANOVA to test whether one model is different from another, I'd take a look at the predictions of my model and see whether the predictions by group are better in the hierarchical model than in the pooled regression (classical regression).

Now, I ran my suggestions above and found that

ranef(lmer(leg ~ head + (1 | site) + (0 + head | site)))

returns zero for the estimates of the varying slope (the varying effect of head by site). Then I ran

ranef(lmer(leg ~ head + (head | site)))

and got non-zero estimates for the varying effect of head. I don't know yet why this happened, since this is the first time I have seen it. I'm really sorry for this problem but, in my defense, I just followed the specification outlined in the help of the lmer function (see the example with the sleepstudy data). I'll try to understand what's happening and report here when (if) I do.

Solved – Beta coefficients from stratified analysis when there are covariates

Here's an intuitive answer: When you stratify those last models, as you note, the intercept changes. However, the actual values do not change, so the mean predicted value should not change. But if you kept the coefficients the same, the mean predicted value would change - by just as much as the intercept changed.
