Linear Regression – Calculating Slope and Intercept from Multiple Linear Regression


Consider this linear equation:

$$
Y \sim \beta_0 + \beta_1X_1 + \beta_2X_2 + \beta_3X_3 + \beta_4(X_2*X_3) + \epsilon
$$

where

  • $Y$ is what I'm trying to explain (it happens to be the evolved growth rate of a species growing and evolving in a community).
  • $\beta_0$ is the intercept.
  • $X_1$ is species richness, modeled as a continuous variable, taking values 1, 2, 3, or 4.
  • $X_2$ and $X_3$ indicate the presence or absence of species A and species B, respectively. These are dummy variables coded as 0 to indicate absence or 1 to indicate presence of that species.
  • $\beta_1$, $\beta_2$, and $\beta_3$ are coefficients (or slopes) for $X_1$, $X_2$, and $X_3$.
  • $\beta_4$ is the coefficient of the interaction term.

Notice that there is no interaction between $X_1$ and the other variables. This is because it doesn't make biological sense for this variable to interact with the others.

This is what the model looks like in R:

lm(Y ~ species_richness + A_present * B_present, data = .x)
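
To see how R expands this formula, one can inspect the design matrix with base R's `model.matrix()`. The sketch below uses a hypothetical four-row data frame `toy` (not the real data; only the column names come from the call above). The `A_present * B_present` shorthand should produce both main effects plus an `A_present:B_present` column that is their product, i.e. the $X_2X_3$ term.

```r
# Hypothetical toy data; only the column names come from the question.
toy <- data.frame(
  species_richness = c(1, 2, 3, 4),
  A_present = c(0, 1, 0, 1),
  B_present = c(0, 0, 1, 1)
)

# Design matrix for the same right-hand side as the lm() call above.
X <- model.matrix(~ species_richness + A_present * B_present, data = toy)
print(X)
# The last column, A_present:B_present, is A_present * B_present (X2 * X3).
```

`lm()` fits this same design matrix, so the `A_present:B_present` estimate corresponds to $\beta_4$.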

When I fit the model to my data, these are the estimated coefficients (along with their P-values):

B0 <- 0.1619   # P < 0.0001
B1 <- -0.0112  # P = 0.0159
B2 <- -0.0133  # P = 0.0924
B3 <- -0.0224  # P = 0.0034
B4 <- 0.0841   # P < 0.0001
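
As a sketch of the arithmetic, assuming the model above with no interaction involving species richness, the intercept for each presence/absence combination collects every term that does not multiply $X_1$, and the slope is always $\beta_1$. The snippet below plugs in the estimates listed above; the names `b` and `groups` are illustrative only.

```r
# Estimates as reported in the question (P-values omitted).
b <- c(b0 = 0.1619, b1 = -0.0112, b2 = -0.0133, b3 = -0.0224, b4 = 0.0841)

# All four presence/absence combinations of species A (X2) and B (X3).
groups <- expand.grid(A_present = c(0, 1), B_present = c(0, 1))

# Every term that does not multiply X1 goes into the intercept.
groups$intercept <- b[["b0"]] + b[["b2"]] * groups$A_present +
  b[["b3"]] * groups$B_present +
  b[["b4"]] * groups$A_present * groups$B_present

# No X1 interaction in the model, so the slope is beta_1 in every group.
groups$slope <- b[["b1"]]

print(groups)
```

Under these assumptions the four lines are parallel with slope $-0.0112$; the intercepts are 0.1619 (neither species), 0.1486 (A only), 0.1395 (B only), and 0.2103 (both).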

My question is: how do I calculate the slopes and intercepts of this model when each species is present or absent?

If I want to calculate the slope and intercept of the model when species B (represented by $X_3$) is absent, it seems easy enough: plug $X_3 = 0$ into the equation above, which gives:

$$
Y \sim \beta_0 + \beta_1X_1 + \beta_2X_2 + 0 + 0 + \epsilon
$$

The way I interpret this equation is as follows. To get the slope of the line for a one-unit change in species richness, $X_1$ takes a value of 1. If I also consider that species A is present, $X_2$ becomes 1. Because $X_1$ and $X_2$ are the only $X$ variables left in the equation, adding their terms gives the slope of the line, so $\text{slope} = \beta_1X_1 + \beta_2X_2$. The only coefficient left is $\beta_0$, so that is obviously the intercept. When I plot this slope and intercept over the data, it clearly fits the data where species B is absent. Even though it fits well, please let me know whether my logic is correct.
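
For comparison, here is a worked sketch of the $X_3 = 0$ case with $X_2$ substituted explicitly instead of being folded into the slope. Setting $X_2$ to 0 or 1 changes only the constant term; the coefficient on $X_1$ stays $\beta_1$:

$$
\begin{aligned}
X_2 = 0:\quad Y &\sim \beta_0 + \beta_1X_1 + \epsilon \\
X_2 = 1:\quad Y &\sim (\beta_0 + \beta_2) + \beta_1X_1 + \epsilon
\end{aligned}
$$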

I'm having much more trouble fitting the line when species B is present. In this case, I think I should substitute a value of 1 wherever $X_3$ appears in the equation. That looks like this:

$$
Y \sim \beta_0 + \beta_1X_1 + \beta_2X_2 + \beta_3*1 + \beta_4(X_2*1) + \epsilon
$$

If this is true, then is $\text{slope} = \beta_1X_1 + \beta_2X_2 + \beta_4(X_2*1)$ and $\text{intercept} = \beta_0 + \beta_3$? When I plot this, the intercept is definitely off, and it's hard to tell whether the slope is correct.

Thus, what are the slope and intercept of the model for a one-unit change in species richness ($X_1$) when species A ($X_2$) and species B ($X_3$) are both present?
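
A worked sketch of the $X_3 = 1$ case, grouping terms by whether they multiply $X_1$ (assuming, as stated above, that $X_1$ has no interactions):

$$
\begin{aligned}
Y &\sim \beta_0 + \beta_1X_1 + \beta_2X_2 + \beta_3 + \beta_4X_2 + \epsilon \\
  &= (\beta_0 + \beta_3) + (\beta_2 + \beta_4)X_2 + \beta_1X_1 + \epsilon
\end{aligned}
$$

With $X_2 = 0$ the intercept is $\beta_0 + \beta_3$; with $X_2 = 1$ (both species present) it is $\beta_0 + \beta_2 + \beta_3 + \beta_4$. In both cases the slope on $X_1$ is $\beta_1$. With the estimates listed above, the both-present intercept is $0.1619 - 0.0133 - 0.0224 + 0.0841 = 0.2103$.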

Thank you and let me know if I can clarify anything further.

Best Answer

Before addressing the main problem, notice that species richness isn't really a continuous variable: it's a categorical variable with an ordinal structure. However, let's ignore that.

As you said, $X_2$, $X_3$, and $X_4 = X_2X_3$ are indicator variables for the presence of species $A$, species $B$, and both ($A\cap B$), so their contribution is either $0$ or $\beta_i$, $i\in\{2,3,4\}$. Suppose $X_2=1$ and $X_3=0$; then your regression becomes

$$
Y = (\beta_0 +\beta_2) + \beta_1 X_1
$$

so $\beta_1$ is the slope of the fitted line. Because none of the dummy variables interacts with $X_1$, they only change the intercept of the line; the slope is $\beta_1$ for every combination of species.
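
One way to check this numerically is to fit the model to simulated data and compare `predict()` with a hand-assembled line. In the sketch below, the data frame `sim`, the sample size, and the "true" coefficients are arbitrary assumptions chosen only for illustration; the column names mirror the question's `lm()` call.

```r
set.seed(1)
n <- 200

# Hypothetical simulated data (not the asker's): richness 1-4 and 0/1 indicators.
sim <- data.frame(
  species_richness = sample(1:4, n, replace = TRUE),
  A_present = rbinom(n, 1, 0.5),
  B_present = rbinom(n, 1, 0.5)
)
# Arbitrary "true" coefficients, chosen only for illustration.
sim$Y <- 0.16 - 0.01 * sim$species_richness - 0.01 * sim$A_present -
  0.02 * sim$B_present + 0.08 * sim$A_present * sim$B_present +
  rnorm(n, sd = 0.01)

fit <- lm(Y ~ species_richness + A_present * B_present, data = sim)
b <- coef(fit)

# New data: both species present, richness 1 to 4.
both <- data.frame(species_richness = 1:4, A_present = 1, B_present = 1)
pred <- predict(fit, newdata = both)

# Hand-assembled line: all non-X1 terms form the intercept, beta_1 is the slope.
intercept_both <- b[["(Intercept)"]] + b[["A_present"]] + b[["B_present"]] +
  b[["A_present:B_present"]]
slope <- b[["species_richness"]]
manual <- intercept_both + slope * both$species_richness

print(cbind(pred, manual))
```

The consecutive differences of `pred` equal the `species_richness` coefficient, which is the sense in which the dummy variables shift the line without changing its slope.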

You say $X_2=1$ but then forget to actually substitute it into the equation. You are also mixing up parameters and variables: the slope (angular coefficient) is a parameter. $X\beta$ isn't a coefficient; it's a linear predictor, the linear combination of parameters and covariates.
