Solved – Interpreting the intercept for linear regression with binary predictors

linear regression, r

My data contain two binary predictors (encoded in R as factors) and a continuous response. When I fit a simple linear model (no interactions) using lm, the intercept is smaller than the minimum of the observed response values, which is about 0.33 when both predictors are zero.

    Call:
    lm(formula = lm.a)

    Coefficients:
    (Intercept)      df$x1       df$x2
         0.1222      0.4276      0.7988

I am not sure how to interpret the intercept coefficient. I thought it was the predicted value when both x1 and x2 are set to zero.

EDIT: Thank you for your responses. After visualizing the data and the plane fitted by lm (see image below), I found that the simple linear model without interactions underfits the dataset, which is why the predictions are off. Adding an interaction term fits the points better, but an interaction term doesn't make sense for my dataset. I'll have to think of a better model.

[image: the data points and the plane fitted by lm]

EDIT 2: After thinking about it a little, interactions make a lot of sense for my dataset. I'm just going to use y ~ x1 + x2 + x1:x2 (equivalently, y ~ x1 * x2 in R formula notation, since * expands to the main effects plus the interaction).
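For illustration, a model like this can be reproduced outside R. The following Python sketch fits the same design with ordinary least squares, where np.linalg.lstsq stands in for lm and the data are invented for the example. Because the interaction makes the model saturated (one parameter per cell of the 2×2 design), the intercept is exactly the mean response of the (x1 = 0, x2 = 0) cell:

```python
import numpy as np

# Hypothetical data: two binary predictors and a continuous response
# (made-up values, for illustration only).
x1 = np.array([0, 0, 1, 1, 0, 1, 0, 1])
x2 = np.array([0, 1, 0, 1, 1, 0, 0, 1])
y  = np.array([0.35, 1.10, 0.80, 2.30, 1.20, 0.75, 0.30, 2.40])

# Design matrix: intercept, main effects, and the x1:x2 interaction.
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])

# Ordinary least squares, analogous to lm(y ~ x1 * x2) in R.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # [intercept, b1, b2, b3]
```

Here the intercept equals the mean of y over the rows with x1 = 0 and x2 = 0 (0.35 and 0.30, so 0.325), which is the "both predictors at zero" interpretation from the question.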

Best Answer

The following is true when the predictors are continuous variables.

I thought this was the predicted value when both x1 and x2 are set to zero.

In the case of categorical (e.g. binary) predictors, the intercept is interpreted differently, because the model encodes each categorical variable with auxiliary binary (dummy) variables, one per level (i.e. per unique value the categorical variable can take). Here is an example:
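This encoding can be sketched in a few lines of plain Python (the level names and rows are made up to match the example below): each level of the factor becomes a 0/1 indicator column.

```python
# Levels of the hypothetical "degree" factor and some made-up observations.
levels = ["under", "higher"]
degree = ["under", "higher", "higher", "under"]

# One 0/1 indicator column per level (the full, redundant encoding).
dummies = {lvl: [1 if d == lvl else 0 for d in degree] for lvl in levels}
print(dummies["under"])   # [1, 0, 0, 1]
print(dummies["higher"])  # [0, 1, 1, 0]
```

Note that the two columns always sum to 1, which is why one of them can be dropped.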

Assume we have a degree variable that can take two values, undergraduate and postgraduate (a "higher" degree), and we want to model salary based on this variable. Then we would model it as:

$\text{salary} = \beta_0 + \beta_1 \,\text{degree:under} + \beta_2 \, \text{degree:higher}$

Therefore, for a data point representing a higher-degree graduate, we will have $\text{degree:under} = 0$ and $\text{degree:higher}=1$. Note that since $\text{degree:under}$ and $\text{degree:higher}$ are each other's complement, there is no need to keep both of them (keeping both adds a redundant column and makes the design matrix rank-deficient). For example, we can keep $\text{degree:under}$ and drop the other one:

$\text{salary} = \beta_0 + \beta_1 \,\text{degree:under}$

In this case, $\beta_0$ is the estimated salary when $\text{degree:under} = 0$, i.e. the average salary of the sampled higher-degree graduates ($\text{degree:higher} = 1$).
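This can be checked numerically. Below is a minimal sketch using NumPy least squares in place of R's lm, with made-up salary numbers; the intercept recovers the mean salary of the reference group (the higher-degree graduates, for whom the dummy is 0):

```python
import numpy as np

# Hypothetical salaries (made-up numbers); under = 1 marks undergraduates.
under  = np.array([1, 1, 0, 0, 0])                 # degree:under indicator
salary = np.array([40.0, 44.0, 60.0, 62.0, 58.0])

# OLS for salary ~ degree:under, analogous to lm in R.
X = np.column_stack([np.ones_like(under), under])
beta, *_ = np.linalg.lstsq(X, salary, rcond=None)

b0, b1 = beta
print(b0)       # mean salary of the higher-degree group (dummy = 0): 60.0
print(b0 + b1)  # mean salary of the undergraduate group: 42.0
```

With a single binary dummy, the fitted values are just the two group means, so the intercept is the reference-group mean and $\beta_1$ is the difference between the groups.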

Therefore, in your case, $0.1222$ is the estimated response when both binary variables are at their reference level, i.e. when $x_1 = 0$ and $x_2 = 0$.

Hope it helps.