Solved – Dummy variable in regression analysis (problem with result output and plotting)

data visualization, r, regression, regression coefficients

I want to run the following model: Weight ~ Height*Sex, where the * sign means both main effects plus their interaction. I got the following result:

model <- lm(df$weight ~ df$height*df$SEX)
summary(model)
# ...
# Coefficients:
#                        Estimate  Std. Error t value   Pr(>|t|)
# (Intercept)             29.5514    43.1282   0.685    0.495
# df$height               0.2996     0.2408    1.244    0.217
# df$SEXfemale            7.0516     61.6167   0.114    0.909
# df$height:df$SEXfemale -0.1176     0.3594   -0.327    0.744
# 
# Residual standard error: 11.79 on 96 degrees of freedom
# Multiple R-squared:  0.3452,  Adjusted R-squared:  0.3248 
# F-statistic: 16.87 on 3 and 96 DF,  p-value: 7.015e-09

As you can see, I only got coefficients for df$SEXfemale and df$height:df$SEXfemale. The coefficients for df$SEXmale are absent (I suppose because male is coded as 0). df$SEX is a factor variable with 2 levels (male and female).

So my questions are:

  1. How can I correct this situation?
  2. How can I plot regression lines for both groups separately (female and male) without using the ggplot2 package?

Best Answer

You have this model:

$$\text{Weight}=\beta_0+\beta_1\text{Height}+\beta_2\text{Sex}+ \beta_3\text{Height}\cdot\text{Sex}$$

In your case the dummy is coded as $\text{Sex(Male)} = 0$ and $\text{Sex(Female)} = 1$, which is why only the female terms appear in the output.

Substituting we get two equations:

$$\text{Weight(Male)}=\beta_0+\beta_1\text{Height}$$

$$\text{Weight(Female)}=(\beta_0+\beta_2)+(\beta_1+\beta_3)\text{Height}$$

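So there you have it: $\text{Male}$ is the baseline level, so the intercept and the height coefficient describe the male line directly. For $\text{Female}$ you add the corresponding coefficients, giving intercept $\beta_0+\beta_2$ and slope $\beta_1+\beta_3$. Nothing is missing from the output, so there is nothing to correct.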
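As a minimal sketch of the substitution above, the two lines can be read off `coef()`. The data frame here is simulated purely for illustration (it is not the asker's data), and the names `beta`, `male_line` and `female_line` are made up for this example. It also assumes the model is fitted with `data = df`, so the coefficient names lose the `df$` prefix.

```r
# Simulated data for illustration only; not the asker's data.
set.seed(1)
n <- 100
df <- data.frame(
  height = rnorm(n, mean = 170, sd = 10),
  SEX    = factor(rep(c("male", "female"), each = n / 2),
                  levels = c("male", "female"))  # male first, so male is the baseline
)
df$weight <- 30 + 0.3 * df$height + rnorm(n, sd = 10)

model <- lm(weight ~ height * SEX, data = df)
beta <- coef(model)  # names: (Intercept), height, SEXfemale, height:SEXfemale

# Baseline (male) line: beta_0 and beta_1 as reported
male_line <- c(intercept = unname(beta["(Intercept)"]),
               slope     = unname(beta["height"]))

# Female line: add the dummy and interaction coefficients
female_line <- c(intercept = unname(beta["(Intercept)"] + beta["SEXfemale"]),
                 slope     = unname(beta["height"] + beta["height:SEXfemale"]))

print(rbind(male = male_line, female = female_line))
```

Because every term interacts with the factor, these two lines should match what you would get from fitting a separate simple regression within each group.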


Look at this example and how the contrasts argument changes the fit:

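# Keep the first 100 rows (setosa and versicolor) and drop the unused level
data = iris[1:100,3:5]
data$Species = factor(data$Species)
# c(1,0) codes setosa as 1 and versicolor as 0, so versicolor is the baseline
fit1 = lm(Petal.Length ~ Petal.Width * Species, data = data, contrasts = list(Species = c(1,0)))
# c(0,1) swaps the coding, so setosa becomes the baseline
fit2 = update(fit1, contrasts = list(Species = c(0,1)))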

Basically, in each fit the (Intercept) and Petal.Width coefficients are the intercept and slope of the regression line for the level coded 0: versicolor in fit1 and setosa in fit2.
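One possible way to draw both lines with base graphics only (no ggplot2) is to pass each fit's first two coefficients to `abline()`. This is a sketch that rebuilds the two iris fits so it runs on its own. The names `dat`, `line_setosa`, `line_versicolor` and `cols` are chosen for this example, and the colours are arbitrary.

```r
# Rebuild the two fits: first 100 iris rows contain setosa and versicolor only
dat <- iris[1:100, 3:5]
dat$Species <- factor(dat$Species)  # drop the unused virginica level

# c(1, 0): setosa = 1, versicolor = 0, so versicolor is the baseline
fit1 <- lm(Petal.Length ~ Petal.Width * Species, data = dat,
           contrasts = list(Species = c(1, 0)))
# c(0, 1): setosa = 0, versicolor = 1, so setosa is the baseline
fit2 <- update(fit1, contrasts = list(Species = c(0, 1)))

# First two coefficients of each fit: intercept and slope of its baseline group
line_versicolor <- unname(coef(fit1)[1:2])
line_setosa     <- unname(coef(fit2)[1:2])

# Scatter plot with one regression line per group
cols <- c(setosa = "blue", versicolor = "red")
plot(Petal.Length ~ Petal.Width, data = dat,
     col = cols[as.character(dat$Species)])
abline(a = line_setosa[1], b = line_setosa[2], col = cols["setosa"])
abline(a = line_versicolor[1], b = line_versicolor[2], col = cols["versicolor"])
legend("topleft", legend = names(cols), col = cols, lty = 1)
```

The same pattern should carry over to the weight/height model by swapping in the asker's variables and factor levels.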


By default, one element of the contrast is 0, i.e. its corresponding level is the baseline, as I explained. But that doesn't need to be the case: you could specify contrasts = list(someFactor = c(-1,1)), and the baseline would then sit midway between the two levels. To get the regression line of each level in someFactor, you would subtract the factor and interaction coefficients from the baseline intercept and slope for the level coded -1, and add them for the level coded 1.
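A short sketch of the -1/1 coding, again on the two-species iris subset so it is self-contained. The names `fit3`, `beta`, `line_setosa` and `line_versicolor` are illustrative. With this coding setosa is -1 and versicolor is 1, so the reported intercept and slope describe the midpoint between the two group lines.

```r
dat <- iris[1:100, 3:5]
dat$Species <- factor(dat$Species)  # levels: setosa, versicolor

# c(-1, 1): setosa = -1, versicolor = 1
fit3 <- lm(Petal.Length ~ Petal.Width * Species, data = dat,
           contrasts = list(Species = c(-1, 1)))

# Coefficient order: intercept, slope, Species effect, interaction
beta <- unname(coef(fit3))

# Level coded -1: subtract the factor and interaction coefficients
line_setosa <- c(intercept = beta[1] - beta[3], slope = beta[2] - beta[4])
# Level coded 1: add them
line_versicolor <- c(intercept = beta[1] + beta[3], slope = beta[2] + beta[4])

print(rbind(setosa = line_setosa, versicolor = line_versicolor))
```

The fitted lines themselves do not depend on the coding; only how the coefficients must be combined to recover them changes.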
