I will give my examples with R calls. First, a simple example of a linear regression with a dependent variable 'lifespan' and two continuous explanatory variables:
data.frame(height=runif(4000,160,200))->human.life
human.life$weight=runif(4000,50,120)
human.life$lifespan=sample(45:90,4000,replace=TRUE)
summary(lm(lifespan~1+height+weight,data=human.life))
Call:
lm(formula = lifespan ~ 1 + height + weight, data = human.life)
Residuals:
     Min       1Q   Median       3Q      Max
-23.0257 -11.9124  -0.0565  11.3755  23.8591
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 63.635709   3.486426  18.252   <2e-16 ***
height       0.007485   0.018665   0.401   0.6884
weight       0.024544   0.010428   2.354   0.0186 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 13.41 on 3997 degrees of freedom
Multiple R-squared: 0.001425, Adjusted R-squared: 0.0009257
F-statistic: 2.853 on 2 and 3997 DF, p-value: 0.05781
To find the estimate of 'lifespan' when the value of 'height' is 1 (and 'weight' is 0), I add (Intercept)+height=63.64319.
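More generally, the fitted value is the intercept plus each coefficient multiplied by the value of its variable. A minimal sketch of that arithmetic, using the coefficients copied from the summary output above (the vector `b` and the helper `estimate` are names made up for this illustration):

```r
# Coefficients copied from the printed summary above
b <- c(intercept = 63.635709, height = 0.007485, weight = 0.024544)

# Hypothetical helper: intercept + height coefficient * height + weight coefficient * weight
estimate <- function(height, weight) {
  b[["intercept"]] + b[["height"]] * height + b[["weight"]] * weight
}

est_height1 <- estimate(height = 1, weight = 0)  # 63.643194
est_weight1 <- estimate(height = 0, weight = 1)  # 63.660253

print(c(height1 = est_height1, weight1 = est_weight1))
```

With the fitted model object at hand, `predict()` with a `newdata` data frame should give the same kind of result without copying numbers by hand.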
Now what if I have a similar data frame, but one where one of the explanatory variables is categorical?
data.frame(animal=rep(c("dog","fox","pig","wolf"),1000))->animal.life
animal.life$weight=runif(4000,8,50)
animal.life$lifespan=sample(1:10,4000,replace=TRUE)
summary(lm(lifespan~1+animal+weight,data=animal.life))
Call:
lm(formula = lifespan ~ 1 + animal + weight, data = animal.life)
Residuals:
    Min      1Q  Median      3Q     Max
-4.7677 -2.7796 -0.1025  3.1972  4.3691
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.565556   0.145851  38.159  < 2e-16 ***
animalfox   0.806634   0.131198   6.148  8.6e-10 ***
animalpig   0.010635   0.131259   0.081   0.9354
animalwolf  0.806650   0.131198   6.148  8.6e-10 ***
weight      0.007946   0.003815   2.083   0.0373 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.933 on 3995 degrees of freedom
Multiple R-squared: 0.01933, Adjusted R-squared: 0.01835
F-statistic: 19.69 on 4 and 3995 DF, p-value: 4.625e-16
In this case, to find the estimate of 'lifespan' when the value of 'weight' is 1, should I add each of the coefficients for 'animal' to the intercept: (Intercept)+animalfox+animalpig+animalwolf? Or what is the proper way to do this?
Thanks
Sverre
Best Answer
No, you shouldn't add all of the coefficients together. You essentially have the model
$$ {\rm lifespan} = \beta_{0} + \beta_{1} \cdot {\rm fox} + \beta_{2} \cdot {\rm pig} + \beta_{3} \cdot {\rm wolf} + \beta_{4} \cdot {\rm weight} + \varepsilon $$
where, for example, ${\rm pig} = 1$ if the animal is a pig and 0 otherwise. Calculating $\beta_{0} + \beta_{1} + \beta_{2} + \beta_{3} + \beta_{4}$, as you've suggested, to get the overall average when ${\rm weight}=1$ is like asking "if you were a pig, a wolf, and a fox, and your weight was 1, what is your expected lifespan?". Since each animal is only one of those things, that doesn't make much sense.
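You can look at the indicator coding directly. Here is a small sketch, assuming a made-up four-row data frame `animals` (one row per animal, each with weight 1), that prints the design matrix R would build for this formula:

```r
# Hypothetical data frame for illustration: one row per animal, weight = 1
animals <- data.frame(animal = c("dog", "fox", "pig", "wolf"), weight = 1)

# Design matrix for the same right-hand side as the model
X <- model.matrix(~ 1 + animal + weight, data = animals)
print(X)

# Number of animal indicators switched on in each row
indicator.cols <- c("animalfox", "animalpig", "animalwolf")
print(rowSums(X[, indicator.cols]))
```

Each row has at most one of the animal indicators set to 1, and the dog row has none, so no single observation ever picks up all three animal coefficients at once.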
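You will have to do this separately for each animal. For example, $\beta_{0} + \beta_{2} + \beta_{4}$ is the expected lifespan for a pig when its weight is 1. Dog is the reference level (it has no indicator of its own), so the expected lifespan for a dog when its weight is 1 is $\beta_{0} + \beta_{4}$.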
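In practice you can let `predict()` do the bookkeeping. The following is only a sketch on simulated data along the lines of the question; the seed, the object names `fit`, `new.animals`, `pred` and `manual`, and the explicit sample size in `sample()` are my own assumptions, so the numbers will not match the output shown in the question.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible

# Simulated data along the lines of the question
animal.life <- data.frame(animal = rep(c("dog", "fox", "pig", "wolf"), 1000))
animal.life$weight <- runif(4000, 8, 50)
animal.life$lifespan <- sample(1:10, 4000, replace = TRUE)

fit <- lm(lifespan ~ 1 + animal + weight, data = animal.life)

# One row per animal, each with weight = 1
new.animals <- data.frame(animal = c("dog", "fox", "pig", "wolf"), weight = 1)
pred <- predict(fit, newdata = new.animals)
names(pred) <- new.animals$animal

# The same estimates by hand: intercept + that animal's coefficient
# (none for dog, the reference level) + weight coefficient * 1
b <- coef(fit)
manual <- c(
  dog  = b[["(Intercept)"]] + b[["weight"]],
  fox  = b[["(Intercept)"]] + b[["animalfox"]] + b[["weight"]],
  pig  = b[["(Intercept)"]] + b[["animalpig"]] + b[["weight"]],
  wolf = b[["(Intercept)"]] + b[["animalwolf"]] + b[["weight"]]
)

print(pred)
print(all.equal(pred, manual))
```

The hand-computed sums and the `predict()` values should agree up to floating-point tolerance, which is a quick way to check that you have picked the right coefficients for each animal.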