Solved – How to interpret multiple factors in model output in R

categorical data, interpretation, multiple regression, r, regression

I know how to read the model summary in R for a regression model that includes a factor. The first level in alphabetical order is treated as the base level, against which all other levels of that factor are compared. In an ANOVA-style model, the baseline value is the intercept (it equals the mean response for the base-level class).

However, if one or several factors are mixed with continuous predictors, how can I see what the base-level values are at all? To what would I compare?

In the model output below, there are two factors:

  1. LandUse (4 Levels)
  2. Type_LU (4 Levels)

I see now, for example, that LandUseLow is 0.35 units higher than the baseline LandUseHigh. Would one now simply look at the mean response for the class LandUseHigh and compare? Is it that simple?

My_model: 
                  Estimate Std. Error Adjusted SE z value Pr(>|z|)
(Intercept)      4.6086772  1.7754606   1.7773711   2.593  0.00951 **
DiTempRange     -0.1409464  0.0764872   0.0765969   1.840  0.06575 .
LandUseLow       0.3520743  0.5989777   0.5997903   0.587  0.55721
LandUseMedium    0.2741413  0.3668149   0.3675811   0.746  0.45579
LandUseNone     -1.0128945  0.5735342   0.5744652   1.763  0.07787 .
MAP              0.0048810  0.0009128   0.0009142   5.339    1e-07 ***
Rivier           0.3502782  0.2252743   0.2257546   1.552  0.12076
TempRange        0.0823410  0.0606546   0.0607942   1.354  0.17560
Tmean           -0.1762862  0.0994486   0.0996353   1.769  0.07684 .
TYPE_LUconserva -0.9312487  0.4770681   0.4781244   1.948  0.05145 .
TYPE_LUprivate  -0.4839229  0.3289011   0.3296201   1.468  0.14207
TYPE_LUstate     0.0004062  0.4079678   0.4089744   0.001  0.99921
logVRM           0.1370973  0.1140166   0.1142342   1.200  0.23008
logTWI          -0.0735540  0.4195267   0.4202589   0.175  0.86106
logDAH           1.7132823  3.5937000   3.6028996   0.476  0.63441

Best Answer

@Placidia is right. However, this is a simplified case because you do not have any interactions. That is, your model assumes (rightly or wrongly) that moving from LandUseHigh to LandUseLow is associated with an increase of 0.35 units in your response variable, no matter what the level of Type_LU is or what the values of your continuous covariates are.
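The "no matter what" part can be checked on simulated data. The following is a minimal sketch, not your actual model: the data are made up, `communal` is a hypothetical name for the unspecified reference level of the second factor, and a plain `lm()` stands in for whatever produced your output. In an additive fit, the predicted Low-minus-High difference is the same at every setting of the other predictors and equals the LandUseLow coefficient.

```r
# Simulated data: names mirror the question, values and "communal" are made up.
set.seed(1)
n <- 200
d <- data.frame(
  LandUse = factor(sample(c("High", "Low", "Medium", "None"), n, replace = TRUE)),
  TYPE_LU = factor(sample(c("communal", "conserva", "private", "state"), n, replace = TRUE)),
  MAP     = runif(n, 200, 1200)
)
d$y <- 2 + 0.35 * (d$LandUse == "Low") - 0.9 * (d$TYPE_LU == "conserva") +
  0.005 * d$MAP + rnorm(n)

levels(d$LandUse)[1]   # reference level of LandUse (first level, alphabetical by default)
levels(d$TYPE_LU)[1]   # reference level of TYPE_LU

# Additive model: no interaction terms
fit <- lm(y ~ LandUse + TYPE_LU + MAP, data = d)

# Same two settings of TYPE_LU and MAP, once with High and once with Low
new_high <- data.frame(LandUse = "High",
                       TYPE_LU = c("communal", "state"),
                       MAP     = c(300, 1000))
new_low <- new_high
new_low$LandUse <- "Low"

diffs <- predict(fit, new_low) - predict(fit, new_high)
diffs                      # the two differences are identical ...
coef(fit)["LandUseLow"]    # ... and equal to the LandUseLow coefficient
```

If a different baseline is more natural for the comparison, `relevel(d$LandUse, ref = "None")` changes the reference level before fitting.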

If you did have interactions, then the coefficient on LandUseLow would correspond to the change from LandUseHigh only when Type_LU is set at the reference level. Likewise, if there were an interaction between LandUse and a continuous covariate, the coefficient on LandUseLow would correspond to the change from LandUseHigh only when the value of the continuous covariate is $0$. If you had a three-way interaction between LandUse, Type_LU and a continuous covariate, the coefficient on LandUseLow would indicate the change only when Type_LU is set at the reference level and the value of the continuous covariate is $0$.
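A small sketch of the interaction case, again on made-up data with a hypothetical reference level `communal` and a plain `lm()` as a stand-in. With interactions in the model, the LandUseLow coefficient reproduces the Low-minus-High difference only at the reference level of the other factor and at a covariate value of 0; anywhere else the interaction terms have to be added.

```r
# Simulated data with two-level factors to keep the coefficient list short.
set.seed(2)
n <- 400
d <- data.frame(
  LandUse = factor(sample(c("High", "Low"), n, replace = TRUE)),
  TYPE_LU = factor(sample(c("communal", "state"), n, replace = TRUE)),
  MAP     = runif(n, 200, 1200)
)
d$y <- 1 + 0.5 * (d$LandUse == "Low") + 0.002 * d$MAP +
  0.001 * (d$LandUse == "Low") * d$MAP + rnorm(n)

# LandUse interacts with the other factor and with the continuous covariate
fit_int <- lm(y ~ LandUse * TYPE_LU + LandUse * MAP, data = d)
b <- coef(fit_int)

# Predicted Low-minus-High difference at a given TYPE_LU level and MAP value
low_minus_high <- function(type, map) {
  nd <- data.frame(LandUse = c("High", "Low"), TYPE_LU = type, MAP = map)
  unname(diff(predict(fit_int, nd)))
}

# At the reference level and MAP = 0: just the main-effect coefficient
at_reference <- low_minus_high("communal", 0)
at_reference
b["LandUseLow"]

# Elsewhere: main effect plus the relevant interaction terms
elsewhere <- low_minus_high("state", 600)
elsewhere
b["LandUseLow"] + b["LandUseLow:TYPE_LUstate"] + 600 * b["LandUseLow:MAP"]
```

Because MAP = 0 may lie far outside the observed range, centering a continuous covariate before fitting is a common way to make the main-effect coefficient refer to the covariate's mean instead of 0.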


Update: The (Intercept) row gives the predicted value for the reference categories. You have two categorical variables, so you have two reference categories: LandUseHigh and the level of Type_LU that does not appear in the output (which I assume you know). The Estimate in the (Intercept) row is therefore the predicted mean for study units in both of those categories when all the continuous covariates are equal to $0$. Again, because you don't have an interaction term, the predicted value for LandUseHigh when Type_LU is conserva, private, or state is the intercept estimate plus the estimate for the appropriate level of Type_LU, still with all continuous covariates at $0$.
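As a worked example of that sum, the arithmetic can be done directly with the estimates copied from the output in the question. The label `reference` is only a placeholder for the unnamed reference level of Type_LU. If the model uses a link function, these values are on the scale of the linear predictor, and they all assume the continuous covariates are 0.

```r
# Estimates copied from the model output in the question
b0   <- 4.6086772                      # (Intercept)
type <- c(conserva = -0.9312487,       # TYPE_LUconserva
          private  = -0.4839229,       # TYPE_LUprivate
          state    =  0.0004062)       # TYPE_LUstate

# Predicted values for LandUseHigh at each level of Type_LU,
# with all continuous covariates set to 0
pred_high <- c(reference = b0, b0 + type)
round(pred_high, 4)

# The same table for LandUseLow: add the LandUseLow estimate to every entry
pred_low <- pred_high + 0.3520743
round(pred_low, 4)
```

The gap between `pred_low` and `pred_high` is 0.3520743 in every column, which is the additive structure described above.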
