I'm currently reading the book An R Companion to applied regression
and have started the section on effects plots which is a good method for seeing the effects of
independent variables on dependent variables.
The book explains the steps as follows
- Identify high order terms of a model (which seems to be when factors are multiplied by numeric vectors to produce interactions ~ the interactions are the high order terms)
- It then seems that that all the other independent values are held constant while the value we are interested in seeing the effect for is varied. The constant seems to be mean
I have three questions in relation to the plots
1) Although it is explained in the book, I'm having trouble getting my head around how the mean captured for factor variables when holding them constant
2) How is the Y value of the plot interpreted? Is it just a case of when the main line varies along the X-AXIS (independent variables), the Y Axis is the effect on the value to be predicted? If this is the case, how do we determine a large effect? I would assume a researcher would go back to the literature in whatever area they are studying to review previous researcher papers but effects seem to be rarely reported
3) How are interaction terms effects interpreted from the plot?
In essence, I'm trying to get a conceptual understanding to the plots. I assume that a plot with compact 95% confidence intervals and a steep slope is ideal scenario for the plot.
Taken from the effects
package as an example
# Examples
summary(TitanicSurvival)
titanic <- glm(survived ~ (passengerClass + sex + age)^2,
data=TitanicSurvival, family=binomial)
titanic.all <- allEffects(titanic, typical=median,
given.values=c(passengerClass2nd=1/3, passengerClass3rd=1/3, sexmale=0.5))
plot(titanic.all, ticks=list(at=c(.01, .05, seq(.1, .9, by=.2), .95, .99)), ask=FALSE)
plot(effect("passengerClass*sex*age", titanic, xlevels=list(age=0:65)), ticks=list(at=c(.001, .005, .01, .05, seq(.1, .9, by=.2), .95, .99, .995)))
Thank you for your help
Best Answer
The
effects
-plots (or also the numeric output) give you the predicted values of the outcome for certain given values for the predictors (independent variables). It just "inserts" the value of a predictor into the model formula. Since you calculate the effect for one predictor at time, the other predictors are "hold constant", i.e. their regression coefficients are not ignored, but - by default - their mean value is chosen.In your one example above, the default was changed to median, using the
typical
argument!Meaning of Holding values constant
Let me give you an example from a sample data set in my sjstats-package. The model estimates the effect of dependency (e42dep) of an older person and the age (c160age) of a family carer on the perceived burden of care (neg_c_7):
Shortened output:
Now lets see what
allEffects()
gives for results for thee42dep
variable:You can reproduce these values by hand, using the model formula and the values from your variables. To do this, we need the mean for all variables "hold constant", i.e.
c160age
in this case.Now you can calculate the predicted values ("effects") for the negative impact of care for cases where older persons have a dependency value of 1 (e42dep = 1), adjusted (hold constant) for age (c160age). To do this, we need the estimates from the fitted model above:
effect of neg_c_7 for e42dep = 1:
Above, you see:
estimate(intercept) + value*estimate(e42dep) + (hold-constant-value)*estimate(c160age)
, which gives 8.932278, which is almost the same (rounding errors) as the first value returned from theallEffects()
functions.The effect for e42dep = 2, is 10.443774. To calculate this by hand, you now multiply the estimate for
e42dep
by 2:The same is also achieved by the
predict
function from R:I assume the numbers slightly differ due to rounding errors. But anyway, this gives you an impression of the basic principle of "holding other predictors" constant. It's not a simple relationship between two variables, but the relationship adjusted for covariates.
Measures on the y-axis
According to your 2nd question: The y-axis of the plot gives you the predicted values for your outcome. The measure depends on your model. In your example, logistic regression, the values are predicted probabilities. In linear models, it's the predicted mean of the outcome. For poission for instance, it would be predicted incidents rates.
Interaction terms
Regarding interaction terms, you have an effect for each "value combination" of your interaction term. Let's take s simpler example: passengerClass * sex.
Now I use a function from my sjPlot-pakcage to plot the interaction effects. The data to make up this plot are indeed based on the
effects
-package, it's just that my package uses ggplot as plotting-system.You see, that in general, female persons had higher chances to survive, independent of their class. However, 1st class passengers had much better chances to survive in general than other class passengers. For male passengers, it there was hardly any difference between 2nd and 3rd class.