Solved – Calculating predicted values from categorical predictors in logistic regression

categorical data, data visualization, logistic

Context:

I am working with an ordinal logistic model and trying to interpret and present the results. The model has two continuous predictors of interest and a mix of continuous and categorical controls. I would like to graph the predicted probability of the top outcome (being accepted into a school) across multiple levels of my IVs of interest.

I am using R's predict() function to generate predicted probabilities. For my IVs of interest, I chose a range of reasonable values (e.g. mean ± 1 SD). For the other continuous predictors, I can use sensible baseline values (usually 0) because they are mean-centered or standardized.
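For readers following along, here is a minimal sketch of that setup. It assumes the model was fit with MASS::polr (the question does not say which fitting function was used), and the data and the variable names gpa, test, school and outcome are simulated stand-ins, not the asker's data.

```r
library(MASS)  # polr(): proportional-odds logistic regression

set.seed(1)
n <- 600
# Simulated stand-in data; every variable name here is hypothetical
dat <- data.frame(
  gpa    = rnorm(n),  # standardized predictor of interest
  test   = rnorm(n),  # second standardized predictor of interest
  school = factor(sample(paste0("School", 1:4), n, replace = TRUE,
                         prob = c(0.32, 0.28, 0.25, 0.15)))
)
shift  <- c(School1 = 0, School2 = 0.3, School3 = -0.2, School4 = 1.5)
latent <- 0.8 * dat$gpa + 0.5 * dat$test +
  shift[as.character(dat$school)] + rlogis(n)
dat$outcome <- cut(latent, breaks = c(-Inf, -0.5, 1, Inf),
                   labels = c("rejected", "waitlisted", "accepted"),
                   ordered_result = TRUE)

fit <- polr(outcome ~ gpa + test + school, data = dat, Hess = TRUE)

# Grid: gpa over a range, test at mean and +/- 1 SD, school fixed at one level
grid <- expand.grid(
  gpa    = seq(-2, 2, by = 0.5),
  test   = c(-1, 0, 1),
  school = factor("School1", levels = levels(dat$school))
)
probs <- predict(fit, newdata = grid, type = "probs")  # one column per outcome level
grid$p_accepted <- probs[, "accepted"]
head(grid)
```

Changing the school level in the grid and re-running predict() gives one set of curves per level, which is the comparison described next.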

I am trying to work out how to approach the categorical predictors. I've explored my options by plugging in different values, and in most cases the result is just a small shift in the output curve. For one variable however, the differences are huge, so I need to find a way to present results that are general to the different levels of that variable.

Perhaps an example would help clarify. In these two graphs, one IV of interest is plotted on the x-axis and the other is shown as the three lines. Each graph shows the output given a single level of my troublesome categorical control, "Admitting School" (which has 4 levels total).


Other graphs and R syntax here if you're curious

Question:

  • How should I represent the model across all levels of the categorical variables in a single graph?

Initial Thoughts:

  • Aggregate predicted values across each level of Admitting School with some sort of weighted average.
  • This post suggests using the proportion of cases of each type as the input for each variable. As in, if 32% of my cases came from School 1, I would use .32*B-school1 in the prediction formula. I don't know how to do that in R, since those variables are factors, but if it's an appropriate approach, I'm sure I could figure it out.
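The first idea, a weighted average of predicted probabilities across school levels, might look like the sketch below. It again assumes a MASS::polr fit on simulated stand-in data with hypothetical variable names; the weights are the observed share of cases per school.

```r
library(MASS)  # polr()

set.seed(2)
n <- 600
# Simulated stand-in data; variable names are hypothetical
dat <- data.frame(
  gpa    = rnorm(n),
  school = factor(sample(paste0("School", 1:4), n, replace = TRUE,
                         prob = c(0.32, 0.28, 0.25, 0.15)))
)
shift  <- c(School1 = 0, School2 = 0.3, School3 = -0.2, School4 = 1.5)
latent <- 0.8 * dat$gpa + shift[as.character(dat$school)] + rlogis(n)
dat$outcome <- cut(latent, breaks = c(-Inf, -0.5, 1, Inf),
                   labels = c("rejected", "waitlisted", "accepted"),
                   ordered_result = TRUE)
fit <- polr(outcome ~ gpa + school, data = dat, Hess = TRUE)

# Predict P(accepted) for every gpa value at every school level
gpa_vals <- seq(-2, 2, by = 0.5)
grid <- expand.grid(gpa = gpa_vals, school = levels(dat$school))
grid$p <- predict(fit, newdata = grid, type = "probs")[, "accepted"]

# Weights: observed proportion of cases from each school
w <- prop.table(table(dat$school))

# Rows: gpa values, columns: schools (one prediction per cell)
p_mat <- tapply(grid$p, list(grid$gpa, grid$school), mean)

# Weighted average of the predicted probabilities at each gpa value
p_avg <- drop(p_mat %*% as.numeric(w[colnames(p_mat)]))
data.frame(gpa = gpa_vals, p_accepted_avg = p_avg)
```

Note that the second idea (plugging .32 into the linear predictor for a school dummy) averages on the logit scale rather than the probability scale, so the two approaches will generally give somewhat different curves.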

Sorry for the verbosity and thanks in advance for any help.

Best Answer

My initial thought would have been to display the probability of acceptance as a function of relative GPA for each of your four schools, using some kind of trellis display. Faceting should do the job well here, since the number of schools is small. This is very easy to do with lattice (y ~ gpa | school) or ggplot2 (facet_grid(. ~ school)). In fact, you can choose whichever conditioning variable you want: this can be school, but also situation at undergrad institution. In the latter case, you'll have four curves in each plot, and three plots of Prob(admitting) ~ GPA.
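As a rough illustration of the lattice version, the sketch below draws one panel per school. The probabilities are generated from made-up coefficients purely to have something to plot; in practice they would come from predict() on the fitted model, and the names gpa, test, school and p_accepted are hypothetical.

```r
library(lattice)  # trellis graphics; ships with R

# Hypothetical predicted probabilities standing in for predict() output
grid <- expand.grid(
  gpa    = seq(-2, 2, by = 0.25),
  test   = c(-1, 0, 1),
  school = paste0("School", 1:4)
)
shift <- c(School1 = 0, School2 = 0.3, School3 = -0.2, School4 = 1.5)  # made-up effects
grid$p_accepted <- plogis(-1 + 0.8 * grid$gpa + 0.5 * grid$test +
                            shift[as.character(grid$school)])

# One panel per school, one line per level of the second predictor
p <- xyplot(p_accepted ~ gpa | school, data = grid, groups = test,
            type = "l",
            auto.key = list(title = "test (SD units)", columns = 3,
                            lines = TRUE, points = FALSE),
            xlab = "Relative GPA (standardized)", ylab = "P(accepted)",
            layout = c(4, 1))
print(p)
```

With ggplot2 installed, roughly the same display would be `ggplot(grid, aes(gpa, p_accepted, colour = factor(test))) + geom_line() + facet_grid(. ~ school)`.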

Now, if you are looking for effective displays of effects in GLMs, I would recommend the effects package by John Fox. It currently works with binomial and multinomial models as well as ordinal logistic models. Marginalizing over the other covariates is handled internally, so you don't have to bother with that. There are a lot of illustrations in the online help; see help(effect). For a more thorough overview of effect displays in GLMs, please refer to:

  1. Fox (2003). Effect Displays in R for Generalised Linear Models. JSS 8(15).
  2. Fox and Andersen (2004). Effect displays for multinomial and proportional-odds logit models. ASA Methodology Conference. A corresponding JSS paper is also available.
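The effects-package approach described above can be sketched as follows. This assumes the effects package is installed from CRAN and that the model was fit with MASS::polr; the data are simulated and all variable names are hypothetical.

```r
library(MASS)     # polr()
library(effects)  # Effect(); not part of base R, install from CRAN

set.seed(3)
n <- 600
# Simulated stand-in data; variable names are hypothetical
dat <- data.frame(
  gpa    = rnorm(n),
  test   = rnorm(n),
  school = factor(sample(paste0("School", 1:4), n, replace = TRUE,
                         prob = c(0.32, 0.28, 0.25, 0.15)))
)
shift  <- c(School1 = 0, School2 = 0.3, School3 = -0.2, School4 = 1.5)
latent <- 0.8 * dat$gpa + 0.5 * dat$test +
  shift[as.character(dat$school)] + rlogis(n)
dat$outcome <- cut(latent, breaks = c(-Inf, -0.5, 1, Inf),
                   labels = c("rejected", "waitlisted", "accepted"),
                   ordered_result = TRUE)

fit <- polr(outcome ~ gpa + test + school, data = dat, Hess = TRUE)

# Effect of gpa, with the remaining predictors fixed by the package's defaults
eff <- Effect("gpa", fit, xlevels = list(gpa = seq(-2, 2, by = 0.5)))
plot(eff)                 # fitted probabilities for each outcome category
head(as.data.frame(eff))  # the same fitted values as a data frame
```

As far as I understand the package defaults, numeric covariates are held at their means and factor dummies at their observed proportions, which is close to the "proportion of cases" idea raised in the question.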