What’s the best way to visualize the effects of categories & their prevalence in logistic regression?


I need to present information about the main predictors of a candidate's vote, using data from a public opinion survey. I have run a logistic regression with all the variables I care about, but I can't find a good way to present the results.

My client cares not only about the size of each effect, but about how that effect combines with the size of the population that has the attribute.

How can I deal with that in a graph? Any suggestions?

Here is an example:

The $\beta$ of the variable SEX (Male = 1), when the dependent variable is voting vs. not voting for a candidate, is 2.3. That is a large effect once it is exponentiated into an odds ratio ($e^{2.3} \approx 10$) or converted to a probability. However, only 30% of the society in which this survey was run were men. Therefore, although men supported this candidate strongly, there are too few of them to matter much for a candidate trying to win a majoritarian election.
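To make the "effect × prevalence" idea concrete, here is a minimal sketch that turns the coefficient into predicted vote probabilities for each group and weights them by each group's population share. The coefficient 2.3 and the 30% share of men come from the question; the intercept (the log-odds of voting for the candidate among women) is an assumed value, and the sketch ignores any other covariates in the model.

```r
# Sketch: weight each group's predicted vote probability by its population share.
# beta_male (2.3) and the 30% share of men come from the question; the intercept
# (log-odds of voting for the candidate among women) is an ASSUMED value.
beta_male <- 2.3
intercept <- 0                                 # assumed, for illustration only
share     <- c(Female = 0.7, Male = 0.3)       # population shares

exp(beta_male)                                 # odds ratio, men vs. women (about 10)

# Predicted probability of voting for the candidate in each group
p_vote <- plogis(c(Female = intercept, Male = intercept + beta_male))

# Each group's contribution to the candidate's overall vote share
contribution <- share * p_vote
round(rbind(share, p_vote, contribution), 3)
sum(contribution)                              # overall predicted vote share
```

With this assumed intercept, about 91% of men but only 50% of women vote for the candidate, yet women contribute about 0.35 of the total vote share and men only about 0.27. A different intercept would change the balance, which is exactly why the prevalence has to be shown alongside the effect.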

Best Answer

I agree with @PeterFlom that the example is odd, but setting that aside, I notice that the explanatory variable is categorical. If all of your explanatory variables are categorical, this simplifies things greatly, and I would use mosaic plots to present these effects. A mosaic plot displays conditional proportions vertically, while the width of each category is proportional to its marginal (i.e., unconditional) proportion in the sample. As a result, the area of each tile is proportional to the joint proportion, which combines the size of the effect with the size of the group.
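The link between tile area and "effect × prevalence" can be checked directly. Below is a minimal sketch using an invented 2×2 table (the counts are made up for illustration, with 30% men as in the question): tile widths are the marginal proportions of each sex, tile heights are the conditional proportions of voters within each sex, and their product equals the joint proportion of all respondents.

```r
# Invented counts for 1,000 respondents (not real survey data): 30% men,
# as in the question, and men vote for the candidate at a much higher rate.
votes <- as.table(matrix(c(420, 280,   # Female: No, Yes
                            60, 240),  # Male:   No, Yes
                         nrow = 2, byrow = TRUE,
                         dimnames = list(Sex  = c("Female", "Male"),
                                         Vote = c("No", "Yes"))))

width  <- prop.table(margin.table(votes, 1))  # P(Sex): tile widths
height <- prop.table(votes, margin = 1)       # P(Vote | Sex): tile heights
joint  <- prop.table(votes)                   # P(Sex, Vote)

# Tile area = width x height = joint proportion, so each "Yes" tile shows the
# fraction of all respondents who are in that group and vote for the candidate.
area <- sweep(height, 1, width, `*`)
stopifnot(isTRUE(all.equal(as.vector(area), as.vector(joint))))
joint[, "Yes"]

mosaicplot(votes, main = "", color = TRUE)
```

Here men vote for the candidate at twice the rate of women (80% vs. 40%), but the "Yes" tile for women is larger in area (0.28 vs. 0.24 of all respondents), and the plot makes that visible at a glance.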

Here is an example with the data from the Titanic disaster, created using R:

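```r
# Titanic is a 4-way table: Class x Sex x Age x Survived
data(Titanic)

# Collapse over the other dimensions to get two-way count tables
sex.table   <- margin.table(Titanic, margin = c(2, 4))  # Sex x Survived
class.table <- margin.table(Titanic, margin = c(1, 4))  # Class x Survived

# Conditional survival proportions within each sex / class (columns sum to 1)
round(prop.table(t(sex.table), margin = 2), digits = 3)
#          Sex
# Survived  Male Female
#      No  0.788  0.268
#      Yes 0.212  0.732
round(prop.table(t(class.table), margin = 2), digits = 3)
#           Class
# Survived   1st   2nd   3rd  Crew
#      No  0.375 0.586 0.748 0.760
#      Yes 0.625 0.414 0.252 0.240

# dev.new() opens a new graphics device on any platform (windows() is Windows-only)
dev.new(height = 3, width = 6)
par(mai = c(.5, .4, .1, 0), mfrow = c(1, 2))
mosaicplot(sex.table,   main = "")  # left: widths = share of each sex on board
mosaicplot(class.table, main = "")  # right: widths = share of each class on board
```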

[Figure: mosaic plots of survival by sex (left) and by class (right)]

On the left, we see that women were much more likely to survive, but men accounted for roughly 79% of the people on board (1,731 of 2,201). So increasing the percentage of male survivors would have saved many more lives than an even larger increase in the percentage of female survivors. This is analogous to your example. On the right is another example: the crew and third class (steerage) made up the largest share of the people on board, but had the lowest probability of surviving. (For what it's worth, this isn't a full analysis of these data, because class and sex were also not independent on the Titanic, but it is enough to illustrate the ideas for this question.)
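The claim about lives saved is simple arithmetic on the group sizes, i.e., on the tile widths. The sketch below reproduces it; the 10- and 20-percentage-point rises and the helper `extra_survivors` are hypothetical, chosen only to illustrate the comparison.

```r
data(Titanic)
sex.table <- margin.table(Titanic, margin = c(2, 4))  # Sex x Survived counts
n_by_sex  <- rowSums(sex.table)                       # people on board by sex

round(n_by_sex / sum(n_by_sex), 3)                    # shares: mosaic tile widths

# Hypothetical helper: extra survivors if a group's survival rate rose by
# `pct_points` percentage points (ignores any real-world limits, e.g. lifeboats)
extra_survivors <- function(group, pct_points) n_by_sex[[group]] * pct_points / 100

extra_survivors("Male", 10)    # 10-point rise among men
extra_survivors("Female", 20)  # 20-point rise among women
```

A 10-point rise among the 1,731 men adds about 173 survivors, while a 20-point rise among the 470 women adds only 94.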