How to plot logistic models with many categorical variables


How to plot logistic models with many categorical variables?

Specifically, I'm creating the following kind of model (more variables are still to be added):

glm(cancer ~ trt + factor(exposure) + skin + 
gender + factor(age), family = binomial, data = dta)

which is about modelling how different variables affect the risk of getting skin cancer.

It would be interesting to plot the risk against both age (integers in [28, 84]) and exposure (integers in [1, 21]). However, a typical x-y plot has only one horizontal axis, so plotting against two variables at once seems to require a 3D plot. Is there some other way?

Best Answer

Visualizing a logistic model with multiple continuous variables is considerably more complicated, but it becomes much simpler if all of the variables are categorical. When the X-variables are categorical, logistic regression is essentially modelling the proportion of 'successes' within each combination of categories (if all interactions are included, the fitted probabilities equal the observed proportions exactly). The standard way to plot proportions across a series of categories is a spineplot or a mosaic plot (cf. here).
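To make the point about proportions concrete, here is a minimal sketch (added for illustration; it uses the built-in Titanic data rather than your dta). It fits a saturated model with the Class * Sex interaction and compares its fitted probabilities with the observed survival proportion in each Class/Sex cell; a main-effects-only model is included for contrast. The object names `sat`, `main` and `obs` are purely illustrative.

```r
# Individual-level Titanic data: one row per person, Freq column dropped
data(Titanic)
d <- as.data.frame(Titanic)
d <- d[rep.int(seq_len(nrow(d)), times = d$Freq), 1:4]
# Survived has levels c("No", "Yes"), so glm() models P(Survived == "Yes")

# Saturated model: one parameter per Class x Sex cell
sat  <- glm(Survived ~ Class * Sex, family = binomial, data = d)
# Main effects only, for comparison
main <- glm(Survived ~ Class + Sex, family = binomial, data = d)

# Observed survival proportion in each person's Class x Sex cell
obs <- ave(as.numeric(d$Survived == "Yes"), d$Class, d$Sex)

max(abs(fitted(sat)  - obs))   # numerically zero
max(abs(fitted(main) - obs))   # clearly non-zero
```

With every interaction in the model there is one free parameter per cell, which is why the fit reproduces the cell proportions; with main effects only, the model smooths across cells, so a plot of raw proportions and a plot of fitted values will no longer coincide.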

If you have a larger number of variables, you could form a plot matrix of spineplots. The issue with plot matrices is that each panel is a marginal projection, i.e., it pools over all of the other variables (cf. here). Another possibility is to form conditioning plots, which show the relationship between two variables separately within each combination of levels of the conditioning variables. I don't have access to your cancer dataset (dta); below is a quick illustration with the Titanic dataset that ships with R. If you wanted these for publication, you would want to do some extra work to make them 'pretty' (clean up the axes, etc.), but this should give you the idea.
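The difference between a marginal and a conditional view can be checked numerically. A small sketch, again assuming the built-in Titanic table: the Sex-versus-Survived panel of a plot matrix displays survival proportions pooled over Class and Age, whereas conditioning splits them by the other variables (here, by Class only). The names `marg` and `cond` are illustrative.

```r
data(Titanic)   # 4-way table: Class x Sex x Age x Survived

# Marginal: survival proportions by Sex, pooled over Class and Age
# (what a Sex-vs-Survived panel in a plot matrix displays)
marg <- prop.table(margin.table(Titanic, c(2, 4)), margin = 1)

# Conditional: survival proportions by Sex within each Class
cond <- prop.table(margin.table(Titanic, c(1, 2, 4)), margin = c(1, 2))

marg[, "Yes"]     # one survival proportion per Sex
cond[, , "Yes"]   # one survival proportion per Class and Sex
```

For example, the pooled female survival rate (344/470, about 0.73) hides the fact that it was much lower in 3rd class (90/196, about 0.46).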

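data(Titanic)
d = as.data.frame(Titanic)
# expand the table so each row is one passenger; keep the 4 factors, drop Freq
d = d[rep.int(row.names(d), times=d$Freq), 1:4]
# list "Yes" first so that survival is the first category in every spine
d$Survived = factor(d$Survived, levels=c("Yes","No"))

# panel function: draws a spineplot of y given x inside the current panel
pan.fun = function(x, y, ...){
    usr <- par("usr"); on.exit(par(usr = usr))   # restore coordinates on exit
    par(usr = c(usr[1:2], 0, 1.5), new = TRUE)     # draw over the empty panel
    spineplot(as.factor(x), as.factor(y), main="", xlab="", ylab="", axes=FALSE)
}
dev.new()   # portable; windows() exists only on Windows
pairs(d[,c(4,1:3)], panel=pan.fun)   # Survived first, then Class, Sex, Age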

[Figure: spineplot matrix]

dev.new()
coplot(Survived ~ Sex | Class * Age, data = d, panel = pan.fun)

[Figure: conditioning spineplot]
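Since mosaic plots were mentioned as the other standard display, here is a minimal sketch showing all four Titanic variables in a single mosaicplot() call on the contingency table itself. This is an alternative display added for comparison, not part of the answer's original code.

```r
data(Titanic)
dev.new()
# Tiles are split by Class, Sex, Age and finally Survived, so the last split
# within each tile shows the proportion of survivors for that combination
mosaicplot(Titanic, color = TRUE, main = "")
```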
