I would like to rank variables of a logistic regression model on the basis of their predictive importance.
The model has both categorical and continuous variables.
For this purpose, is it okay to code the levels of a categorical variable as 1, 2, 3, 4, ... and treat it as a continuous variable, standardize it along with the other continuous variables, and then take the standardized estimates from a logistic regression fitted to the standardized variables?
If the purpose is only to find the relative importance of the variables in an already built model, is this approach all right?
Best Answer
While you can mess around with pseudo-R² measures, I have never found them to be very informative or useful in a logit model. You also run into other problems when you compare coefficients across different logit models (I don't have an immediate reference, but if you search Google or CV for "logit scaling factor" you should get an idea).
Here are a couple of alternative approaches:
If you are unable to convince yourself and others that your categorical variables can be treated as continuous, then you have a harder task. You could estimate predicted probabilities at the quantiles (for example, the deciles) of the continuous variables, and compare them to the predicted probabilities for the levels of the categorical variable.
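As a rough illustration of that comparison, here is a minimal Python sketch. Everything in it is hypothetical: the coefficients, the variable names (`age`, `group_B`, `group_C`) and the simulated `ages` sample are made up for the example, and in practice you would plug in the estimates and data from your own fitted model. It computes predicted probabilities at the deciles of one continuous predictor (with the categorical variable held at its reference level), then for each level of the categorical variable (with the continuous predictor held at its median), and compares how far the probability moves in each case.

```python
import math
import random
from statistics import median, quantiles


def predict_prob(intercept, coefs, row):
    """Logistic prediction: 1 / (1 + exp(-(b0 + sum of b_j * x_j)))."""
    eta = intercept + sum(coefs[name] * value for name, value in row.items())
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients on the log-odds scale (NOT from a real model).
# group_A is the reference level of the categorical variable.
intercept = -2.5
coefs = {"age": 0.04, "group_B": 0.5, "group_C": 1.2}

# Hypothetical sample of the continuous predictor.
rng = random.Random(0)
ages = [rng.uniform(20, 70) for _ in range(500)]

# Predicted probability at each decile of age, group held at the reference level.
age_deciles = quantiles(ages, n=10)  # 9 cut points
p_by_age = [
    predict_prob(intercept, coefs, {"age": a, "group_B": 0, "group_C": 0})
    for a in age_deciles
]

# Predicted probability for each group level, age held at its median.
age_med = median(ages)
p_by_group = {
    "A": predict_prob(intercept, coefs, {"age": age_med, "group_B": 0, "group_C": 0}),
    "B": predict_prob(intercept, coefs, {"age": age_med, "group_B": 1, "group_C": 0}),
    "C": predict_prob(intercept, coefs, {"age": age_med, "group_B": 0, "group_C": 1}),
}

# How far does the predicted probability move across each variable?
age_spread = max(p_by_age) - min(p_by_age)
group_spread = max(p_by_group.values()) - min(p_by_group.values())

print(f"age: 1st to 9th decile changes P(y=1) by {age_spread:.3f}")
print(f"group: lowest to highest level changes P(y=1) by {group_spread:.3f}")
```

The two spreads are on the same probability scale, so they can be compared directly. They do depend on where the other variables are held fixed, so it is worth repeating the calculation at a few different reference values.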
In the end, because there isn't a direct way to do this in a logit model, I would approach this in more than one way and see whether all the methods triangulate. If they do, you're golden. If not, the story is more nuanced and will take more thinking.