Solved – How to quantify the Relative Variable Importance in Logistic Regression in terms of p

Tags: importance, logistic

Suppose a logistic regression model is used to predict whether an online shopper will purchase a product (outcome: purchase) after they have clicked a set of online adverts (predictors: Ad1, Ad2, and Ad3).

The outcome is a binary variable: 1 (purchased) or 0 (not purchased).
The predictors are also binary variables: 1 (clicked) or 0 (not clicked).
So all variables are on the same scale.

If the resulting coefficients of Ad1, Ad2, and Ad3 are 0.1, 0.2, and 0.3,
we can conclude that Ad3 is more important than Ad2, and Ad2 is more important than Ad1. Furthermore, since all variables are on the same scale, the standardized and un-standardized coefficients should be the same, and we can further conclude that Ad2 is twice as important as Ad1 in terms of its influence on the logit (log-odds) scale.

But in practice we care more about how to compare and interpret the relative importance of the variables in terms of p (the probability of purchase), not the logit (log-odds).
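One reason the two scales differ: the effect of a given coefficient on p depends on the baseline probability. A minimal base-R sketch of this, using the hypothetical coefficients above:

```r
# On the probability scale, a coefficient's marginal effect is
# dp/dx = beta * p * (1 - p), so it shrinks as p moves away from 0.5.
marginal_effect <- function(beta, p) beta * p * (1 - p)

marginal_effect(0.2, 0.5)   # 0.05: near p = 0.5 the effect is largest
marginal_effect(0.2, 0.9)   # 0.018: the same coefficient moves p less here

# At any fixed baseline p, the ratio of effects equals the ratio of
# coefficients, so Ad2 still has twice the effect of Ad1:
marginal_effect(0.2, 0.5) / marginal_effect(0.1, 0.5)   # 2
```

So the relative ranking of coefficients carries over to p at any fixed baseline, but the absolute size of each variable's effect on p does not.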

Thus the question is: Is there any approach to quantify the relative importance of these variables in terms of p?

Best Answer

For linear models, you can use the absolute value of the t-statistic for each model parameter.
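As a sketch of that idea with made-up data, the |t| values can be pulled straight out of an lm fit:

```r
# Hypothetical linear-model example: rank predictors by |t statistic|
set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
d$y <- 0.5 * d$x1 + 2 * d$x2 + rnorm(100)   # x2 is the strongest by design

fit_lm <- lm(y ~ x1 + x2 + x3, data = d)
abs(coef(summary(fit_lm))[-1, "t value"])   # drop the intercept row
```

With this seed, x2 comes out with by far the largest |t|, matching how the data were generated.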

You can also use something like a random forest and get a very nice list of feature importances.

If you are using R, see caret's variable-importance documentation (http://caret.r-forge.r-project.org/varimp.html); if you are using Python, see scikit-learn's forest feature-importance example (http://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html#example-ensemble-plot-forest-importances-py).

EDIT:

Since logistic regression has no direct way to do this, you can use a ROC curve for each predictor.

As the caret documentation describes it: "For classification, ROC curve analysis is conducted on each predictor. For two class problems, a series of cutoffs is applied to the predictor data to predict the class. The sensitivity and specificity are computed for each cutoff and the ROC curve is computed. The trapezoidal rule is used to compute the area under the ROC curve. This area is used as the measure of variable importance."

An example of how this works in R is:

library(caret)

# Toy data: binary outcome y and three binary predictors
mydata <- data.frame(y  = c(1, 0, 0, 0, 1, 1),
                     x1 = c(1, 1, 0, 1, 0, 0),
                     x2 = c(1, 1, 1, 0, 0, 1),
                     x3 = c(1, 0, 1, 1, 0, 0))

# Fit the logistic regression
fit <- glm(y ~ x1 + x2 + x3, data = mydata, family = binomial())
summary(fit)

# Model-based importance: for a glm, varImp() returns the absolute
# value of each coefficient's test statistic (|z|)
varImp(fit, scale = FALSE)
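Note that varImp() on a glm uses the model's own test statistics; the ROC-based measure described in the quoted passage is (as far as I know, in current caret versions) exposed through filterVarImp, which ignores the model and scores each predictor by its ROC AUC:

```r
library(caret)

# Same toy data as above
mydata <- data.frame(y  = c(1, 0, 0, 0, 1, 1),
                     x1 = c(1, 1, 0, 1, 0, 0),
                     x2 = c(1, 1, 1, 0, 0, 1),
                     x3 = c(1, 0, 1, 1, 0, 0))

# Per-predictor area under the ROC curve; y must be a factor
# so that caret treats this as a classification problem
filterVarImp(x = mydata[, c("x1", "x2", "x3")], y = factor(mydata$y))
```

Because the AUC is computed on the probability-of-class scale rather than the log-odds scale, this gets closer to the "importance in terms of p" that the question asks about.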