I'm the creator of `pylogit`. Thanks for using my package!
To answer your question, the differences in estimation results come from differences in the way choice data are represented in `statsmodels` versus `mlogit`. The TravelMode dataset is natively in long format (i.e., as it was when you wrote it to csv): it has one row per alternative per observation. However, `statsmodels` assumes one's data is in wide format, with one row per observation. Thus there are 4 times as many "observations" in the `statsmodels` model (840) as there are in your `mlogit` model. Try calling `length(unique(TravelMode$individual))`.
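To see the distinction concretely, here is a minimal pandas sketch with toy data (the column names `individual`, `mode`, and `choice` mirror TravelMode's layout, but the values are made up):

```python
import pandas as pd

# Toy long-format choice data: one row per alternative per observation,
# mirroring TravelMode's structure (210 individuals x 4 modes = 840 rows).
long_df = pd.DataFrame({
    "individual": [1, 1, 2, 2],          # just 2 individuals for brevity
    "mode":       ["car", "bus"] * 2,    # 2 alternatives each
    "choice":     [1, 0, 0, 1],          # exactly one chosen row per individual
    "cost":       [10.0, 4.0, 12.0, 3.5],
})

# Long format: number of rows = individuals * alternatives.
print(len(long_df))                      # 4

# The actual number of observations -- the analogue of
# length(unique(TravelMode$individual)) in R:
print(long_df["individual"].nunique())   # 2

# Wide format, which is what statsmodels treats each row as: one row per
# observation, with alternative-varying covariates spread across columns.
wide_df = long_df.pivot(index="individual", columns="mode", values="cost")
print(len(wide_df))                      # 2
```

Fitting a wide-format estimator on `long_df` silently quadruples the sample, which is exactly the 840-vs-210 discrepancy above.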
Moreover, the variables you created with `pd.get_dummies(TM['mode'])` in your `statsmodels` model do not represent alternative-specific constants; they are simply alternative identifiers for each row of the long-format data. This is in contrast to the intercepts that were estimated in the `mlogit` model.
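As an illustration, here is a small pandas sketch (alternative names and data are hypothetical). In a long-format estimator, alternative-specific constants amount to J - 1 alternative dummies with one base alternative dropped for identification, whereas `pd.get_dummies` alone just produces a full set of row labels:

```python
import pandas as pd

# Toy long-format data: 2 observations x 3 alternatives.
df = pd.DataFrame({
    "obs_id": [1, 1, 1, 2, 2, 2],
    "mode":   ["air", "train", "bus"] * 2,
})

# What get_dummies gives you: one column per alternative that merely flags
# which alternative each long-format row belongs to -- including what
# should be the base alternative.
row_flags = pd.get_dummies(df["mode"])
print(list(row_flags.columns))   # ['air', 'bus', 'train']

# Alternative-specific constants: J - 1 intercepts, with one alternative
# (here 'air', chosen arbitrarily) normalized to zero as the base.
ascs = pd.get_dummies(df["mode"]).drop(columns="air")
print(list(ascs.columns))        # ['bus', 'train']
```

With all J dummies and no base normalization, the constants are not identified the way `mlogit`'s intercepts are, which is another reason the two sets of estimates diverge.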
Stata has the `margins` command, which makes it easy as pie to get elasticities for continuous variables (% change in the probability of each outcome for a % change in x) and semi-elasticities for dummy variables (% change in the probability of each outcome when x goes from 0 to 1). `mfx` has been superseded by `margins`.
Having said that, I find these elasticities hard to interpret and prefer to look at the change in probability for a % change in x or for a unit change in x. I will do that below as well, and try to connect the dots between the two approaches.
Here's an example of Stata code, with interpretation, for what you want:
. webuse sysdsn1
(Health insurance data)
. mlogit insure age i.(male nonwhite site), nolog
Multinomial logistic regression Number of obs = 615
LR chi2(10) = 42.99
Prob > chi2 = 0.0000
Log likelihood = -534.36165 Pseudo R2 = 0.0387
------------------------------------------------------------------------------
insure | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Indemnity | (base outcome)
-------------+----------------------------------------------------------------
Prepaid |
age | -.011745 .0061946 -1.90 0.058 -.0238862 .0003962
1.male | .5616934 .2027465 2.77 0.006 .1643175 .9590693
1.nonwhite | .9747768 .2363213 4.12 0.000 .5115955 1.437958
|
site |
2 | .1130359 .2101903 0.54 0.591 -.2989296 .5250013
3 | -.5879879 .2279351 -2.58 0.010 -1.034733 -.1412433
|
_cons | .2697127 .3284422 0.82 0.412 -.3740222 .9134476
-------------+----------------------------------------------------------------
Uninsure |
age | -.0077961 .0114418 -0.68 0.496 -.0302217 .0146294
1.male | .4518496 .3674867 1.23 0.219 -.268411 1.17211
1.nonwhite | .2170589 .4256361 0.51 0.610 -.6171725 1.05129
|
site |
2 | -1.211563 .4705127 -2.57 0.010 -2.133751 -.2893747
3 | -.2078123 .3662926 -0.57 0.570 -.9257327 .510108
|
_cons | -1.286943 .5923219 -2.17 0.030 -2.447872 -.1260134
------------------------------------------------------------------------------
. margins, eyex(age) predict(outcome(1))
Average marginal effects Number of obs = 615
Model VCE : OIM
Expression : Pr(insure==Indemnity), predict(outcome(1))
ey/ex w.r.t. : age
------------------------------------------------------------------------------
| Delta-method
| ey/ex Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | .2560951 .1330162 1.93 0.054 -.0046119 .5168021
------------------------------------------------------------------------------
. margins, eyex(age)
Average marginal effects Number of obs = 615
Model VCE : OIM
ey/ex w.r.t. : age
1._predict : Pr(insure==Indemnity), predict(pr outcome(1))
2._predict : Pr(insure==Prepaid), predict(pr outcome(2))
3._predict : Pr(insure==Uninsure), predict(pr outcome(3))
------------------------------------------------------------------------------
| Delta-method
| ey/ex Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age |
_predict |
1 | .2560951 .1330162 1.93 0.054 -.0046119 .5168021
2 | -.2661848 .152332 -1.75 0.081 -.5647501 .0323805
3 | -.090586 .4576158 -0.20 0.843 -.9874964 .8063244
------------------------------------------------------------------------------
. margins, eydx(male)
Average marginal effects Number of obs = 615
Model VCE : OIM
ey/dx w.r.t. : 1.male
1._predict : Pr(insure==Indemnity), predict(pr outcome(1))
2._predict : Pr(insure==Prepaid), predict(pr outcome(2))
3._predict : Pr(insure==Uninsure), predict(pr outcome(3))
------------------------------------------------------------------------------
| Delta-method
| ey/dx Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.male |
_predict |
1 | -.3040742 .115607 -2.63 0.009 -.5306598 -.0774886
2 | .2576192 .0958708 2.69 0.007 .0697159 .4455226
3 | .1477754 .3243217 0.46 0.649 -.4878834 .7834342
------------------------------------------------------------------------------
Note: ey/dx for factor levels is the discrete change from the base level.
For interpretation, a 1% increase in age gives you a .25% increase in the probability of choosing indemnity insurance (inelastic). Being male lowers that probability by almost a third.
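For readers working in Python, here is a rough numpy sketch of what `margins, eyex` computes for a case-specific multinomial logit: each observation's elasticity (x / P) * dP/dx, averaged over the sample. The coefficients below are borrowed loosely from the output above, but the simulated ages are made up, so this is a sketch of the mechanics, not a reproduction of the Stata numbers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Case-specific MNL with 3 outcomes; outcome 1 (Indemnity) is the base,
# so its coefficients are normalized to zero.
beta = np.array([0.0, -0.0117, -0.0078])   # age coefficients (approximate)
intercept = np.array([0.0, 0.27, -1.29])   # _cons (approximate)
x = rng.uniform(20, 80, size=500)          # hypothetical ages

# Predicted probabilities for each outcome.
util = intercept + np.outer(x, beta)                      # (500, 3)
p = np.exp(util) / np.exp(util).sum(axis=1, keepdims=True)

# Marginal effect of x on P_j: dP_j/dx = P_j * (beta_j - sum_k P_k beta_k).
beta_bar = p @ beta                                       # (500,)
dpdx = p * (beta[None, :] - beta_bar[:, None])            # rows sum to 0

# ey/ex, as margins reports it: average over observations of (x / P_j) * dP_j/dx.
eyex = (dpdx * (x[:, None] / p)).mean(axis=0)
print(eyex.round(3))   # one elasticity per outcome
```

Because the base outcome's coefficient (zero) exceeds the others, its elasticity comes out positive, matching the sign pattern in the `margins, eyex` table.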
Personally, I find this much easier to interpret:
. margins, dyex(age) predict(outcome(1))
Average marginal effects Number of obs = 615
Model VCE : OIM
Expression : Pr(insure==Indemnity), predict(outcome(1))
dy/ex w.r.t. : age
------------------------------------------------------------------------------
| Delta-method
| dy/ex Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | .118398 .0622018 1.90 0.057 -.0035153 .2403113
------------------------------------------------------------------------------
. margins, dydx(age) predict(outcome(1))
Average marginal effects Number of obs = 615
Model VCE : OIM
Expression : Pr(insure==Indemnity), predict(outcome(1))
dy/dx w.r.t. : age
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | .0026655 .001399 1.91 0.057 -.0000765 .0054074
------------------------------------------------------------------------------
This says that a 1% increase in age boosts the probability of indemnity insurance by 11.8 percentage points, and being a year older would give you roughly a quarter of a percentage point boost (.0027).
Let's make sure the former agrees with what we did before. The baseline probability for indemnity is:
. margins, predict(outcome(1))
Predictive margins Number of obs = 615
Model VCE : OIM
Expression : Pr(insure==Indemnity), predict(outcome(1))
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | .4764228 .0197054 24.18 0.000 .437801 .5150446
------------------------------------------------------------------------------
An 11.8 percentage point increase expressed as percentage change over the baseline is .118/.476=.24789916, which is pretty close to 0.25.
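That check can be written out directly. The relations are dy/ex = x * dy/dx and ey/ex = (x / y) * dy/dx, so dividing dy/ex by the baseline probability approximates ey/ex:

```python
# Back-of-the-envelope check: dy/ex and ey/ex from margins differ roughly
# by the baseline probability, since ey/ex = dy/ex / y.
dyex = 0.118398       # margins, dyex(age) predict(outcome(1))
baseline = 0.4764228  # margins, predict(outcome(1))

approx_eyex = dyex / baseline
print(round(approx_eyex, 4))   # 0.2485, close to the 0.2561 from margins, eyex
```

The match is not exact because `margins` averages the unit-level ratios over observations rather than taking the ratio of the averages.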
Best Answer
You might consider using a dominance analysis instead (see Luchman, 2014). This approach decomposes McFadden's R2 and might, as a consequence, be easier to interpret than the utility-based approach discussed here.
That would avoid the 4th-categorical-value issue and would scale to 100 if standardized based on the McFadden R2.
The dominance method is implemented in the R package relaimpo, but it has not been extended to accommodate multinomial logit models. The only version I know of that does is in Stata. That said, dominance analysis is not a terribly difficult method to implement.
To do so, collect the McFadden's R2 from every possible subset of predictors (in your case, 2^5 - 1 different models) and average each predictor's marginal contributions as in the linked article (note: the article gives some examples to follow).
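As a sketch of that averaging step, here is a small Python implementation of general dominance statistics. It takes the McFadden R2 for every subset as given (in practice you would obtain those by fitting each of the candidate models); the function name and the toy R2 values are my own for illustration, not from Luchman (2014):

```python
from itertools import combinations

def general_dominance(predictors, r2):
    """General dominance statistics from a table of fit statistics.

    r2 maps each frozenset of predictors (including the empty set, whose
    McFadden R^2 is 0) to its fit. The statistic for predictor p averages
    p's marginal contribution r2(S + p) - r2(S) within each subset size,
    then averages those means across sizes.
    """
    k = len(predictors)
    dom = {}
    for p in predictors:
        others = [q for q in predictors if q != p]
        size_means = []
        for size in range(k):                 # subsets S not containing p
            contribs = [
                r2[frozenset(S) | {p}] - r2[frozenset(S)]
                for S in combinations(others, size)
            ]
            size_means.append(sum(contribs) / len(contribs))
        dom[p] = sum(size_means) / k          # average across sizes
    return dom

# Tiny hypothetical example with 2 predictors (2^2 subsets, counting the null model).
r2 = {
    frozenset(): 0.00,
    frozenset({"a"}): 0.20,
    frozenset({"b"}): 0.10,
    frozenset({"a", "b"}): 0.25,
}
dom = general_dominance(["a", "b"], r2)
print({p: round(v, 3) for p, v in dom.items()})  # {'a': 0.175, 'b': 0.075}
```

A useful sanity check: the general dominance statistics sum to the full-model R2, which is what lets you rescale them to 100.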