Logistic – How to Interpret Margins in Percentage Points When the Independent Variable Is a Percentage

logistic, marginal-effect, stata

I have two research questions. The dependent variable of the first is binary, and I estimate it with a logit regression in Stata. The dependent variable of the second is a percentage (a proportion between 0 and 1, inclusive), and I estimate it with a fractional response regression in Stata. In both, my key explanatory variable is expressed as a percentage. I am a bit confused and need help interpreting the coefficient.

I have read that if both the dependent and the independent variable are expressed as percentages, the coefficient can be interpreted in percentage points. I estimated the margins after the fractional regression using the margins, dydx() command in Stata. If the coefficient is -0.5, we interpret it as: with a one-percentage-point increase in X, Y decreases by 0.5 percentage points. Am I correct?

Second, in the logit regression the dependent variable is binary (1 if the event occurs, 0 otherwise), so I am predicting the likelihood of an event. I want to interpret the margins in percentage points. I estimated the margins after the logit regression using the margins, dydx() command in Stata, and the coefficient is -0.88. How do I interpret it? Is it correct to say that with a one-percentage-point increase in X, the likelihood of the event falls by 0.88 percentage points? I am not sure whether it should be read as 88 percentage points or 0.88 percentage points.

Can someone explain this in simple terms?

Best Answer

In both of these models, your outcome and explanatory variable of interest lie in [0,1]. When you calculate the average marginal effect, you are getting the average change in the outcome associated with a 1-unit increase in the explanatory variable. A one-unit change in X is very large if X is between 0 and 1 and may not even make sense: what does going from 0.8 to 1.8 even mean when 1 is the max?
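For reference, the conditional mean in both logit and fracreg logit is the logistic function Λ. For a continuous regressor x that enters the index linearly, with no interactions or polynomial terms, the average marginal effect that margins, dydx(x) reports can be written as below. The notation is introduced here and is not taken from Stata's documentation. Multiplying the AME by a change Δx gives the approximate change in the outcome. When x is coded in [0, 1], a one-percentage-point change is Δx = 0.01.

```latex
\Lambda(\eta) = \frac{e^{\eta}}{1 + e^{\eta}}, \qquad
\widehat{\mathrm{AME}}_x = \frac{1}{N}\sum_{i=1}^{N} \hat\beta_x\,
  \Lambda(\mathbf{x}_i'\hat{\boldsymbol\beta})\bigl[1 - \Lambda(\mathbf{x}_i'\hat{\boldsymbol\beta})\bigr],
\qquad
\Delta \hat y \approx \widehat{\mathrm{AME}}_x \cdot \Delta x .
```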

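This means you might want to do one of the following:

  1. divide the output of margins by 100, which gives the change in the outcome for a 0.01 (1 percentage point) increase in X
  2. rescale your explanatory variable to lie in [0, 100] so that a one-unit change corresponds to 1 percentage point
  3. use (1) or (2) to report the effect of an increase somewhere between 1 pp and 100 pp, such as 10 pp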
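Below is a minimal sketch of option (1) applied to the two numbers in the question. It assumes, as above, that X is coded as a proportion in [0, 1]. Under that assumption, the margins output is the change in the outcome (also a proportion) per 1-unit change in X, that is, per 100 pp. The helper name convert_dydx is hypothetical, and the conversion is the usual linear approximation built on the average marginal effect.

```python
def convert_dydx(dydx):
    """Read a margins dy/dx reported per 1-unit change in X, where X and Y
    are both proportions in [0, 1] (hypothetical helper, linear approximation)."""
    per_100pp_in_x = dydx * 100           # change in Y, in pp, if X jumped from 0 to 1
    per_1pp_in_x = dydx / 100             # change in Y, as a proportion, if X rose by 0.01
    per_1pp_in_x_pp = per_1pp_in_x * 100  # that same change expressed in pp
    return per_100pp_in_x, per_1pp_in_x, per_1pp_in_x_pp


for label, dydx in (("fracreg", -0.5), ("logit", -0.88)):
    jump, step, step_pp = convert_dydx(dydx)
    print(f"{label}: X 0->1: {jump:.1f} pp; X +1 pp: {step:.4f} ({step_pp:.2f} pp)")
```

With both variables on a 0 to 1 scale, the number margins reports can therefore be read as "pp of Y per pp of X". For a one-point increase in X, -0.88 corresponds to roughly 0.88 pp, not 88 pp.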

Here is an example where the outcome is the participation rate in 401(k) plans at 4,075 US firms. A 401(k) is a type of retirement savings account in the United States; employees have to opt in, or can opt out if enrollment is automatic.

The explanatory variable of interest is the employer match rate per dollar saved by the employee: 0.5 means the match is 50 cents for every dollar saved, 0 means the employer does not match anything, and 1 means 1:1 match, a doubling. Values above 1 are possible here.

You would expect to see a higher enrollment rate in firms where the employers match more generously. That is exactly what we see:

. webuse 401k, clear
(Firm-level data on 401k participation)

. generate mrate100 = mrate*100

. summarize prate mrate

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       prate |      4,075     .840607    .1874841   .0036364          1
       mrate |      4,075     .463519    .4187388          0          2

. fracreg logit prate c.mrate c.ltotemp##c.ltotemp c.age##c.age i.sole, nolog


Fractional logistic regression                          Number of obs =  4,075
                                                        Wald chi2(6)  = 817.73
                                                        Prob > chi2   = 0.0000
Log pseudolikelihood = -1673.5566                       Pseudo R2     = 0.0638

-------------------------------------------------------------------------------------
                    |               Robust
              prate | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
--------------------+----------------------------------------------------------------
              mrate |   1.143516    .074748    15.30   0.000     .9970125    1.290019
            ltotemp |  -1.103275   .1130667    -9.76   0.000    -1.324882   -.8816687
                    |
c.ltotemp#c.ltotemp |   .0565782   .0072883     7.76   0.000     .0422934     .070863
                    |
                age |   .0512643   .0059399     8.63   0.000     .0396223    .0629064
                    |
        c.age#c.age |  -.0005891   .0001645    -3.58   0.000    -.0009114   -.0002667
                    |
               sole |
         Only plan  |   .1137479   .0507762     2.24   0.025     .0142284    .2132674
              _cons |   5.747761   .4294386    13.38   0.000     4.906077    6.589445
-------------------------------------------------------------------------------------

. margins, dydx(mrate) post // coeflegend

Average marginal effects                                 Number of obs = 4,075
Model VCE: Robust

Expression: Conditional mean of prate, predict()
dy/dx wrt:  mrate

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       mrate |   .1450106   .0094558    15.34   0.000     .1264776    .1635436
------------------------------------------------------------------------------

. nlcom ame_rescaled:_b[mrate]/100

ame_rescaled: _b[mrate]/100

------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
ame_rescaled |   .0014501   .0000946    15.34   0.000     .0012648    .0016354
------------------------------------------------------------------------------

. quietly fracreg logit prate c.mrate100 c.ltotemp##c.ltotemp c.age##c.age i.sole, nolog

. margins, dydx(mrate100)

Average marginal effects                                 Number of obs = 4,075
Model VCE: Robust

Expression: Conditional mean of prate, predict()
dy/dx wrt:  mrate100

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
    mrate100 |   .0014501   .0000946    15.34   0.000     .0012648    .0016354
------------------------------------------------------------------------------
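Both approaches return exactly the same number, and that is not a coincidence. Rescaling a regressor by 100 divides its maximum-likelihood coefficient by 100. The fitted index x'β, and hence every prediction, stays unchanged, so the averaged derivative β·Λ(η)(1 − Λ(η)) shrinks by exactly a factor of 100. The sketch below checks this numerically using the mrate coefficient from the output above. The data are made up: the uniform match rates and the random "rest of the index" are assumptions, not the 401k data.

```python
import math
import random


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def ame(beta_x, x, other_index):
    """Average marginal effect of x when E[y|x] = logistic(beta_x * x + other).
    The derivative with respect to x is beta_x * p * (1 - p)."""
    total = 0.0
    for xi, oi in zip(x, other_index):
        p = logistic(beta_x * xi + oi)
        total += beta_x * p * (1.0 - p)
    return total / len(x)


random.seed(1)
n = 1000
# Hypothetical data: match rates in [0, 2] and the rest of the linear index.
mrate = [random.uniform(0.0, 2.0) for _ in range(n)]
rest = [random.gauss(1.0, 0.5) for _ in range(n)]
beta = 1.143516  # mrate coefficient from the fracreg output above

ame_unit = ame(beta, mrate, rest)                          # per 1.00 of match rate
ame_pp = ame(beta / 100, [100 * m for m in mrate], rest)   # mrate100: per 1 cent
print(ame_unit / 100, ame_pp)  # equal up to floating-point rounding
```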

The first margins result means that the participation rate increases by about 0.145 (14.5 percentage points) when the match rate increases by 1, i.e. by one dollar per dollar saved. Since the mean participation rate is 0.84, that basically means everyone is expected to participate after the change. But that is a very large change: the mean match rate is 0.46, and the maximum is 2. Less than 5% of firms match at that new rate.
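Extrapolating the AME over a full 1.00 change can be risky. The sketch below illustrates why for a single hypothetical firm whose predicted participation equals the sample mean of prate (0.840607), using the mrate coefficient from the fracreg output. This firm is an assumption made for illustration. margins averages the slope over all 4,075 firms, so its 0.145 differs from this one firm's slope. The code compares the derivative-based change (slope times Δ) with the exact change along the logistic curve.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


beta = 1.143516   # mrate coefficient from the fracreg output above
p0 = 0.840607     # hypothetical firm predicted at the sample mean of prate
eta0 = logit(p0)  # that firm's linear index

slope = beta * p0 * (1.0 - p0)  # dy/dx for this one firm
results = {}
for delta in (0.01, 0.10, 1.00):
    linear = slope * delta                      # derivative-based approximation
    exact = logistic(eta0 + beta * delta) - p0  # actual change in the prediction
    results[delta] = (linear, exact)
    print(f"delta={delta:4.2f}  linear={linear:.5f}  exact={exact:.5f}")
```

For small steps the two columns agree closely. For a full dollar, the straight line overshoots (roughly 0.15 versus 0.10) because the logistic curve flattens as participation approaches 1.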

I next implement the two approaches I suggested. If we consider a one-penny increase in the match rate, participation increases by only 0.00145, or about 0.145 percentage points. Equivalently, an increase of 10 cents is associated with 1.45 percentage points higher participation. Generally, you would expect all of these to be close, but that is not always the case when the model is highly nonlinear.
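The arithmetic behind these rescaled effects is just the AME multiplied by the size of the change. Here is a short sketch using the dy/dx from the first margins call. It is a linear approximation, so it inherits the caveat about nonlinearity.

```python
ame = 0.1450106  # dy/dx wrt mrate from the first margins call (per 1.00 of match)

effects = {}
for label, delta in (("1 cent", 0.01), ("10 cents", 0.10), ("1 dollar", 1.00)):
    change = ame * delta  # change in prate as a proportion
    effects[label] = change
    print(f"{label:>8}: {change:.6f} = {change * 100:.3f} pp")
```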

The logit case is identical to the fractional regression, so I will omit that.