Logistic – How to Interpret Margins in Percentage Points When the Independent Variable Is a Percentage

logistic, marginal-effect, stata

I have two research questions. The dependent variable in the first is binary, and I estimate it with a logit regression in Stata. The dependent variable in the second is a proportion (between 0 and 1, including 0 and 1), and I estimate it with a fractional response regression in Stata. In both research questions, my key explanatory variable is expressed as a percentage. I am a bit confused and need help interpreting the coefficient.

I have read that if both the dependent and the independent variable are expressed as percentages, then we can interpret the coefficient in percentage points. I estimated the margins after the fractional regression using the margins, dydx() command in Stata. If the coefficient is -0.5, then we interpret it as follows: with a one percentage point increase in X, Y decreases by 0.5 percentage points. Am I correct?

Secondly, in the case of the logit regression, where the dependent variable is binary (0 or 1), I am predicting the likelihood of an event: 1 if the event occurs and 0 otherwise. I am interested in interpreting the margins in percentage points. I estimated the margins after the logit regression using the margins, dydx() command in Stata. The coefficient is -0.88. How do I interpret this coefficient? Is it correct to say that with a one percentage point increase in X, the likelihood of the event falls by 0.88 percentage points? I am unsure whether it should be interpreted as 88 percentage points or 0.88 percentage points.

Can someone explain this in simple terms?

Best Answer

In both of these models, your outcome and explanatory variable of interest lie in [0,1]. When you calculate the average marginal effect, you are getting the average change in the outcome associated with a 1-unit increase in the explanatory variable. A one-unit change in X is very large if X is between 0 and 1 and may not even make sense: what does going from 0.8 to 1.8 even mean when 1 is the max?

This means you might want to do one of the following:

1. Divide the output of margins by 100.
2. Rescale your explanatory variable to lie in [0, 100] so that a one-unit change corresponds to 1 percentage point.
3. Use an increase between 100 pp and 1 pp (10 pp, for example) with (1) or (2).

Here is an example where the outcome is the participation rate in the 401(k) plan at 4,075 firms in the US. A 401(k) is a type of retirement savings account in the United States where employees have to opt in, or can opt out if enrollment is automatic.

The explanatory variable of interest is the employer match rate per dollar saved by the employee: 0.5 means the match is 50 cents for every dollar saved, 0 means the employer does not match anything, and 1 means 1:1 match, a doubling. Values above 1 are possible here.

You would expect to see a higher enrollment rate in firms where the employers match more generously. That is exactly what we see:

. webuse 401k, clear
(Firm-level data on 401k participation)

. generate mrate100 = mrate*100

. summarize prate mrate

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       prate |      4,075     .840607    .1874841   .0036364          1
       mrate |      4,075     .463519    .4187388          0          2

. fracreg logit prate c.mrate c.ltotemp##c.ltotemp c.age##c.age i.sole, nolog


Fractional logistic regression                          Number of obs =  4,075
                                                        Wald chi2(6)  = 817.73
                                                        Prob > chi2   = 0.0000
Log pseudolikelihood = -1673.5566                       Pseudo R2     = 0.0638

-------------------------------------------------------------------------------------
                    |               Robust
              prate | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
--------------------+----------------------------------------------------------------
              mrate |   1.143516    .074748    15.30   0.000     .9970125    1.290019
            ltotemp |  -1.103275   .1130667    -9.76   0.000    -1.324882   -.8816687
                    |
c.ltotemp#c.ltotemp |   .0565782   .0072883     7.76   0.000     .0422934     .070863
                    |
                age |   .0512643   .0059399     8.63   0.000     .0396223    .0629064
                    |
        c.age#c.age |  -.0005891   .0001645    -3.58   0.000    -.0009114   -.0002667
                    |
               sole |
         Only plan  |   .1137479   .0507762     2.24   0.025     .0142284    .2132674
              _cons |   5.747761   .4294386    13.38   0.000     4.906077    6.589445
-------------------------------------------------------------------------------------

. margins, dydx(mrate) post // coeflegend

Average marginal effects                                 Number of obs = 4,075
Model VCE: Robust

Expression: Conditional mean of prate, predict()
dy/dx wrt:  mrate

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       mrate |   .1450106   .0094558    15.34   0.000     .1264776    .1635436
------------------------------------------------------------------------------

. nlcom ame_rescaled:_b[mrate]/100

ame_rescaled: _b[mrate]/100

------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
ame_rescaled |   .0014501   .0000946    15.34   0.000     .0012648    .0016354
------------------------------------------------------------------------------

. quietly fracreg logit prate c.mrate100 c.ltotemp##c.ltotemp c.age##c.age i.sole, nolog

. margins, dydx(mrate100)

Average marginal effects                                 Number of obs = 4,075
Model VCE: Robust

Expression: Conditional mean of prate, predict()
dy/dx wrt:  mrate100

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
    mrate100 |   .0014501   .0000946    15.34   0.000     .0012648    .0016354
------------------------------------------------------------------------------

The interpretation of the first margins is that the participation rate increases by 0.145 (14.5 percentage points) when the match rate increases by 1. Since the baseline is 0.84, that basically means everyone is expected to participate after the change. But that's a very large change since the mean match rate is 0.46, and the max is 2. Less than 5% of firms match at that new rate.

The nlcom and mrate100 output above implements the first two approaches I suggested. If we consider a one-penny increase in the match rate, then participation only increases by 0.00145, or about 0.145 percentage points (roughly one-seventh of 1 percentage point). Equivalently, an increase of 10 cents is associated with 1.45 percentage points higher participation. Generally, you would expect all of these to be close, but that is not always the case when things are very nonlinear.
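
The third approach can be sketched along the following lines. This is only an illustration under stated assumptions: it reuses the fracreg model from above, the 10-cent step is an arbitrary choice of intermediate increase, the label ame_10cents is made up for this example, and the at(generate()) syntax assumes a reasonably recent Stata version. The first margins call computes the change in average predicted participation directly, without relying on a linear approximation; the nlcom line rescales the average marginal effect instead.

```stata
webuse 401k, clear
fracreg logit prate c.mrate c.ltotemp##c.ltotemp c.age##c.age i.sole, nolog

* Finite difference: average predicted participation at the observed match
* rate and at the observed match rate plus 10 cents, then their contrast
margins, at(mrate = generate(mrate)) at(mrate = generate(mrate + 0.10)) ///
    contrast(atcontrast(r) nowald)

* Linear approximation: average marginal effect scaled to a 10-cent step
margins, dydx(mrate) post
nlcom ame_10cents: _b[mrate] * 0.10
```

The nlcom line should reproduce the 0.0145 figure (0.1450106 times 0.10). If the finite-difference contrast differs noticeably from it, the nonlinearity matters at that step size.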

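The logit case is identical to the fractional regression, so I will omit that. In your example, if X is measured on a 0 to 1 scale, a dy/dx of -0.88 means that a 1 percentage point increase in X (a change of 0.01) is associated with a probability that is lower by about 0.0088, that is, 0.88 percentage points rather than 88. If X is already measured on a 0 to 100 scale, the same -0.88 would instead correspond to 88 percentage points per 1 percentage point of X, which would be implausibly large.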
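
For completeness, here is a minimal sketch of what the logit version could look like. The binary outcome fullpart (1 if every eligible employee participates) is a hypothetical variable constructed purely for illustration, and the label ame_pp is likewise made up; neither is part of the original analysis.

```stata
webuse 401k, clear

* Hypothetical binary outcome, for illustration only: full participation
generate byte fullpart = (prate == 1)
* Match rate on a 0-100 scale, so one unit is 1 percentage point
generate mrate100 = mrate * 100

logit fullpart c.mrate100 c.ltotemp##c.ltotemp c.age##c.age i.sole, nolog
margins, dydx(mrate100) post

* dy/dx is a change in probability on the 0-1 scale per 1 pp of match rate;
* multiplying by 100 expresses it in percentage points of probability
nlcom ame_pp: _b[mrate100] * 100
```

The number reported by nlcom would then be read as the change, in percentage points, in the probability of full participation associated with a 1 percentage point higher match rate.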

Related Solutions

Solved – Interpretation of odds ratio when outcome is a percentage

In a school with a 1-unit higher X value, a randomly picked student has 0.9 times the odds of having been unexcused from class at least once, compared to another school. (Assuming that all other student and school characteristics modelled are the same.)

Solved – Is it possible to interpret standardized beta coefficients for quantile regression

Yes, that is the interpretation. One way to see this is to predict the median for different values of your standardized explanatory variable, each 1 unit (in this case, 1 standard deviation) apart. Then you can look at how much these predicted medians differ, and you will see that the difference is exactly the same number as your standardized quantile regression coefficient. Here is an example:

. sysuse auto, clear
(1978 Automobile Data)

. 
. // standardize variables
. sum price if !missing(price,weight)

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       price |        74    6165.257    2949.496       3291      15906

. gen double z_price = ( price - r(mean) ) / r(sd)

. 
. sum weight if !missing(price,weight)

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      weight |        74    3019.459    777.1936       1760       4840

. gen double z_weight = ( weight - r(mean) ) / r(sd)

. 
. // estimate the median (quantile) regression
. qreg z_price z_weight
Iteration  1:  WLS sum of weighted deviations =  47.263794

Iteration  1: sum of abs. weighted deviations =  54.018868
Iteration  2: sum of abs. weighted deviations =  43.851751

Median regression                                    Number of obs =        74
  Raw sum of deviations 48.21332 (about -.41744651)
  Min sum of deviations 43.85175                     Pseudo R2     =    0.0905

------------------------------------------------------------------------------
     z_price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    z_weight |   .2552875   .1368752     1.87   0.066    -.0175682    .5281432
       _cons |  -.3415908   .1359472    -2.51   0.014    -.6125966    -.070585
------------------------------------------------------------------------------

. 
. // compute the predicted median when z_weight
. // is -2, -1, 0, 1, 2
. drop _all

. set obs 5
obs was 0, now 5

. gen z_weight = _n - 3

. predict med
(option xb assumed; fitted values)

. list

     +----------------------+
     | z_weight         med |
     |----------------------|
  1. |       -2   -.8521658 |
  2. |       -1   -.5968783 |
  3. |        0   -.3415908 |
  4. |        1   -.0863033 |
  5. |        2    .1689841 |
     +----------------------+

. 
. // compute how much the predicted median
. // differs between cars 1 standard deviation
. // weight apart
. gen diff = med - med[_n - 1]
(1 missing value generated)

. list

     +---------------------------------+
     | z_weight         med       diff |
     |---------------------------------|
  1. |       -2   -.8521658          . |
  2. |       -1   -.5968783   .2552875 |
  3. |        0   -.3415908   .2552875 |
  4. |        1   -.0863033   .2552875 |
  5. |        2    .1689841   .2552875 |
     +---------------------------------+
