Solved – how to check for robustness for categorical variables in Stata

logit stata

I ran a logit regression on my variables. I have 10 variables, all of them categorical. After running the logit, I want to check for robustness. How do I go about it in Stata?

Best Answer

In general, what econometricians refer to as a "robustness check" is a check on the change of some coefficients when we add or drop covariates. In linear regression models, this is pretty easy.

However, in a logit (or another non-linear probability model), it's actually quite hard because the coefficients change size with the total amount of variation explained in the model.

A solution for this was proposed by the sociologists Holm, Karlson & Breen in SMx 2012, SMR 2013. It is implemented in Stata via the khb command.
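
As a rough illustration of how khb is called, here is a minimal sketch on Stata's bundled auto data. The choice of high_mpg as the key variable and heavy as the added covariate is purely an assumption for demonstration, not something taken from the question. khb is a user-written command, so it has to be installed from SSC first.

```stata
* One-time install of the user-written command from SSC (needs internet access)
ssc install khb, replace

sysuse auto, clear
gen byte high_mpg = mpg > 20        // illustrative binary key variable
gen byte heavy    = weight > 3000   // illustrative covariate added in the full model

* Variables before || are the key variables whose coefficients are compared;
* variables after || are the covariates that are added to the model
khb logit foreign high_mpg || heavy
```

The output should show the coefficient on the key variable in the reduced and full models on a comparable scale, together with the difference between them. See help khb for the available options.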

Related Solutions

Solved – overall effects of categorical variables

Given your comments, I will assume that you do not want an estimate of the size of the effect, but rather a statistical test of whether the expected (possibly adjusted) counts are the same across categories. This may or may not be wise depending on your circumstances, but this is an example of how you do it in Stata:

webuse dollhill3
poisson deaths smokes i.agecat, exposure(pyears)
testparm i.agecat
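
The testparm call above is a joint Wald test. As a sketch of an alternative, assuming a likelihood-ratio test is acceptable for your purposes, the same hypothesis can be tested by fitting the model with and without the age categories and comparing the two fits. The names full and reduced are arbitrary labels for the stored estimates.

```stata
webuse dollhill3, clear

* Full model: includes the age-category indicators
poisson deaths smokes i.agecat, exposure(pyears)
estimates store full

* Reduced model: drops the age-category indicators
poisson deaths smokes, exposure(pyears)
estimates store reduced

* Likelihood-ratio test that all agecat coefficients are jointly zero
lrtest full reduced
```

Both tests address the same null hypothesis, but they are not numerically identical.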

If you want something like a single effect size you could look into sheaf coefficients. In case of interaction terms this generalizes to a model with parametrically weighted covariates. A brief discussion on how to do those in Stata can be found here.

Solved – How to interpret the marginal effect of dummy regressors in a logit model

Stata is smart enough to ignore the at() assignment for x when you calculate the AME for x (since otherwise you would get a zero). In the end, you have asked Stata to calculate this average of finite differences:

$$AME_x = \frac{1}{N}\sum_{i=1}^N \left[ \hat p(x=1,y=1,z=z_i)-\hat p(x=0,y=1,z=z_i) \right],$$

where $\hat p(.)$ is the predicted probability from the logit model. Stata used differences here rather than derivatives since all your regressors are binary/categorical.

This is probably not a very sensible AME, but perhaps you have your reasons for doing it this way. I am calling this an AME, but it is actually a hybrid of AME and MER (marginal effect at representative values).

Here's a toy example showing the margins calculation by hand:

. sysuse auto, clear
(1978 Automobile Data)

. gen high_mpg = mpg>20

. gen high_rep = rep78>3

. gen heavy    = weight>3000

. 
. /* AME using margins */
. logit foreign i.(high_mpg heavy high_rep), nolog

Logistic regression                             Number of obs     =         74
                                                LR chi2(3)        =      37.57
                                                Prob > chi2       =     0.0000
Log likelihood = -26.246142                     Pseudo R2         =     0.4172

------------------------------------------------------------------------------
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  1.high_mpg |  -1.118024   1.307539    -0.86   0.393    -3.680754    1.444706
     1.heavy |  -3.673601   1.417986    -2.59   0.010    -6.452802   -.8944001
  1.high_rep |   2.245017   .7705583     2.91   0.004     .7347502    3.755283
       _cons |  -.2405401   1.332215    -0.18   0.857    -2.851634    2.370554
------------------------------------------------------------------------------

. margins, dydx(high_mpg) at(high_mpg = 1 heavy = 1)

Average marginal effects                        Number of obs     =         74
Model VCE    : OIM

Expression   : Pr(foreign), predict()
dy/dx w.r.t. : 1.high_mpg
at           : high_mpg        =           1
               heavy           =           1

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  1.high_mpg |   -.053257   .0519245    -1.03   0.305     -.155027    .0485131
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

. 
. /* Calculate the same average marginal effect in-sample for high_mpg as above */
. /* (a) ME = phat(high_mpg=1, heavy=1, high_rep at own value) */
. /*        - phat(high_mpg=0, heavy=1, high_rep at own value) */
. gen double high_mpg_me =                                                        ///
>        [   exp(_b[_cons]+_b[1.high_mpg]+_b[1.heavy]+_b[1.high_rep]*high_rep)/   ///
>         (1+exp(_b[_cons]+_b[1.high_mpg]+_b[1.heavy]+_b[1.high_rep]*high_rep))]  ///
>       -[   exp(_b[_cons]               +_b[1.heavy]+_b[1.high_rep]*high_rep)/   ///
>         (1+exp(_b[_cons]               +_b[1.heavy]+_b[1.high_rep]*high_rep))]

. 
. /* (b) Calculate the average marginal effect (AME) */
. sum high_mpg_me, meanonly

. display "High MPG AME = " %9.6f r(mean)
High MPG AME = -0.053257

According to this model, when all cars are assumed to be heavy but keep their observed in-sample values of high_rep, the probability of a car being foreign falls by 5.3 percentage points when it is high MPG (relative to low MPG).

Stata Code:

cls
sysuse auto, clear
gen high_mpg = mpg>20
gen high_rep = rep78>3
gen heavy    = weight>3000

/* AME using margins */
logit foreign i.(high_mpg heavy high_rep), nolog
margins, dydx(high_mpg) at(high_mpg = 1 heavy = 1)

/* Calculate the same average marginal effect in-sample for high_mpg as above */
/* (a) ME = phat(high_mpg=1, heavy=1, high_rep at own value) */
/*        - phat(high_mpg=0, heavy=1, high_rep at own value) */
gen double high_mpg_me =                                                                       ///
                      [   exp(_b[_cons]+_b[1.high_mpg]+_b[1.heavy]+_b[1.high_rep]*high_rep)/   ///
                       (1+exp(_b[_cons]+_b[1.high_mpg]+_b[1.heavy]+_b[1.high_rep]*high_rep))]  ///
                     -[   exp(_b[_cons]               +_b[1.heavy]+_b[1.high_rep]*high_rep)/   ///
                       (1+exp(_b[_cons]               +_b[1.heavy]+_b[1.high_rep]*high_rep))]

/* (b) Calculate the average marginal effect (AME) */
sum high_mpg_me, meanonly
di "High MPG AME = " %9.6f r(mean)
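
The same number can also be reproduced without typing out the logistic formula, by letting predict evaluate the fitted model on counterfactual data. This is only a sketch of that idea; the variable names p1, p0 and diff and the scalar ame are made up for the illustration.

```stata
sysuse auto, clear
gen high_mpg = mpg>20
gen high_rep = rep78>3
gen heavy    = weight>3000

logit foreign i.(high_mpg heavy high_rep), nolog

preserve
    * Counterfactual 1: every car heavy and high MPG, high_rep as observed
    replace heavy    = 1
    replace high_mpg = 1
    predict double p1, pr

    * Counterfactual 2: every car heavy and low MPG, high_rep as observed
    replace high_mpg = 0
    predict double p0, pr

    * Average the finite differences over the sample
    gen double diff = p1 - p0
    sum diff, meanonly
    scalar ame = r(mean)
restore

di "High MPG AME = " %9.6f scalar(ame)
```

Because preserve and restore wrap the changes, the original data are left untouched, and the displayed value should agree with the margins result above.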
