Solved – Binary panel logistic regression (xtlogit fixed effects) not converging in Stata: how to resolve it

Tags: convergence, logistic, panel data, stata

I have a panel dataset with a sample of 800 groups, each having between 200 and 500 observations. The data looks like this:

[screenshot of the panel dataset]

The dependent variable, close_gp30_f30, is binary.

The independent variables are continuous growth rates. An example summary of one of these is:

                          close_g1
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -.0789325      -.9908884
 5%    -.0396762      -.9907975
10%    -.0256917       -.990625       Obs             2993902
25%    -.0096911      -.9904597       Sum of Wgt.     2993902

50%            0                      Mean           .0015124
                        Largest       Std. Dev.      .3472223
75%      .009676         103.25
90%     .0253968       103.3333       Variance       .1205633
95%     .0399516          104.5       Skewness       436.1726
99%     .0841585       309.3899       Kurtosis       266732.5

I would like to run this experimental regression:

xtset ticker_id date
xtlogit close_gp30_f30 close_g1 close_g10 close_g15 close_g30 close_g60 close_g120 if ticker_grp == 0, fe

However, when I add more than about five variables, the regression never converges and I get stuck in a loop of "backed up" iterations, like so:

. xtlogit close_gp30_f30 close_g1 close_g10 close_g15 close_g30 close_g60 close_g120 if ticker_grp == 0, fe
note: multiple positive outcomes within groups encountered.
note: 11 groups (272 obs) dropped because of all positive or
      all negative outcomes.

Iteration 0:   log likelihood = -175837.76  
Iteration 1:   log likelihood = -175015.93  
Iteration 2:   log likelihood = -175006.84  
Iteration 3:   log likelihood =  -175002.6  
Iteration 4:   log likelihood = -175001.69  (backed up)
Iteration 5:   log likelihood = -175001.69  (backed up)
Iteration 6:   log likelihood = -175001.69  (backed up)
Iteration 7:   log likelihood = -175001.69  (backed up)
Iteration 8:   log likelihood = -175001.69  (backed up)
Iteration 9:   log likelihood = -175001.69  (backed up)
Iteration 10:  log likelihood = -175001.69  (backed up)
Iteration 11:  log likelihood = -175001.69  (backed up)
Iteration 12:  log likelihood = -175001.69  (backed up)
Iteration 13:  log likelihood = -175001.69  (backed up)
Iteration 14:  log likelihood = -175001.69  (backed up)
Iteration 15:  log likelihood = -175001.69  (backed up)
Iteration 16:  log likelihood = -175001.69  (backed up)
Iteration 17:  log likelihood = -175001.69  (backed up)
Iteration 18:  log likelihood = -175001.69  (backed up)
Iteration 19:  log likelihood = -175001.69  (backed up)
Iteration 20:  log likelihood = -175001.69  (backed up)
Iteration 21:  log likelihood = -175001.69  (backed up)
Iteration 22:  log likelihood = -175001.69  (backed up)
Iteration 23:  log likelihood = -175001.69  (backed up)
Iteration 24:  log likelihood = -175001.69  (backed up)
Iteration 25:  log likelihood = -175001.69  (backed up)
Iteration 26:  log likelihood = -175001.69  (backed up)
Iteration 27:  log likelihood = -175001.69  (backed up)
Iteration 28:  log likelihood = -175001.69  (backed up)
Iteration 29:  log likelihood = -175001.69  (backed up)
Iteration 30:  log likelihood = -175001.69  (backed up)
Iteration 31:  log likelihood = -175001.69  (backed up)
Iteration 32:  log likelihood = -175001.69  (backed up)
Iteration 33:  log likelihood = -175001.69  (backed up)
Iteration 34:  log likelihood = -175001.69  (backed up)
Iteration 35:  log likelihood = -175001.69  (backed up)
Iteration 36:  log likelihood = -175001.69  (backed up)
--Break--
r(1);

I have also rerun this regression with all debugging information enabled; this produces a lot of output, but it may reveal why the model is not converging. Note that here I regressed on the standardized values of the independent variables, which had exactly the same effect (for some reason I had hoped that standardizing would solve the problem):

http://pastebin.com/YU0EEkJt
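
For reference, the standardization was done roughly along these lines (a sketch rather than the exact code I ran; variable names taken from the regression above):

* Standardize each growth rate with egen's std() function, then refit.
foreach v of varlist close_g1 close_g10 close_g15 close_g30 close_g60 close_g120 {
    egen `v'_std = std(`v')
}
xtlogit close_gp30_f30 close_g1_std close_g10_std close_g15_std close_g30_std close_g60_std close_g120_std if ticker_grp == 0, fe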

My main questions are:

  1. Why is it not converging?
  2. How can I resolve this situation?

Update: multicollinearity checks

. collin close_g1 close_g3 close_g5 close_g15 close_g30 close_g60 close_g70 close_g80 close_g90 close_g100
(obs=3146949)

  Collinearity Diagnostics

                        SQRT                   R-
  Variable      VIF     VIF    Tolerance    Squared
----------------------------------------------------
  close_g1      1.17    1.08    0.8564      0.1436
  close_g3      1.22    1.11    0.8169      0.1831
  close_g5      1.25    1.12    0.7979      0.2021
 close_g15      1.35    1.16    0.7396      0.2604
 close_g30      1.48    1.22    0.6767      0.3233
 close_g60      1.87    1.37    0.5343      0.4657
 close_g70      1.97    1.40    0.5074      0.4926
 close_g80      2.06    1.43    0.4857      0.5143
 close_g90      2.03    1.43    0.4915      0.5085
close_g100      1.89    1.38    0.5280      0.4720
----------------------------------------------------
  Mean VIF      1.63

                           Cond
        Eigenval          Index
---------------------------------
    1     4.0898          1.0000
    2     1.4492          1.6799
    3     0.9965          2.0259
    4     0.7954          2.2675
    5     0.7143          2.3928
    6     0.6716          2.4677
    7     0.5801          2.6552
    8     0.5096          2.8328
    9     0.4174          3.1303
    10     0.3970          3.2096
    11     0.3791          3.2847
---------------------------------
 Condition Number         3.2847 
 Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
 Det(correlation matrix)    0.0422

So multicollinearity does not seem to be the problem either.
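
(collin is a user-written command. A rough cross-check using only built-in commands, with a linear probability model as a proxy for the logit, would look something like the sketch below; the variable list is the one from the regression above, so treat it as illustrative.)

* Linear probability model as a rough proxy, then inspect variance inflation factors.
regress close_gp30_f30 close_g1 close_g10 close_g15 close_g30 close_g60 close_g120 if ticker_grp == 0
estat vif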

Update 2: with the gradient option and tighter convergence tolerances:

. xtlogit close_gp30_f30 close_g1 close_g10 close_g15 close_g30 close_g60 close_g80 close_g100 if ticker_grp == 0, fe ltol(0) tol(1e-7) gradient
note: multiple positive outcomes within groups encountered.
note: 10 groups (240 obs) dropped because of all positive or
      all negative outcomes.

---------------------------------------------------------------------------------------------------------------------------------------------
Iteration 0:
                                                   log likelihood = -186484.82
Gradient vector (length = 4065.469):
     close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:
        close_g1     close_g10     close_g15     close_g30     close_g60     close_g80    close_g100
r1     -145.9671      -491.122     -628.1631      -548.698     -1291.774     -2406.543     -2847.761
---------------------------------------------------------------------------------------------------------------------------------------------
Iteration 1:
                                                   log likelihood =  -185998.3
Gradient vector (length = 2373.377):
     close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:
        close_g1     close_g10     close_g15     close_g30     close_g60     close_g80    close_g100
r1     -93.28661     -296.7566     -370.1424     -292.3351     -675.5539     -1381.919     -1716.862
---------------------------------------------------------------------------------------------------------------------------------------------
Iteration 2:
                                                   log likelihood = -185954.74
Gradient vector (length = 2226.909):
     close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:
        close_g1     close_g10     close_g15     close_g30     close_g60     close_g80    close_g100
r1     -88.19199     -278.7058      -347.356     -273.7588      -633.336     -1296.948     -1610.864
---------------------------------------------------------------------------------------------------------------------------------------------
Iteration 3:
                                                   log likelihood = -185934.48
Gradient vector (length = 2157.759):
     close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:
        close_g1     close_g10     close_g15     close_g30     close_g60     close_g80    close_g100
r1     -85.68227     -270.0902     -336.5747      -265.113     -613.5868     -1256.742     -1560.826
---------------------------------------------------------------------------------------------------------------------------------------------
Iteration 4:
                                                   log likelihood = -185929.57
                                                                   (backed up)
Gradient vector (length = 2140.928):
     close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:
        close_g1     close_g10     close_g15     close_g30     close_g60     close_g80    close_g100
r1     -85.03234     -267.9852     -333.9495     -263.0359     -608.7964     -1246.943      -1548.65
---------------------------------------------------------------------------------------------------------------------------------------------
Iteration 5:
                                                   log likelihood = -185927.13
                                                                   (backed up)
Gradient vector (length = 2132.452):
     close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:
        close_g1     close_g10     close_g15     close_g30     close_g60     close_g80    close_g100
r1     -84.55817     -266.8118     -332.5234     -262.1772     -606.6421     -1241.936     -1542.494
---------------------------------------------------------------------------------------------------------------------------------------------
Iteration 6:
                                                   log likelihood = -185926.38
                                                                   (backed up)
Gradient vector (length = 2125.423):
     close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:
        close_g1     close_g10     close_g15     close_g30     close_g60     close_g80    close_g100
r1     -79.15747     -261.7075     -327.7218     -267.7253     -612.7412      -1235.74     -1536.582
---------------------------------------------------------------------------------------------------------------------------------------------
Iteration 7:
                                                   log likelihood = -185925.91
                                                                   (backed up)
Gradient vector (length = 2117.104):
     close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:
        close_g1     close_g10     close_g15     close_g30     close_g60     close_g80    close_g100
r1     -72.14189     -254.8101     -321.5761     -276.4171     -622.5654     -1228.224     -1528.416
---------------------------------------------------------------------------------------------------------------------------------------------
Iteration 8:
                                                   log likelihood = -185925.59
                                                                   (backed up)
Gradient vector (length = 2111.886):
     close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:
        close_g1     close_g10     close_g15     close_g30     close_g60     close_g80    close_g100
r1     -67.13769     -250.2303     -317.4594     -280.2288      -626.198      -1222.17     -1525.716
---------------------------------------------------------------------------------------------------------------------------------------------
Iteration 9:
                                                   log likelihood = -185925.59
                                                                   (backed up)
Gradient vector (length = 2111.886):
     close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:
        close_g1     close_g10     close_g15     close_g30     close_g60     close_g80    close_g100
r1     -67.13769     -250.2303     -317.4594     -280.2288      -626.198      -1222.17     -1525.716
---------------------------------------------------------------------------------------------------------------------------------------------
Iteration 10:
                                                   log likelihood = -185925.59
                                                                   (backed up)
Gradient vector (length = 2111.886):
     close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:
        close_g1     close_g10     close_g15     close_g30     close_g60     close_g80    close_g100
r1     -67.13769     -250.2303     -317.4594     -280.2288      -626.198      -1222.17     -1525.716
---------------------------------------------------------------------------------------------------------------------------------------------
Iteration 11:
                                                   log likelihood = -185925.59
                                                                   (backed up)
Gradient vector (length = 2111.886):
     close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:  close_g~f30:
        close_g1     close_g10     close_g15     close_g30     close_g60     close_g80    close_g100
r1     -67.13769     -250.2303     -317.4594     -280.2288      -626.198      -1222.17     -1525.716

Update 3:

I don't know if this helps, but when I run xtdata indepvars, i(ticker_id) fe clear followed by logit depvar indepvars (which normally works just fine), the logit gets stuck too. I therefore believe the problem has something to do with the fixed effects and/or the panel structure. Does this make sense? A rough reconstruction of what I ran is below (variable names assumed from the regression above):
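
* Within-transform the regressors with xtdata, then fit a plain logit on the transformed data.
xtdata close_g1 close_g10 close_g15 close_g30 close_g60 close_g120 if ticker_grp == 0, i(ticker_id) fe clear
logit close_gp30_f30 close_g1 close_g10 close_g15 close_g30 close_g60 close_g120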

Update 4:

This seems to be a problem with Stata recasting the data to double behind the scenes; please see my follow-up question, xtlogit: panel data transformation's recast to double makes model incomputable (STATA).

Best Answer

There are two possibilities. One is that Stata has found a perfect maximum and simply cannot get to a better point. This is pretty unlikely, but a fellow can still dream.

The second, and more likely, scenario is that the optimizer has wound up in a badly behaved (nearly flat or non-concave) region of the likelihood, where the computed gradient and Hessian give a poor direction for stepping.

Here are some possible solutions. Use the gradient maximization option. If the gradient is zero at the final iteration, the optimizer has found a maximum that may not be unique, but it is a maximum, and the result is valid. If the gradient is not zero, the result is not valid. You can also try tightening the convergence criterion, for example ltol(0) tol(1e-7), to see whether the optimizer can work its way out of the bad region.

Also, sometimes adding the difficult maximization option helps.
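
As a concrete sketch (using the questioner's variable names from above), these suggestions translate into maximize options along these lines:

* Report the gradient at each iteration and tighten the convergence criteria.
xtlogit close_gp30_f30 close_g1 close_g10 close_g15 close_g30 close_g60 close_g120 if ticker_grp == 0, fe gradient ltol(0) tol(1e-7)

* If it still stalls, the difficult option tells the optimizer to use an alternative stepping algorithm in non-concave regions.
xtlogit close_gp30_f30 close_g1 close_g10 close_g15 close_g30 close_g60 close_g120 if ticker_grp == 0, fe difficult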
