How to correctly account for country effects in logistic regression

Tags: categorical-encoding, logistic, many-categories, regression, stata

I use a database of firm-level entries from 12 countries in 2008. I am trying to model innovation (0/1) using a few firm-level variables. I also want to see whether, and how much, innovation is due to country-level effects, so I want to control for country effects. If I include i.Country in my logistic regression, I get negative z values for almost every country. I suspect this is wrong because, in the raw data, only one country has more 0s than 1s for innovation.

Country codes take the values 52, 54, 55, ..., 92.
Below is a breakdown of innovation responses by country. I tried two things: including i.Country in the regression, and creating a dummy for each country and entering all of them in the regression. Which is correct, and how do I interpret the results?

. tabulate Country INNOV

           | NEW PROD LAST 3 yr?
   Country |         0          1 |     Total
-----------+----------------------+----------
        52 |         4         28 |        32 
        54 |        25         48 |        73 
        55 |        40         48 |        88 
        58 |        40         96 |       136 
        59 |         4         40 |        44 
        60 |        14         29 |        43 
        61 |        39         55 |        94 
        62 |        35         47 |        82 
        75 |        10         54 |        64 
        78 |        28         51 |        79 
        90 |        29        138 |       167 
        92 |       105         69 |       174 
-----------+----------------------+----------
     Total |       373        703 |     1,076 
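A quick way to check the intuition in the question is to compute the crude (unadjusted) odds of innovation for each country directly from the table. The Python sketch below is only an illustration (the dictionary layout and the helper name crude_odds are my own); its only inputs are the counts above.

```python
import math

# Counts copied from the tabulation above: country -> (INNOV == 0, INNOV == 1)
counts = {
    52: (4, 28), 54: (25, 48), 55: (40, 48), 58: (40, 96),
    59: (4, 40), 60: (14, 29), 61: (39, 55), 62: (35, 47),
    75: (10, 54), 78: (28, 51), 90: (29, 138), 92: (105, 69),
}

def crude_odds(n0, n1):
    """Odds of innovating: firms with INNOV == 1 per firm with INNOV == 0."""
    return n1 / n0

for country, (n0, n1) in counts.items():
    odds = crude_odds(n0, n1)
    # log-odds > 0 means more 1s than 0s for this country
    print(f"{country}: odds = {odds:.3f}, log-odds = {math.log(odds):+.3f}")

# Countries where 0s outnumber 1s, i.e. crude odds below 1
below_one = [c for c, (n0, n1) in counts.items() if n1 < n0]
print("odds below 1:", below_one)
```

Only country 92 has odds below 1 (69/105, about 0.66), which matches the observation that only 92 gets a negative z when each country is modelled on its own. Note that the regressions below use 481 of the 1,076 firms (presumably those with non-missing Mang_MNEexperience) and compare each country with a base country, so their odds ratios are not these crude odds.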

Here I look at one country at a time, with no independent variables. For country 90 (Germany) the odds of innovation are above 1, so z is positive. If I repeat this country by country, only country 92 gets a negative z.


. logistic INNOV if Country==90

Logistic regression                               Number of obs   =        167
                                                  LR chi2(0)      =      -0.00
                                                  Prob > chi2     =          .
Log likelihood = -77.092379                       Pseudo R2       =    -0.0000

------------------------------------------------------------------------------
       INNOV | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   4.758621   .9720773     7.64   0.000       3.1886    7.101696
------------------------------------------------------------------------------
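With no covariates, the constant-only fit for a single country can be reproduced by hand: the "Odds Ratio" Stata reports for _cons is just n1/n0, and z tests whether the log-odds equal 0. A minimal Python sketch using the country 90 counts from the table (variable names are illustrative; the SE formula is the standard one for a constant-only logit on a single group):

```python
import math

# Country 90 from the tabulation: 29 firms with INNOV == 0, 138 with INNOV == 1
n0, n1 = 29, 138

log_odds = math.log(n1 / n0)          # estimate of the constant on the logit scale
se_log = math.sqrt(1 / n0 + 1 / n1)   # its standard error for a constant-only model
z = log_odds / se_log                 # Wald z: tests log-odds = 0, i.e. odds = 1

odds = math.exp(log_odds)             # what Stata reports as the "Odds Ratio" for _cons
se_odds = odds * se_log               # delta-method SE, as shown in the Std. Err. column
z_crit = 1.959963984540054            # 97.5th percentile of the standard normal
ci = (math.exp(log_odds - z_crit * se_log), math.exp(log_odds + z_crit * se_log))

print(f"odds = {odds:.6f}, SE = {se_odds:.6f}, z = {z:.2f}, "
      f"95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```

These values agree with the Stata output to the displayed precision. In particular, z is positive exactly when n1 > n0, which is why only country 92 gives a negative z in these one-country models.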

Here I run the regression with one independent variable while (I hope) controlling for country effects. The z values for the countries are almost all negative. Why?


. logistic INNOV i.Country Mang_MNEexperience 

Logistic regression                               Number of obs   =        481
                                                  LR chi2(12)     =      61.89
                                                  Prob > chi2     =     0.0000
Log likelihood = -283.25686                       Pseudo R2       =     0.0985

------------------------------------------------------------------------------------
             INNOV | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
           Country |
               54  |   .3689431   .3062865    -1.20   0.230     .0724962    1.877602
               55  |   .2295013   .1909388    -1.77   0.077     .0449375    1.172091
               58  |   .4457037   .3627449    -0.99   0.321      .090423    2.196917
               59  |   1.689363   1.597736     0.55   0.579     .2646602    10.78344
               60  |   .7459045   .7228328    -0.30   0.762     .1116376    4.983748
               61  |   .1580636   .1313537    -2.22   0.026     .0310076    .8057415
               62  |   .3256028   .2674703    -1.37   0.172     .0650816    1.628988
               75  |   .9975062    1.10341    -0.00   0.998     .1141151    8.719431
               78  |   .6885038   .6454499    -0.40   0.691     .1096308    4.323944
               90  |   .5391077   .4787809    -0.70   0.487     .0945637    3.073453
               92  |   .0549765   .0542165    -2.94   0.003     .0079569    .3798489
                   |
Mang_MNEexperience |   1.083192   .0309218     2.80   0.005      1.02425    1.145525
             _cons |   4.357274   3.409211     1.88   0.060     .9401977    20.19345
------------------------------------------------------------------------------------
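The sign of each z here is the sign of log(odds ratio), and every odds ratio in this table compares a country with the omitted base country, 52, at the same value of Mang_MNEexperience. The Python sketch below recomputes z from the reported odds ratios and standard errors. It assumes, as is usual for Stata's odds-ratio display, that the Std. Err. column is the delta-method SE, OR x SE(log OR); the helper wald_z is hypothetical.

```python
import math

# (odds ratio, std. err.) copied from the i.Country output above (base = country 52)
est = {
    54: (.3689431, .3062865), 55: (.2295013, .1909388), 58: (.4457037, .3627449),
    59: (1.689363, 1.597736), 60: (.7459045, .7228328), 61: (.1580636, .1313537),
    62: (.3256028, .2674703), 75: (.9975062, 1.10341), 78: (.6885038, .6454499),
    90: (.5391077, .4787809), 92: (.0549765, .0542165),
}

def wald_z(odds_ratio, se_or):
    # Assumed delta-method SE: se_or = OR * SE(log OR), so SE(log OR) = se_or / OR.
    # z = log(OR) / SE(log OR): negative whenever OR < 1.
    return math.log(odds_ratio) / (se_or / odds_ratio)

for country, (or_, se) in est.items():
    print(f"{country}: OR = {or_:.4f}, z = {wald_z(or_, se):+.2f}")
```

Only country 59 has an odds ratio above 1, so only its z is positive. Read as an effect, the country 92 odds ratio of 0.0549765 says that, for firms with the same Mang_MNEexperience, the estimated odds of innovation in country 92 are about 5.5% of those in country 52; likewise, within a country, each one-unit increase in Mang_MNEexperience multiplies the odds by about 1.083.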

Here I use dummies to control for countries

. logistic INNOV countrydummy1 countrydummy2 countrydummy3 countrydummy4 countrydummy5 countrydummy6
> countrydummy7 countrydummy8 countrydummy9 countrydummy10 countrydummy11 countrydummy12
> Mang_MNEexperience
note: countrydummy12 omitted because of collinearity

Logistic regression                               Number of obs   =        481
                                                  LR chi2(12)     =      61.89
                                                  Prob > chi2     =     0.0000
Log likelihood = -283.25686                       Pseudo R2       =     0.0985

------------------------------------------------------------------------------------
             INNOV | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
     countrydummy1 |   18.18959   17.93813     2.94   0.003     2.632625    125.6773
     countrydummy2 |   6.710925   4.467707     2.86   0.004     1.820148    24.74333
     countrydummy3 |   4.174535    2.79421     2.13   0.033     1.124241     15.5009
     countrydummy4 |   8.107169   5.216314     3.25   0.001     2.297149    28.61207
     countrydummy5 |   30.72882   24.17314     4.35   0.000     6.575663    143.5993
     countrydummy6 |    13.5677   11.30987     3.13   0.002     2.648225    69.51165
     countrydummy7 |   2.875114   1.907619     1.59   0.111     .7832285    10.55411
     countrydummy8 |   5.922583   3.865446     2.73   0.006     1.648026    21.28425
     countrydummy9 |   18.14423   17.87654     2.94   0.003     2.630846    125.1358
    countrydummy10 |   12.52361   9.977246     3.17   0.002     2.627836    59.68436
    countrydummy11 |    9.80615   6.794078     3.30   0.001     2.522048    38.12797
    countrydummy12 |          1  (omitted)
Mang_MNEexperience |   1.083192   .0309218     2.80   0.005      1.02425    1.145525
             _cons |   .2395476   .1453217    -2.36   0.018     .0729475    .7866357
------------------------------------------------------------------------------------

  1. Which way is the correct one?
  2. Why is z negative when I use i.Country but positive when I use dummies?
  3. How do I interpret the country effects?

Best Answer

It's the same model. Look at the overall statistics: the log-likelihood (-283.25686), the corresponding LR chi-square (61.89 on 12 degrees of freedom) and the pseudo-R2 (0.0985) are identical in both outputs.
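These overall statistics are functions of the log-likelihood alone, so identical log-likelihoods imply identical LR chi-square and pseudo-R2. The illustrative Python sketch below recovers the constant-only log-likelihood from the two numbers Stata prints and reproduces the pseudo-R2, assuming Stata's pseudo-R2 for logistic is McFadden's 1 - LL/LL0.

```python
# Overall statistics copied from both outputs (they are identical)
ll_model = -283.25686    # "Log likelihood"
lr_chi2 = 61.89          # "LR chi2(12)"

# LR chi2 = 2 * (ll_model - ll_null), so the constant-only log-likelihood is:
ll_null = ll_model - lr_chi2 / 2

# McFadden's pseudo-R2 (assumed to be what Stata's logistic reports)
pseudo_r2 = 1 - ll_model / ll_null
print(f"ll_null = {ll_null:.5f}, pseudo R2 = {pseudo_r2:.4f}")
```

The 12 degrees of freedom are the 11 non-base country indicators plus Mang_MNEexperience, whichever country is omitted.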

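The difference is an accident of which indicator (dummy) is omitted, as at least one of them must be: with a constant in the model, the 12 country indicators always sum to 1, so including all of them would be perfectly collinear. In the first output the omitted (base) country is 52; in the second it is 92.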

So, there isn't really any issue about which model is correct. A model call using Stata's factor variable notation is, at a minimum, a better way to do it in the important sense that the output is easier to read. (The names of the country indicators are under your control: even if you don't like the names some command gives them, you can always rename them.)

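So the country-by-country results from the two outputs can't be paired off: they are relative to different base levels. In the first output, every country except 59 has an odds ratio below 1 relative to country 52, hence a negative z; in the second, every country has an odds ratio above 1 relative to country 92, hence a positive z.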
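Because both fits are the same model, one set of country odds ratios can be converted into the other by dividing by the odds ratio of the new base country. The Python sketch below rebases the i.Country estimates from country 52 to country 92. rebase is a hypothetical helper written for this illustration, and the comparison assumes countrydummy1 to countrydummy12 follow the country codes in ascending order (52 to 92).

```python
# Odds ratios from the i.Country output (base = country 52, which has OR 1)
or_base52 = {
    52: 1.0, 54: .3689431, 55: .2295013, 58: .4457037, 59: 1.689363,
    60: .7459045, 61: .1580636, 62: .3256028, 75: .9975062, 78: .6885038,
    90: .5391077, 92: .0549765,
}
cons_base52 = 4.357274   # _cons: odds for country 52 at Mang_MNEexperience = 0

def rebase(ors, new_base):
    # OR(c vs new) = OR(c vs old) / OR(new vs old); the new base gets OR 1
    return {c: v / ors[new_base] for c, v in ors.items()}

or_base92 = rebase(or_base52, 92)
# The constant becomes the odds for the new base: old constant * OR(new vs old)
cons_base92 = cons_base52 * or_base52[92]

for c, v in or_base92.items():
    print(f"{c}: {v:.4f}")
print(f"_cons: {cons_base92:.4f}")
```

The rebased values reproduce the dummy-variable output (for example 1/0.0549765, about 18.1896, for country 52, and 4.357274 x 0.0549765, about 0.2395, for the constant). Only point estimates rebase this simply; standard errors and z values for the new contrasts also need the covariance between the estimates, which is why refitting with a different base level is easier.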

Stata's factor variable notation includes an option to set the base level: for example, ib92.Country in place of i.Country makes country 92 the base.

@kjetil b halvorsen's general warnings about over-dispersion, Hauck-Donner and the like are well meant and you'd do well to keep them in mind, but they don't seem to bear directly on the question.
