I use a firm-level database covering 12 countries in 2008. I am trying to model innovation (0/1) from a few firm-level variables. I also want to see whether, and how much, innovation is due to country-level effects, so I want to control for country. If I include i.country in my logistic regression, I get negative z values for almost every country. This does not feel right, because when I look at the data only one country has 0 for innovation more often than 1.
Countries are coded 52, 54, 55, ..., 92.
Below is a breakdown of innovation responses by country. I tried two things: one is to put i.country in the regression, and the other is to create a dummy for each country and include them all in the regression. Which is correct, and how do I interpret the results?
. tabulate Country INNOV

           | NEW PROD LAST 3 yr?
   Country |         0          1 |     Total
-----------+----------------------+----------
        52 |         4         28 |        32
        54 |        25         48 |        73
        55 |        40         48 |        88
        58 |        40         96 |       136
        59 |         4         40 |        44
        60 |        14         29 |        43
        61 |        39         55 |        94
        62 |        35         47 |        82
        75 |        10         54 |        64
        78 |        28         51 |        79
        90 |        29        138 |       167
        92 |       105         69 |       174
-----------+----------------------+----------
     Total |       373        703 |     1,076
Here I look at one country with no independent variables. For country 90 (Germany) the odds of innovation are above 1, so z is positive. If I repeat this country by country, only 92 gets a negative z.
. logistic INNOV if Country==90
Logistic regression Number of obs = 167
LR chi2(0) = -0.00
Prob > chi2 = .
Log likelihood = -77.092379 Pseudo R2 = -0.0000
------------------------------------------------------------------------------
INNOV | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | 4.758621 .9720773 7.64 0.000 3.1886 7.101696
------------------------------------------------------------------------------
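As a cross-check, the constant in a model with no predictors is just the raw odds in that subsample, 138/29 for country 90. A minimal Python sketch (plain arithmetic on the counts in the table above, not Stata; the variable names are illustrative) computes the raw odds for every country:

```python
# Counts of (INNOV = 0, INNOV = 1) per country, copied from the tabulate output.
counts = {
    52: (4, 28), 54: (25, 48), 55: (40, 48), 58: (40, 96),
    59: (4, 40), 60: (14, 29), 61: (39, 55), 62: (35, 47),
    75: (10, 54), 78: (28, 51), 90: (29, 138), 92: (105, 69),
}

# Raw odds of innovation = (# firms with INNOV = 1) / (# firms with INNOV = 0).
raw_odds = {country: yes / no for country, (no, yes) in counts.items()}

for country, odds in raw_odds.items():
    print(f"country {country}: odds = {odds:.6f}")

# Countries whose raw odds are below 1 (0 is more frequent than 1).
below_one = [country for country, odds in raw_odds.items() if odds < 1]
print("odds below 1:", below_one)
```

The value for country 90 matches the _cons odds ratio reported above, and country 92 is the only one whose raw odds fall below 1.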
Here I run the regression with one independent variable while controlling (??) for country effects. The z values for the countries are mostly negative (why?)
. logistic INNOV i.Country Mang_MNEexperience
Logistic regression Number of obs = 481
LR chi2(12) = 61.89
Prob > chi2 = 0.0000
Log likelihood = -283.25686 Pseudo R2 = 0.0985
------------------------------------------------------------------------------------
INNOV | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------------+----------------------------------------------------------------
Country |
54 | .3689431 .3062865 -1.20 0.230 .0724962 1.877602
55 | .2295013 .1909388 -1.77 0.077 .0449375 1.172091
58 | .4457037 .3627449 -0.99 0.321 .090423 2.196917
59 | 1.689363 1.597736 0.55 0.579 .2646602 10.78344
60 | .7459045 .7228328 -0.30 0.762 .1116376 4.983748
61 | .1580636 .1313537 -2.22 0.026 .0310076 .8057415
62 | .3256028 .2674703 -1.37 0.172 .0650816 1.628988
75 | .9975062 1.10341 -0.00 0.998 .1141151 8.719431
78 | .6885038 .6454499 -0.40 0.691 .1096308 4.323944
90 | .5391077 .4787809 -0.70 0.487 .0945637 3.073453
92 | .0549765 .0542165 -2.94 0.003 .0079569 .3798489
|
Mang_MNEexperience | 1.083192 .0309218 2.80 0.005 1.02425 1.145525
_cons | 4.357274 3.409211 1.88 0.060 .9401977 20.19345
------------------------------------------------------------------------------------
Here I use dummies to control for countries
. logistic INNOV countrydummy1 countrydummy2 countrydummy3 countrydummy4 countrydummy5 countrydummy6
> countrydummy7 countrydummy8 countrydummy9 countrydummy10 countrydummy11 countrydummy12 Mang_MNEexpe
> rience
note: countrydummy12 omitted because of collinearity
Logistic regression Number of obs = 481
LR chi2(12) = 61.89
Prob > chi2 = 0.0000
Log likelihood = -283.25686 Pseudo R2 = 0.0985
------------------------------------------------------------------------------------
INNOV | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------------+----------------------------------------------------------------
countrydummy1 | 18.18959 17.93813 2.94 0.003 2.632625 125.6773
countrydummy2 | 6.710925 4.467707 2.86 0.004 1.820148 24.74333
countrydummy3 | 4.174535 2.79421 2.13 0.033 1.124241 15.5009
countrydummy4 | 8.107169 5.216314 3.25 0.001 2.297149 28.61207
countrydummy5 | 30.72882 24.17314 4.35 0.000 6.575663 143.5993
countrydummy6 | 13.5677 11.30987 3.13 0.002 2.648225 69.51165
countrydummy7 | 2.875114 1.907619 1.59 0.111 .7832285 10.55411
countrydummy8 | 5.922583 3.865446 2.73 0.006 1.648026 21.28425
countrydummy9 | 18.14423 17.87654 2.94 0.003 2.630846 125.1358
countrydummy10 | 12.52361 9.977246 3.17 0.002 2.627836 59.68436
countrydummy11 | 9.80615 6.794078 3.30 0.001 2.522048 38.12797
countrydummy12 | 1 (omitted)
Mang_MNEexperience | 1.083192 .0309218 2.80 0.005 1.02425 1.145525
_cons | .2395476 .1453217 -2.36 0.018 .0729475 .7866357
------------------------------------------------------------------------------------
- Which way is the correct one?
- Why is z negative when using i.country and positive when using dummies?
- How do I interpret country effects?
Best Answer
It's the same model. Look at the overall statistics: the log-likelihood (-283.25686), the corresponding chi-square statistic (61.89), and the odds ratio for Mang_MNEexperience (1.083192) are identical in the two outputs.
The difference is an accident of which indicator (dummy) is omitted, as at least one of them must be. In the first output, it's country 52; in the second output it's country 92.
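As a quick check of this, here is a short Python sketch (plain arithmetic, not Stata) that rebases the odds ratios from the second output onto country 52 and compares them with the first output. It assumes that countrydummy1 to countrydummy12 were created in sorted order of the country codes, so that countrydummy1 is country 52; the variable names in the sketch are illustrative.

```python
import math

# ASSUMPTION: countrydummy1..countrydummy12 follow the sorted country codes,
# so countrydummy1 is country 52 and countrydummy12 (omitted) is country 92.
codes = [52, 54, 55, 58, 59, 60, 61, 62, 75, 78, 90]

# Odds ratios from the second output (base level: country 92).
or_vs_92 = dict(zip(codes, [18.18959, 6.710925, 4.174535, 8.107169, 30.72882,
                            13.5677, 2.875114, 5.922583, 18.14423, 12.52361,
                            9.80615]))
cons_base_92 = 0.2395476

# Odds ratios from the first output (base level: country 52).
or_vs_52 = {54: 0.3689431, 55: 0.2295013, 58: 0.4457037, 59: 1.689363,
            60: 0.7459045, 61: 0.1580636, 62: 0.3256028, 75: 0.9975062,
            78: 0.6885038, 90: 0.5391077, 92: 0.0549765}
cons_base_52 = 4.357274

# Changing the base from 92 to 52 divides every odds ratio by OR(52 vs 92);
# country 92 itself becomes 1 / OR(52 vs 92).
shift = or_vs_92[52]
rebased = {c: or_vs_92[c] / shift for c in codes if c != 52}
rebased[92] = 1 / shift

# The constant is the odds for the base country (at Mang_MNEexperience = 0),
# so it is multiplied by the same factor.
rebased_cons = cons_base_92 * shift

for c in sorted(rebased):
    print(f"{c}: rebased {rebased[c]:.7f}  reported {or_vs_52[c]:.7f}")
print(f"_cons: rebased {rebased_cons:.6f}  reported {cons_base_52:.6f}")


def z_from_or(odds_ratio, se_odds_ratio):
    """z = log(OR) / SE(log OR), with SE(log OR) = SE(OR) / OR (delta method)."""
    return math.log(odds_ratio) / (se_odds_ratio / odds_ratio)


# Same contrast, opposite direction: the z statistics differ only in sign.
z_92_vs_52 = z_from_or(0.0549765, 0.0542165)  # first output, row 92
z_52_vs_92 = z_from_or(18.18959, 17.93813)    # second output, countrydummy1
print(f"z(92 vs 52) = {z_92_vs_52:.2f}, z(52 vs 92) = {z_52_vs_92:.2f}")
```

If the assumption about the dummy ordering holds, the rebased values agree with the first output up to rounding, and the two z statistics for the 52-versus-92 contrast are equal in size with opposite signs. That is all the sign flip amounts to: country 52 has the second-highest raw odds of innovation, so most countries look worse relative to it, while country 92 has the lowest, so every country looks better relative to it.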
So, there isn't really any issue about which model is correct. A model call using Stata's factor variable notation is, at a minimum, a better way to do it in the important sense that the output is easier to read. (The names of the country indicators are under your control: even if you don't like the names some command gives them, you can always rename them.)
So, country-by-country results can't be paired off between the two outputs: they are relative to different base levels.
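Stata's factor variable notation includes an option to set the base level: for example, ib92.Country in place of i.Country makes country 92 the base, and fvset base 92 Country sets it for later commands. Each country's odds ratio is then the odds of innovation in that country relative to the odds in the base country, holding Mang_MNEexperience fixed.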
@kjetil b halvorsen's general warnings about over-dispersion, Hauck-Donner and the like are well meant and you'd do well to keep them in mind, but they don't seem to bear directly on the question.