Solved – Including several endogenous interaction terms

Tags: 2sls, endogeneity, interaction, least squares, stata

I am writing about the following issue. I'm estimating an IV model with the following common structure: $Y = \text{constant} + b_1 X_1 + b_2 X_2 + b_3 X_{end} + b_{\dots} X_{controls}$. I have also found a promising instrumental variable for $X_{end}$, namely $X_{instr}$. To check overall robustness, I estimated the original OLS specification, OLS with robust standard errors, and several 2SLS estimators. In general, apart from some minor changes in coefficients and significance levels (probably due to the adequacy of IV regression), the theoretically hypothesized effects remain in place.

But as soon as I modify my model to include interactions:
$Y = \text{constant} + b_1 X_1 + b_2 X_2 + b_3 X_{end} + b_4 (X_1 X_{end}) + b_5 (X_2 X_{end}) + b_{\dots} X_{controls}$

some really odd things happen: the coefficient values and the related significance statistics differ markedly, and confusingly, between the classical OLS estimators and the various 2SLS estimators. Specifically, every relationship that was significant under OLS (e.g., $b_1$, $b_2$, $b_3$, and $b_4$) loses significance, and the coefficients even change sign.

As the literature suggests, in my first-stage equation I used ($X_{instr} \cdot X_1$) and ($X_{instr} \cdot X_2$) as instruments for the newly added endogenous interaction terms (in Stata-like notation, e.g., ivregress Y (Xend (Xend*X1) (Xend*X2) = Xinstr. (Xinstr. * X1) (Xinstr. * X2)) X1 X2 Xcontrols).

What is going on here? Why is this change happening?

Here are some quick and dirty examples from my work on car sales and marketing strategies (please forgive the formatting issues; I also shortened the actual output and the set of estimators in the interest of time).

As you can see, in the original (non-interaction) regressions there is no big difference, but in the interaction model the effects obtained via OLS vanish (especially for the two strategy-related variables of main interest).

quietly regress lnsales car_quality marketing_strategy1 marketing_strategy2 ///
    sourcing car_type1 car_type2 (+"List of additional control variables")

estimates store OLS

quietly regress lnsales car_quality marketing_strategy1 marketing_strategy2 ///
    sourcing car_type1 car_type2 (+"List of additional control variables"), robust

estimates store OLS_robust

global ivmodel lnsales (car_quality = peer_quality) marketing_strategy1 marketing_strategy2 ///
    sourcing car_type1 car_type2 (+"List of additional control variables")
quietly ivregress 2sls $ivmodel
estimates store TwoSLS_def
quietly ivregress 2sls $ivmodel , vce(robust)
estimates store TwoSLS__2
quietly ivregress gmm $ivmodel , wmatrix(robust)
estimates store GMM_het
quietly ivregress gmm $ivmodel , wmatrix(robust) igmm
estimates store IGMM
quietly ivregress liml $ivmodel , vce(robust)
estimates store LIML

estimates table OLS OLS_robust TwoSLS_def TwoSLS__2 GMM_het IGMM LIML, ///
    b se p stats(N r2)


------------------------------------------------------------------------------
    Variable |        OLS   OLS_robust   TwoSLS_def    TwoSLS__2       GMM_het 
-------------+----------------------------------------------------------------
    car_~y   |  .44455351    .44455351    .44888526    .44888526    .44888526 
             |  .05834619    .07762703    .12372644    .10091798    .10091798 
             |     0.0000       0.0000       0.0003       0.0000       0.0000  
marketing_~1 | -.02134571   -.02134571   -.02261369   -.02261369   -.02261369 
             |  .14387381    .13990431    .13956152    .13548022    .13548022  
             |     0.8822       0.8789       0.8713       0.8674       0.8674 
marketing_~2 | -.34940482   -.34940482    -.3491414    -.3491414    -.3491414   
             |  .15259582    .13431119    .14412673     .1269109     .1269109   
             |     0.0229       0.0099       0.0154       0.0059       0.0059   
sourcing     |  .00599138    .00599138    .00603506    .00603506    .00603506    
             |  .15266332    .14239443    .14403715    .13414465    .13414465  
             |     0.9687       0.9665       0.9666       0.9641       0.9641  
    car_~1   | -.30344565   -.30344565   -.30478088   -.30478088   -.30478088   
             |  .27143962    .26951864    .25836192    .26001529    .26001529  
             |     0.2647       0.2613       0.2381       0.2411       0.2411      
    car_~2   | -.02749295   -.02749295   -.03170655   -.03170655   -.03170655   
             |  .34545754    .39088556    .34328748    .36963657    .36963657    

.......... 
..........
..........

Now the model with interactions. Please note the shifts from OLS to 2SLS in the quality and strategy variables.

quietly regress lnsales c.car_quality c.car_quality#i.marketing_strategy1 c.car_quality#i.marketing_strategy2 marketing_strategy1 marketing_strategy2 sourcing car_type1 car_type2 (+"List of additional control variables")

estimates store OLS

quietly regress lnsales c.car_quality c.car_quality#i.marketing_strategy1 c.car_quality#i.marketing_strategy2 marketing_strategy1 marketing_strategy2 sourcing car_type1 car_type2 (+"List of additional control variables"), robust

estimates store OLS_robust

global ivmodel lnsales (c.car_quality c.car_quality#i.marketing_strategy1 c.car_quality#i.marketing_strategy2 = c.peer_quality i.marketing_strategy1#c.peer_quality i.marketing_strategy2#c.peer_quality) marketing_strategy1 marketing_strategy2 sourcing car_type1 car_type2 (+"List of additional control variables")
quietly ivregress 2sls $ivmodel
estimates store TwoSLS_def
quietly ivregress 2sls $ivmodel , vce(robust)
estimates store TwoSLS__2
quietly ivregress gmm $ivmodel , wmatrix(robust)
estimates store GMM_het

estimates table OLS OLS_robust TwoSLS_def TwoSLS__2 GMM_het IGMM LIML, b se   p stats(N r2)
 ------------------------------------------------------------------------------
    Variable |     OLS       OLS_robust   TwoSLS_def   TwoSLS__2      GMM_het 
-------------+----------------------------------------------------------------
    car_~y   |  .30626371    .30626371    .40466472    .40466472    .40466472  
             |  .06639855    .08737882    .17734552    .14822445    .14822445 
             |     0.0000       0.0005       0.0225       0.0063       0.0063  
             |
marketing_~1 | -2.7663962   -2.7663962    -1.022544    -1.022544    -1.022544 
             |  .87427115    .87740022     3.468728     3.021177     3.021177  
             |     0.0018       0.0018       0.7682       0.7350      0.7350  
             |
marketing_~1#|
    c.car~y  |
          1  |  .40964628    .40964628    .14894708    .14894708    .14894708  
             |  .12788375    .12954421    .51333938    .44914179    .44914179 
             |     0.0015       0.0018       0.7717       0.7402       0.7402 

marketing_~2 | -1.6974189   -1.6974189   -.81075049   -.81075049   -.81075047 
             |  1.2256574    1.0156041    4.4093988    3.5747531    3.5747531 
             |     0.1674       0.0960       0.8541       0.8206       0.8206
             |
marketing_~2#|
    c.car~y  |
          1  |  .20617457    .20617457    .07077817    .07077817    .07077817 
             |  .18004716    .14488011    .65063831    .53051219    .53051219 
             |     0.2533       0.1560       0.9134       0.8939       0.8939 
             |
sourcing     |  .02814061    .02814061    .01454754    .01454754    .01454754 
             |  .15052717    .13857819    .17351094    .14787563    .14787563 
             |     0.8519       0.8393       0.9332       0.9216       0.9216
    car_~1   | -.23592028   -.23592028   -.28205832   -.28205832   -.28205832 
             |  .26452637    .23238727    .26379489    .24610577    .24610577 
             |     0.3734       0.3110       0.2850       0.2518       0.2518
    car_~2   | -.02415081   -.02415081   -.03596115   -.03596115   -.03596115
             |  .33585648    .37759328    .33613488    .36136989    .36136989
             |     0.9427       0.9491       0.9148       0.9207       0.9207
.............
.............
............. 

Best Answer

There could be all sorts of things going on, but without knowing more about the details of your model and the actual commands and results, it will be hard to say more. Don't show us pseudo-code with generic y and x. No one but you can decipher what Xinstr. (Xinstr. * X1) means. At the very least, show us the actual Stata commands you typed. Also, from the arrangement of the parentheses in your question, it seems like you share the common misunderstanding that instruments map onto the endogenous variables one to one. That's not how IV works.
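
To make that last point concrete, here is a sketch of the first stage in the question's own notation, assuming the instrument set is $X_{instr}$, $X_{instr} X_1$, and $X_{instr} X_2$ (the $\pi$ coefficients and the errors $v_j$ are notation introduced here for illustration). Each endogenous regressor gets its own first-stage equation, and every one of them contains all the instruments plus all the exogenous regressors:

$$
\begin{aligned}
X_{end} &= \pi_{10} + \pi_{11} X_{instr} + \pi_{12} X_{instr} X_1 + \pi_{13} X_{instr} X_2 + (\text{exogenous regressors}) + v_1 \\
X_1 X_{end} &= \pi_{20} + \pi_{21} X_{instr} + \pi_{22} X_{instr} X_1 + \pi_{23} X_{instr} X_2 + (\text{exogenous regressors}) + v_2 \\
X_2 X_{end} &= \pi_{30} + \pi_{31} X_{instr} + \pi_{32} X_{instr} X_1 + \pi_{33} X_{instr} X_2 + (\text{exogenous regressors}) + v_3
\end{aligned}
$$

So $X_{instr} X_1$ is not "the" instrument for $X_1 X_{end}$; the three instruments jointly identify the three endogenous terms.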

Having said that, the first thing I would try is to make sure that you're comparing apples to apples. In the simple model, the IV and OLS coefficients on $X_{end}$ are the marginal effects. In the interactions model, the marginal effects are more complicated and non-linear, so you need to take that into account when comparing. You can't just look at the coefficients.
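
In the notation of the question, and holding the other regressors fixed, the two marginal effects can be written out as follows (a sketch that follows directly from the two model equations):

$$
\frac{\partial Y}{\partial X_{end}} = b_3 \quad \text{(simple model)}, \qquad
\frac{\partial Y}{\partial X_{end}} = b_3 + b_4 X_1 + b_5 X_2 \quad \text{(interaction model)}.
$$

In the interaction model, $b_3$ by itself is the effect of $X_{end}$ only when $X_1 = X_2 = 0$. The same logic applies to the other main effects: $\partial Y / \partial X_1 = b_1 + b_4 X_{end}$, so $b_1$ is the effect of $X_1$ at $X_{end} = 0$, which may be far outside the data.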

Here's an example:

. webuse hsng2, clear
(1980 Census housing data)

. ivregress 2sls rent c.pcturban (c.hsngval = faminc i.region)

Instrumental variables (2SLS) regression          Number of obs   =         50
                                                  Wald chi2(2)    =      90.76
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.5989
                                                  Root MSE        =     22.166

------------------------------------------------------------------------------
        rent |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     hsngval |   .0022398   .0003284     6.82   0.000     .0015961    .0028836
    pcturban |    .081516   .2987652     0.27   0.785     -.504053     .667085
       _cons |   120.7065   15.22839     7.93   0.000     90.85942    150.5536
------------------------------------------------------------------------------
Instrumented:  hsngval
Instruments:   pcturban faminc 2.region 3.region 4.region

. ivregress 2sls rent c.pcturban (c.hsngval c.hsngval#c.pcturban = faminc i.region)

Instrumental variables (2SLS) regression          Number of obs   =         50
                                                  Wald chi2(3)    =      95.82
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.5886
                                                  Root MSE        =     22.448

--------------------------------------------------------------------------------------
                rent |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
             hsngval |    .012628   .0038516     3.28   0.001     .0050791    .0201769
                     |
c.hsngval#c.pcturban |  -.0001453   .0000537    -2.71   0.007    -.0002505   -.0000401
                     |
            pcturban |   7.037653   2.587203     2.72   0.007     1.966828    12.10848
               _cons |  -358.7519    177.772    -2.02   0.044    -707.1785   -10.32518
--------------------------------------------------------------------------------------
Instrumented:  hsngval c.hsngval#c.pcturban
Instruments:   pcturban faminc 2.region 3.region 4.region

. margins, dydx(hsngval)

Average marginal effects                        Number of obs     =         50
Model VCE    : Unadjusted

Expression   : Linear prediction, predict()
dy/dx w.r.t. : hsngval

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     hsngval |   .0028993   .0004123     7.03   0.000     .0020912    .0037074
------------------------------------------------------------------------------




. regress rent c.pcturban c.hsngval

      Source |       SS           df       MS      Number of obs   =        50
-------------+----------------------------------   F(2, 47)        =     47.54
       Model |  40983.5269         2  20491.7635   Prob > F        =    0.0000
    Residual |  20259.5931        47  431.055172   R-squared       =    0.6692
-------------+----------------------------------   Adj R-squared   =    0.6551
       Total |    61243.12        49  1249.85959   Root MSE        =    20.762

------------------------------------------------------------------------------
        rent |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    pcturban |   .5248216   .2490782     2.11   0.040     .0237408    1.025902
     hsngval |   .0015205   .0002276     6.68   0.000     .0010627    .0019784
       _cons |   125.9033   14.18537     8.88   0.000     97.36603    154.4406
------------------------------------------------------------------------------

. regress rent c.pcturban##c.hsngval

      Source |       SS           df       MS      Number of obs   =        50
-------------+----------------------------------   F(3, 46)        =     53.26
       Model |  47553.1926         3  15851.0642   Prob > F        =    0.0000
    Residual |  13689.9274        46  297.607117   R-squared       =    0.7765
-------------+----------------------------------   Adj R-squared   =    0.7619
       Total |    61243.12        49  1249.85959   Root MSE        =    17.251

--------------------------------------------------------------------------------------
                rent |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
            pcturban |   3.359486   .6378362     5.27   0.000     2.075588    4.643383
             hsngval |   .0068502     .00115     5.96   0.000     .0045353     .009165
                     |
c.pcturban#c.hsngval |  -.0000666   .0000142    -4.70   0.000    -.0000951    -.000038
                     |
               _cons |  -97.85703    49.0617    -1.99   0.052    -196.6131    .8990436
--------------------------------------------------------------------------------------

. margins, dydx(hsngval)

Average marginal effects                        Number of obs     =         50
Model VCE    : OLS

Expression   : Linear prediction, predict()
dy/dx w.r.t. : hsngval

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     hsngval |   .0023936   .0002651     9.03   0.000     .0018599    .0029272
------------------------------------------------------------------------------

Note how in the IV spec with interaction, the coefficient on housing value is over 5.5 times larger than in the simple IV spec. The marginal effect (averaging over percent urban), however, is pretty similar.
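
The arithmetic behind this can be read off the IV output above: the effect of housing value is $0.012628 - 0.0001453 \times \text{pcturban}$, which equals the reported average marginal effect of $0.0028993$ when pcturban is roughly 67. A minimal sketch for inspecting this in Stata follows; the evaluation points 40, 60, and 80 are arbitrary values chosen here for illustration, not part of the original example.

```stata
webuse hsng2, clear
ivregress 2sls rent c.pcturban (c.hsngval c.hsngval#c.pcturban = faminc i.region)

* Effect of hsngval at chosen levels of pcturban
* (40, 60, 80 are arbitrary illustration values)
margins, dydx(hsngval) at(pcturban = (40 60 80))

* The derivative is linear in pcturban, so the average marginal effect
* is the derivative evaluated at the sample mean of pcturban
summarize pcturban
display _b[hsngval] + _b[c.hsngval#c.pcturban] * r(mean)
```

The last line should reproduce the number reported by `margins, dydx(hsngval)`, since averaging a function that is linear in pcturban is the same as evaluating it at the mean.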

Finally, if you only have one instrument you probably want something like this:

ivregress 2sls rent c.pcturban (c.hsngval c.hsngval#c.pcturban = c.faminc c.faminc#c.pcturban)
margins, dydx(hsngval)
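
One thing worth checking in a specification like this is how well the instruments explain each endogenous term, since standard errors that blow up relative to OLS (as in the question's interaction model) are a typical symptom of weak first stages. The following is only a sketch on the built-in data, using `estat firststage`, the first-stage diagnostic that Stata provides after `ivregress`; it is not a claim about what is driving the results in the question.

```stata
webuse hsng2, clear
ivregress 2sls rent c.pcturban (c.hsngval c.hsngval#c.pcturban = c.faminc c.faminc#c.pcturban)

* First-stage summary statistics for each endogenous regressor
estat firststage
```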

A quadratic endogenous variable would be:

ivregress 2sls rent c.pcturban (c.hsngval##c.hsngval = c.faminc##c.faminc)
margins, dydx(hsngval)
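
For the quadratic case, writing the model as $\text{rent} = \beta_0 + \beta_1\,\text{hsngval} + \beta_2\,\text{hsngval}^2 + \beta_3\,\text{pcturban} + u$ (the $\beta$ notation is introduced here for illustration), the marginal effect again depends on the data:

$$
\frac{\partial\,\text{rent}}{\partial\,\text{hsngval}} = \beta_1 + 2 \beta_2\,\text{hsngval}.
$$

Because `c.hsngval##c.hsngval` tells Stata that the squared term is a function of hsngval, `margins, dydx(hsngval)` averages this whole expression rather than reporting $\beta_1$ alone.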

The example above did not work out as nicely with these, so I used two instruments.
