I am writing about the following issue. I'm estimating an IV model with the following common structure: $Y = \text{constant} + b_1 X_1 + b_2 X_2 + b_3 X_{end} + b_{..} X_{controls}$. I have also found a promising instrumental variable, $X_{instr}$, for $X_{end}$. To check overall robustness, I ran the original OLS specification, OLS with robust standard errors, and several 2SLS estimators. Apart from some minor changes in coefficients and significance levels (probably due to the adequacy of the IV regression), the theoretically hypothesized effects stay in place.
But as soon as I change the model to an interaction model,
$Y = \text{constant} + b_1 X_1 + b_2 X_2 + b_3 X_{end} + b_4 (X_1 X_{end}) + b_5 (X_2 X_{end}) + b_{..} X_{controls}$,
some really odd things happen: the coefficient values and the related significance statistics change markedly between the classical OLS estimators and the several 2SLS estimators. In detail, every relationship that was significant in OLS (e.g., $b_1$, $b_2$, $b_3$ and $b_4$) disappears, and the coefficients even change signs.
As the literature suggests, in my first-stage equation I used the interactions ($X_{instr} \times X_1$) and ($X_{instr} \times X_2$) as instruments for the newly added endogenous interaction terms (in Stata notation, e.g., ivregress Y (Xend (Xend*X1) (Xend*X2) = Xinstr. (Xinstr. * X1) (Xinstr. * X2)) X1 X2 Xcontrols).
What is going on here? Why is this change happening?
Here are some quick and dirty examples from my actual work on car sales and marketing strategies (please forgive the formatting issues; I also shortened the actual output and the variations in estimators in the interest of time).
As you can see, in the original (non-interaction) regressions there is no big difference, but in the interaction model the effects obtained via OLS disappear (especially for the two strategy-related variables of main interest).
quietly regress lnsales car_quality marketing_strategy1 marketing_strategy2 ///
    sourcing car_type1 car_type2 (+"List of additional control variables")
estimates store OLS
quietly regress lnsales car_quality marketing_strategy1 marketing_strategy2 ///
    sourcing car_type1 car_type2 (+"List of additional control variables"), robust
estimates store OLS_robust
global ivmodel lnsales (car_quality = peer_quality) marketing_strategy1 marketing_strategy2 ///
    sourcing car_type1 car_type2 (+"List of additional control variables")
quietly ivregress 2sls $ivmodel
estimates store TwoSLS_def
quietly ivregress 2sls $ivmodel , vce(robust)
estimates store TwoSLS__2
quietly ivregress gmm $ivmodel , wmatrix(robust)
estimates store GMM_het
quietly ivregress gmm $ivmodel , wmatrix(robust) igmm
estimates store IGMM
quietly ivregress liml $ivmodel , vce(robust)
estimates store LIML
estimates table OLS OLS_robust TwoSLS_def TwoSLS__2 GMM_het IGMM LIML, ///
    b se p stats(N r2)
--------------------------------------------------------------------------
    Variable |         OLS  OLS_robust  TwoSLS_def   TwoSLS__2     GMM_het
-------------+------------------------------------------------------------
      car_~y |   .44455351   .44455351   .44888526   .44888526   .44888526
             |   .05834619   .07762703   .12372644   .10091798   .10091798
             |      0.0000      0.0000      0.0003      0.0000      0.0000
marketing_~1 |  -.02134571  -.02134571  -.02261369  -.02261369  -.02261369
             |   .14387381   .13990431   .13956152   .13548022   .13548022
             |      0.8822      0.8789      0.8713      0.8674      0.8674
marketing_~2 |  -.34940482  -.34940482   -.3491414   -.3491414   -.3491414
             |   .15259582   .13431119   .14412673    .1269109    .1269109
             |      0.0229      0.0099      0.0154      0.0059      0.0059
    sourcing |   .00599138   .00599138   .00603506   .00603506   .00603506
             |   .15266332   .14239443   .14403715   .13414465   .13414465
             |      0.9687      0.9665      0.9666      0.9641      0.9641
      car_~1 |  -.30344565  -.30344565  -.30478088  -.30478088  -.30478088
             |   .27143962   .26951864   .25836192   .26001529   .26001529
             |      0.2647      0.2613      0.2381      0.2411      0.2411
      car_~2 |  -.02749295  -.02749295  -.03170655  -.03170655  -.03170655
             |   .34545754   .39088556   .34328748   .36963657   .36963657
..........
..........
..........
Now the model with interactions. Please note the shifts from OLS to 2SLS in the quality and strategy variables.
quietly regress lnsales c.car_quality c.car_quality#i.marketing_strategy1 c.car_quality#i.marketing_strategy2 marketing_strategy1 marketing_strategy2 sourcing car_type1 car_type2 (+"List of additional control variables")
estimates store OLS
quietly regress lnsales c.car_quality c.car_quality#i.marketing_strategy1 c.car_quality#i.marketing_strategy2 marketing_strategy1 marketing_strategy2 sourcing car_type1 car_type2 (+"List of additional control variables"), robust
estimates store OLS_robust
global ivmodel lnsales (c.car_quality c.car_quality#i.marketing_strategy1 c.car_quality#i.marketing_strategy2 = c.peer_quality i.marketing_strategy1#c.peer_quality i.marketing_strategy2#c.peer_quality) marketing_strategy1 marketing_strategy2 sourcing car_type1 car_type2 (+"List of additional control variables")
quietly ivregress 2sls $ivmodel
estimates store TwoSLS_def
quietly ivregress 2sls $ivmodel , vce(robust)
estimates store TwoSLS__2
quietly ivregress gmm $ivmodel , wmatrix(robust)
estimates store GMM_het
estimates table OLS OLS_robust TwoSLS_def TwoSLS__2 GMM_het IGMM LIML, b se p stats(N r2)
--------------------------------------------------------------------------
    Variable |         OLS  OLS_robust  TwoSLS_def   TwoSLS__2     GMM_het
-------------+------------------------------------------------------------
      car_~y |   .30626371   .30626371   .40466472   .40466472   .40466472
             |   .06639855   .08737882   .17734552   .14822445   .14822445
             |      0.0000      0.0005      0.0225      0.0063      0.0063
             |
marketing_~1 |  -2.7663962  -2.7663962   -1.022544   -1.022544   -1.022544
             |   .87427115   .87740022    3.468728    3.021177    3.021177
             |      0.0018      0.0018      0.7682      0.7350      0.7350
             |
marketing_~1#|
     c.car~y |
           1 |   .40964628   .40964628   .14894708   .14894708   .14894708
             |   .12788375   .12954421   .51333938   .44914179   .44914179
             |      0.0015      0.0018      0.7717      0.7402      0.7402
marketing_~2 |  -1.6974189  -1.6974189  -.81075049  -.81075049  -.81075047
             |   1.2256574   1.0156041   4.4093988   3.5747531   3.5747531
             |      0.1674      0.0960      0.8541      0.8206      0.8206
             |
marketing_~2#|
     c.car~y |
           1 |   .20617457   .20617457   .07077817   .07077817   .07077817
             |   .18004716   .14488011   .65063831   .53051219   .53051219
             |      0.2533      0.1560      0.9134      0.8939      0.8939
             |
    sourcing |   .02814061   .02814061   .01454754   .01454754   .01454754
             |   .15052717   .13857819   .17351094   .14787563   .14787563
             |      0.8519      0.8393      0.9332      0.9216      0.9216
      car_~1 |  -.23592028  -.23592028  -.28205832  -.28205832  -.28205832
             |   .26452637   .23238727   .26379489   .24610577   .24610577
             |      0.3734      0.3110      0.2850      0.2518      0.2518
      car_~2 |  -.02415081  -.02415081  -.03596115  -.03596115  -.03596115
             |   .33585648   .37759328   .33613488   .36136989   .36136989
             |      0.9427      0.9491      0.9148      0.9207      0.9207
.............
.............
.............
Best Answer
There could be all sorts of things going on, but without knowing more about the details of your model and the actual commands and results, it will be hard to say more. Don't show us pseudo-code with generic y and x. No one but you can decipher what
Xinstr. (Xinstr. * X1)
means. At the very least, show us the actual Stata commands you typed. Also, from the arrangement of parentheses in your question, it seems like you share the common misunderstanding that instruments map onto the endogenous variables one to one. That's not how IV works.
Having said that, the first thing I would try is to make sure that you're comparing apples to apples. In the simple model, the IV and OLS coefficients on $X_{end}$ are the marginal effects. In the interactions model, the marginal effects are more complicated and non-linear, so you need to take that into account when comparing. You can't just look at the coefficients.
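To spell out the comparison, using the notation of the question's interaction model (the bars denote sample means, and the averaging is only one common way to summarize the effect):

$$\frac{\partial Y}{\partial X_{end}} = b_3 + b_4 X_1 + b_5 X_2, \qquad \text{average marginal effect} = b_3 + b_4 \bar{X}_1 + b_5 \bar{X}_2 .$$

In the simple model the same derivative is just $b_3$. In the interaction model, $b_3$ alone is the effect of $X_{end}$ only where $X_1 = X_2 = 0$, so it is not directly comparable to the $b_3$ from the simple model.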
Here's an example:
Note how in the IV spec with interaction, the coefficient on housing value is over 5.5 times larger than in the simple IV spec. The marginal effect (averaging over percent urban), however, is pretty similar.
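The output being described is not reproduced here. A minimal sketch of that kind of comparison is below. It assumes Stata's shipped hsng2 example data, whose variables include housing value (hsngval) and percent urban (pcturban). The outcome rent and the instruments faminc and region are assumptions chosen for illustration, not necessarily what was used above.

```stata
* Sketch only: dataset, outcome, and instruments are assumptions for illustration.
webuse hsng2, clear

* Simple IV specification: the coefficient on hsngval is its marginal effect.
ivregress 2sls rent pcturban (hsngval = faminc i.region), vce(robust)
margins, dydx(hsngval)

* IV specification with an endogenous interaction: the coefficient on hsngval
* is now the effect at pcturban == 0, so compare the average marginal effect
* reported by margins rather than the raw coefficient.
ivregress 2sls rent pcturban (c.hsngval c.hsngval#c.pcturban = faminc i.region), vce(robust)
margins, dydx(hsngval)
```

Because the interaction is written in factor-variable notation, margins accounts for it when computing the derivative with respect to hsngval.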
Finally, if you only have one instrument you probably want something like this:
A quadratic endogenous variable would be:
The example above did not work out as nicely with these, so I used two instruments.
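The two single-instrument specifications mentioned above might look roughly like this. Again, this is a sketch on the hsng2 example data, with rent as the outcome and faminc as the single instrument, both of which are assumptions. The instrument is interacted with the exogenous variable (or with itself) to create a second excluded instrument.

```stata
* Sketch only: dataset, outcome, and instrument are assumptions for illustration.
webuse hsng2, clear

* Endogenous variable interacted with an exogenous one:
* instrument hsngval and hsngval x pcturban with faminc and faminc x pcturban.
ivregress 2sls rent pcturban (c.hsngval c.hsngval#c.pcturban = c.faminc c.faminc#c.pcturban), vce(robust)
margins, dydx(hsngval)

* Quadratic in the endogenous variable:
* instrument hsngval and its square with faminc and its square.
ivregress 2sls rent pcturban (c.hsngval c.hsngval#c.hsngval = c.faminc c.faminc#c.faminc), vce(robust)
margins, dydx(hsngval)
```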