Solved – 2-stage panel model – am I doing it right

fixed-effects-model, instrumental-variables, panel-data

I ran a 2-stage fixed-effects panel model in R. The goal is to find the effect of strategic alliance participation on firm performance. Alliance participation is not random – firms self-select (and are selected by their future partners) into alliances. Thus I ran a 2-stage model.

First, I ran a fixed-effects plm model in which I regressed log(number of alliances) on a set of variables that should affect the propensity of firms to participate in alliances (I have 11 years' worth of alliances for about 500 firms).

Second, I plugged the results of the first stage into the second fixed-effects panel where I estimate firm performance as a function of log(number of alliances) and other variables.

Now for the question. Should I plug in the fitted values of log(number of alliances) or the actual values of log(number of alliances) from the data set? I have read that 2-stage models call for the fitted values in place of the actual values. Is that correct?

I understand that I also need to plug in the residuals obtained in the first stage as the correction for self-selection. Is that correct? How should I interpret the coefficients for the value itself (fitted or actual) and for the residual?

I tend to interpret them this way: the coefficient for the fitted value is the effect of alliance participation on firm performance that is expected for the average firm. The coefficient for the residual is the effect of deviation from the predicted value. Please let me know if this sounds like a correct way to interpret my results.

Best Answer

This sounds a bit too complicated and I'm not entirely convinced that this is the correct approach. If you have a model of the type

$$y_{1it} = X'_{it}\beta + \gamma y_{2it} + \mu_i + \nu_{it}$$

where $y_{1it}$ is your outcome, $y_{2it}$ is your endogenous variable that is correlated with the error $\nu_{it}$, and $X_{it}$ is a vector of exogenous variables (including a constant) that are potentially correlated with the fixed effects $\mu_i$ but not with $\nu_{it}$, then you can estimate it by fixed-effects 2SLS, provided you have a valid instrument $z_{it}$. Let $n$ denote the number of panels (the number of firms in this case), $T_i$ the number of observations for firm $i$, and $N = \sum^n_{i=1}T_i$ the total number of observations in the entire data. All of these quantities will be readily available from your data.

The typical procedure would be to apply the within transformation to your variables. For a generic variable $m$, the within transformation for the fixed effects IV case would be $$\tilde{m}_{it} = m_{it} - \left(\frac{1}{T_i}\sum^{T_i}_{t=1}m_{it}\right) + \left(\frac{1}{N}\sum^{n}_{i=1}\sum^{T_i}_{t=1}m_{it}\right)$$ that is, each observation minus the mean of its own firm, plus the grand mean over all observations.

This within transformation eliminates all time-invariant factors including the unobserved fixed effects $\mu_i$. All you need to know for this is how to do group-wise summations in R (unfortunately I'm not an R guy, otherwise I would have provided you with the code but I guess you can easily find this out from the manuals). The within transformed model then is
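Since this step only needs group-wise means, here is a minimal sketch of it in Python (standard library only, so not R, but the logic carries over directly). The function name `within_transform` and the toy panel are made up for illustration; the panel is deliberately unbalanced to show that each firm's mean is taken over its own $T_i$ observations.

```python
from collections import defaultdict

def within_transform(values, groups):
    """Return m_it minus the mean of its group plus the grand mean."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    grand_mean = sum(values) / len(values)
    return [v - sums[g] / counts[g] + grand_mean
            for v, g in zip(values, groups)]

# Hypothetical unbalanced panel: firm "a" has 3 observations, firm "b" has 2.
firm = ["a", "a", "a", "b", "b"]
m = [1.0, 2.0, 3.0, 10.0, 14.0]

m_tilde = within_transform(m, firm)
print(m_tilde)
```

After the transformation every firm has the same mean (the grand mean), which is why anything constant within a firm, including $\mu_i$, drops out of the model.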

$$\tilde{y}_{1it} = \tilde{X}'_{it}\beta + \gamma \tilde{y}_{2it} + \tilde{\nu}_{it}$$

The fixed effects 2SLS estimator can then be obtained by running the 2SLS regression of $\tilde{y}_{1it}$ on $\tilde{X}_{it}$ and $\tilde{y}_{2it}$, instrumenting $\tilde{y}_{2it}$ with $\tilde{z}_{it}$. To do this manually, regress

$$\tilde{y}_{2it} = \tilde{X}'_{it}\phi + \pi \tilde{z}_{it} + \tilde{\eta}_{it}$$

obtain the fitted values for $\tilde{y}_{2it}$ from this first stage (which I will denote as $\widehat{\tilde{y}}_{2it}$), and then regress

$$\tilde{y}_{1it} = \tilde{X}'_{it}\beta + \gamma \widehat{\tilde{y}}_{2it} + \tilde{\nu}_{it}$$
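The two manual stages can be sketched in Python as follows. This is only an illustration under stated assumptions: the data are simulated and stand in for variables that have already been within-transformed, the true $\gamma$ is set to 1.5, and the helper `ols` is a small hand-written least-squares routine rather than a library function. The naive regression on the actual $y_2$ is included only to show the bias that using the fitted values removes.

```python
import random

def ols(columns, y):
    """Coefficients from regressing y on the given columns (normal equations)."""
    k = len(columns)
    # Build the augmented system [X'X | X'y].
    a = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(columns[i], y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [v - f * w for v, w in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Simulated stand-ins for the within-transformed variables (assumption).
rng = random.Random(42)
N = 2000
const = [1.0] * N
x = [rng.gauss(0, 1) for _ in range(N)]   # exogenous regressor
z = [rng.gauss(0, 1) for _ in range(N)]   # instrument
u = [rng.gauss(0, 1) for _ in range(N)]   # unobserved confounder
y2 = [0.5 * xi + zi + ui + rng.gauss(0, 1) for xi, zi, ui in zip(x, z, u)]
y1 = [1.0 + 2.0 * xi + 1.5 * y2i + 2.0 * ui + rng.gauss(0, 1)
      for xi, y2i, ui in zip(x, y2, u)]

# First stage: regress y2 on the exogenous variables and the instrument.
phi0, phi1, pi = ols([const, x, z], y2)
y2_hat = [phi0 + phi1 * xi + pi * zi for xi, zi in zip(x, z)]

# Second stage: replace y2 by its fitted values.
_, _, gamma_2sls = ols([const, x, y2_hat], y1)

# Naive regression on the actual y2, for comparison only.
_, _, gamma_ols = ols([const, x, y2], y1)

print("two-stage gamma:", gamma_2sls)
print("naive OLS gamma:", gamma_ols)
```

In this simulated setup the two-stage estimate should land close to 1.5, while the naive estimate is pushed upward by the confounder. The coefficients from the second stage are fine as point estimates; its reported standard errors are not.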

Note that the standard errors of this approach will not be the correct ones given that $\widehat{\tilde{y}}_{2it}$ is an estimated quantity. One way of correcting for this is to bootstrap both regressions. For instance, in Stata I would write a program that performs the 2SLS estimation and then bootstrap the standard errors as follows:

program fe2sls
   * first stage
   reg y2_tilde x_tilde z_tilde
   predict y2_hat_tilde, xb

   *second stage
   reg y1_tilde x_tilde y2_hat_tilde
   drop y2_hat_tilde
end

* obtain bootstrapped standard errors
bootstrap, reps(200): fe2sls
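For readers without Stata, the same idea can be sketched in Python. This is a rough illustration, not a translation of the program above: to keep it short it assumes the only exogenous regressor is the constant, it uses simulated data in place of the within-transformed variables, and the names `simple_ols`, `two_stage_gamma` and `bootstrap_se` are made up. The key point it shows is that both stages are re-run on every bootstrap sample.

```python
import random
import statistics

def simple_ols(x, y):
    """Intercept and slope from regressing y on a constant and x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def two_stage_gamma(rows):
    """Run both stages on rows of (y1, y2, z) and return the estimate of gamma."""
    y1, y2, z = zip(*rows)
    a, pi = simple_ols(z, y2)             # first stage
    y2_hat = [a + pi * zi for zi in z]    # fitted values
    _, gamma = simple_ols(y2_hat, y1)     # second stage
    return gamma

def bootstrap_se(rows, reps=200, seed=0):
    """Standard deviation of gamma across resamples drawn with replacement."""
    rng = random.Random(seed)
    draws = [two_stage_gamma(rng.choices(rows, k=len(rows)))
             for _ in range(reps)]
    return statistics.stdev(draws)

# Simulated stand-ins for the within-transformed variables (true gamma = 1.5).
rng = random.Random(1)
rows = []
for _ in range(500):
    z, u = rng.gauss(0, 1), rng.gauss(0, 1)
    y2 = z + u + rng.gauss(0, 1)
    y1 = 1.5 * y2 + 2.0 * u + rng.gauss(0, 1)
    rows.append((y1, y2, z))

gamma_hat = two_stage_gamma(rows)
se_hat = bootstrap_se(rows)
print("gamma:", gamma_hat, "bootstrap SE:", se_hat)
```

This resamples individual observations, as the Stata call above does. With panel data you may prefer to resample whole firms and redo the within transformation inside each replication.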

Otherwise you can obtain the variance-covariance matrix $\text{VCE}_{2sls}$ and rescale it as $$\frac{N-Z}{N - n - K + 1}\cdot \text{VCE}_{2sls}$$ where $Z$ is the number of variables in $z$ (for one instrument $Z = 1$). If you do not correct the standard errors in this way, they tend to be underestimated, and you might find significant effects even though you probably shouldn't.

Disclaimer: this answer borrows heavily from the Stata documentation for the panel data 2SLS estimator, which can be obtained using the xtivreg command. For more information on the topic and additional references see the corresponding manual.
