Practical issues with dynamic panel data modeling

Unfortunately for me, I've got a situation where I need to control for the lag of the dependent variable as a robustness check against an alternative interpretation of my main regression. The baseline specification is
$$
y_{it} = \alpha_i + \delta y_{i,t-1} + x_{it}'\beta + \epsilon_{it}
$$

The lagged dependent variable makes the within (de-meaning) OLS estimator inconsistent for a fixed number of periods, so the standard approach seems to be the Arellano-Bond estimator, described in Chapter 8 of [this book]. Basically, an instrument matrix is built from all available lags of the dependent variable (DV), starting with the second lag (so later periods have more instruments than earlier ones), and the first-differenced model of interest is then estimated by two-step GMM with those instruments. I'm still digesting these methods, and this clearly seems to be an active area of econometrics research.
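To see the problem with the within estimator concretely, here is a small Monte Carlo sketch in Python/NumPy (my own illustration, not taken from the book). It assumes a pure AR(1) panel with no covariates, standard-normal effects and errors, and a 50-period burn-in; the function names are hypothetical. With only a handful of periods, the within estimate of $\delta$ lands well below the true value, which is the fixed-$T$ inconsistency (Nickell bias).

```python
import numpy as np


def simulate_panel(n_units, n_periods, delta, seed=0):
    """Simulate y_it = alpha_i + delta * y_{i,t-1} + eps_it (no covariates).

    Returns an (n_units, n_periods) array. A 50-period burn-in is dropped
    so the retained observations are close to stationary.
    """
    rng = np.random.default_rng(seed)
    burn = 50
    alpha = rng.normal(size=n_units)
    y = np.zeros((n_units, n_periods + burn))
    for t in range(1, n_periods + burn):
        y[:, t] = alpha + delta * y[:, t - 1] + rng.normal(size=n_units)
    return y[:, burn:]


def within_estimate(y):
    """Within (fixed-effects) OLS estimate of delta from a balanced panel."""
    y_now, y_lag = y[:, 1:], y[:, :-1]
    # De-mean each unit's series over the periods used in the regression.
    y_now = y_now - y_now.mean(axis=1, keepdims=True)
    y_lag = y_lag - y_lag.mean(axis=1, keepdims=True)
    return float((y_lag * y_now).sum() / (y_lag ** 2).sum())


y = simulate_panel(n_units=5000, n_periods=6, delta=0.5)
print(within_estimate(y))  # well below the true 0.5 despite the large N
```

Increasing `n_units` does not fix this; only increasing the number of periods shrinks the bias.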

I've got a few practical questions. Many of them might not have answers yet.

1. What do you do when the AR(2) test fails? By construction, the differenced errors in these models will show first-order (AR(1)) serial correlation, but they shouldn't exhibit AR(2) correlation. What if they do? Can I include more than one lag of the DV? If so, how should I construct the instrument matrix?
2. When one has a huge dataset, will the AR(2) test be too powerful, detecting small degrees of correlation with no practical significance?
3. Is it possible to include more than one variable among the instruments? Say I'm interested in an interaction of $y_{i,t-1}$ with some variable $C$. This interaction will be endogenous and require an instrument, probably with the same structure as the instrument for the lagged DV itself, but it isn't clear to me how I'd generalize the AB instrument matrix to incorporate it.
4. Finally, what are some readable books and papers on these methods? I get the sense that this is a fresh enough research area that little effort has yet gone into cleaning up the research for the purpose of exposition.

Best Answer

Arellano-Bond is really about using moment conditions in which differenced variables are instrumented with all available lags (the most important part of the paper is just equation (2), which lays this out). Even the "all available" part is only for efficiency: the estimator would still be valid if you did the lazy thing and used only two lags to instrument in every period. In this sense the paper is really quite general. Suppose we have:

$$
y_{it}=\sum_{l=1}^p\rho_ly_{i,t-l}+x_{i,t}'\beta+\alpha_i+\epsilon_{it}
$$

where $\alpha_i$ is an individual effect and $\epsilon_{it}$ is uncorrelated with all $y_{i,s},\ s=t-1,\dots,0$ and all $x_{i,s},\ s=t,\dots,0$. Then first differencing removes the fixed effect:

$$
y_{it}-y_{i,t-1}=\sum_{l=1}^p\rho_l(y_{i,t-l}-y_{i,t-l-1})+(x_{i,t}-x_{i,t-1})'\beta+\epsilon_{it}-\epsilon_{i,t-1}
$$

and all $y$'s dated $t-2$ or earlier and all $x$'s dated $t-1$ or earlier are available instruments. If $x$ is endogenous, another variable $z$ can be used as an instrument in its place, so long as $z$ is uncorrelated with $\epsilon_{it}-\epsilon_{i,t-1}$ (and correlated with $x_{i,t}-x_{i,t-1}$). This allows for arbitrary lags and $x$'s.

EDIT: Note that which lags are valid instruments is determined by the error term $\epsilon_{it}-\epsilon_{i,t-1}$. Since $\epsilon_{i,t-1}$ is correlated with $y_{i,t-1}$, you cannot use $y_{i,t-1}$. However, you can use $y_{i,t-2}$ regardless of the lag length $p$, because it will always be uncorrelated with $\epsilon_{i,t-1}$.
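The instrument set described above can be sketched for the simplest case ($p=1$, no $x$'s). This is an illustrative Python/NumPy helper of my own; the name `ab_instruments` and its `max_lags` option are hypothetical, not from any package. Each differenced equation for period $t$ gets its own block of levels $y_{i,0},\dots,y_{i,t-2}$, so later periods carry more instruments, and `max_lags=2` gives the "lazy" variant that keeps only the two most recent valid lags.

```python
import numpy as np


def ab_instruments(y_i, max_lags=None):
    """Arellano-Bond instrument matrix for one unit, AR(1) case.

    y_i holds y_{i,0}, ..., y_{i,T}. The differenced equation for period t
    (t = 2, ..., T) is instrumented by y_{i,0}, ..., y_{i,t-2}; if max_lags
    is set, only the most recent max_lags of those are kept.
    Returns a (T-1) x L block-diagonal matrix with one row per equation.
    """
    T = len(y_i) - 1
    blocks = []
    for t in range(2, T + 1):
        lags = y_i[: t - 1]  # y_{i,0}, ..., y_{i,t-2}; y_{i,t-1} is excluded
        if max_lags is not None:
            lags = lags[-max_lags:]
        blocks.append(lags)
    Z = np.zeros((len(blocks), sum(len(b) for b in blocks)))
    col = 0
    for row, b in enumerate(blocks):
        Z[row, col:col + len(b)] = b  # each equation gets its own columns
        col += len(b)
    return Z


Z = ab_instruments(np.array([10.0, 11.0, 12.0, 13.0, 14.0]))
print(Z.shape)  # (3, 6): equations t = 2, 3, 4 use 1 + 2 + 3 instruments
```

For $p>1$ the same blocks apply; the first usable differenced equation simply starts later, because the regressors need $y_{i,t-p-1}$.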

Again: difference to remove the fixed effect, then use the lags as instruments to form moment conditions. The rest is all matrix algebra and careful stacking of vectors to work those moment conditions out; for this I would suggest following the directions in one of the sources below.
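As a rough illustration of that matrix algebra, here is a one-step difference-GMM sketch for the pure AR(1) case (no $x$'s), written in Python/NumPy. It assumes a balanced panel and i.i.d. homoskedastic errors, which is what justifies the usual one-step weighting matrix built from $H$ (2 on the diagonal, $-1$ next to it); the function name and the simulated data are hypothetical. It is not the two-step estimator and reports no standard errors, so a canned routine is the better choice for real work.

```python
import numpy as np


def difference_gmm_ar1(y):
    """One-step Arellano-Bond estimate of delta in y_it = a_i + delta*y_{i,t-1} + e_it.

    y is a balanced (N, T+1) array holding y_{i,0}, ..., y_{i,T}.
    Assumes i.i.d. homoskedastic errors, so the weighting matrix is
    (sum_i Z_i' H Z_i)^{-1} with H the covariance pattern of differenced errors.
    """
    n_units, n_obs = y.shape
    T = n_obs - 1
    n_eq = T - 1                      # differenced equations t = 2, ..., T
    n_inst = n_eq * (n_eq + 1) // 2   # 1 + 2 + ... + (T - 1) instruments
    H = 2 * np.eye(n_eq) - np.eye(n_eq, k=1) - np.eye(n_eq, k=-1)

    zhz = np.zeros((n_inst, n_inst))  # sum_i Z_i' H Z_i
    zx = np.zeros(n_inst)             # sum_i Z_i' (lagged differences)
    zy = np.zeros(n_inst)             # sum_i Z_i' (current differences)
    for i in range(n_units):
        dy = np.diff(y[i])            # dy_1, ..., dy_T
        dy_now, dy_lag = dy[1:], dy[:-1]
        Z = np.zeros((n_eq, n_inst))
        col = 0
        for row, t in enumerate(range(2, T + 1)):
            Z[row, col:col + t - 1] = y[i, :t - 1]  # levels y_{i,0..t-2}
            col += t - 1
        zhz += Z.T @ H @ Z
        zx += Z.T @ dy_lag
        zy += Z.T @ dy_now

    W = np.linalg.pinv(zhz)
    # Scalar GMM solution: (X'Z W Z'X)^{-1} X'Z W Z'y.
    return float((zx @ W @ zy) / (zx @ W @ zx))


# Simulated panel with stationary initial conditions (illustrative values).
rng = np.random.default_rng(1)
N, T, delta = 2000, 6, 0.5
alpha = rng.normal(size=N)
y = np.empty((N, T + 1))
y[:, 0] = alpha / (1 - delta) + rng.normal(size=N) / np.sqrt(1 - delta**2)
for t in range(1, T + 1):
    y[:, t] = alpha + delta * y[:, t - 1] + rng.normal(size=N)

print(difference_gmm_ar1(y))  # should be close to 0.5
```

Unlike the within estimator, this stays centered near the true coefficient as $N$ grows with $T$ held small.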

So with that in mind:

1. It will work with arbitrary lags.
2. No, it would be the opposite: you should worry about the finite-sample properties, not the properties in a large sample. I will admit, though, to knowing less about this than about other parts of your question.
3. Arellano-Bond allows for multiple instruments. The original paper is somewhat dense, but it will tell you how to do this. If you are using R or Stata, the canned routines (e.g. `pgmm` in R's plm package, or `xtabond`/`xtabond2` in Stata) can incorporate arbitrary instruments. Footnote vi on page 290 of their original Review of Economic Studies paper shows how the instrument matrix would be laid out, and the source cited below shows the same thing.
4. They're not all that fresh; by econometrics standards I might almost call them old. If you want a nice presentation, try Behr (2003).
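For point 3, one way to lay out instruments for an interaction such as $y_{i,t-1}C$ is to treat the interaction series $w_{it}=y_{it}C_{it}$ like $y$ itself and place its levels dated $t-2$ or earlier next to those of $y$ in each period's block. The Python/NumPy sketch below assumes $C$ is strictly exogenous, so that these lags are valid instruments; the helper name and the toy values are hypothetical. The column order inside each block does not affect the GMM estimate.

```python
import numpy as np


def stacked_instruments(series_list):
    """GMM-style instrument matrix for one unit, built from several series.

    Each element of series_list holds s_{i,0}, ..., s_{i,T} for one series.
    The differenced equation for period t (t = 2, ..., T) gets the levels
    s_{i,0}, ..., s_{i,t-2} of every series, side by side in one block.
    """
    T = len(series_list[0]) - 1
    blocks = [np.concatenate([s[: t - 1] for s in series_list])
              for t in range(2, T + 1)]
    Z = np.zeros((len(blocks), sum(len(b) for b in blocks)))
    col = 0
    for row, b in enumerate(blocks):
        Z[row, col:col + len(b)] = b  # block-diagonal placement
        col += len(b)
    return Z


y_i = np.array([1.0, 2.0, 3.0, 4.0])  # toy values for one unit
c_i = np.array([0.5, 0.5, 2.0, 2.0])  # hypothetical moderator C
w_i = y_i * c_i                       # interaction series
Z = stacked_instruments([y_i, w_i])
print(Z.shape)  # (2, 6)
```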

EDIT: If you're worried about weak instruments (a particular concern when the series are highly persistent), you might want to try the estimators presented in Blundell and Bond (1998) or Arellano and Bover (1995). These use a larger set of moment conditions, including lagged differences of $y$ as instruments for the equation in levels. The papers were written specifically because of the concern that the original instruments might be too weak.
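For the AR(1) case, with $y_{i,0},\dots,y_{i,T}$ observed, the two sets of moment conditions can be written as follows. The first line is the difference-GMM set used above; the second is the extra levels set that Blundell and Bond (1998) add. The second set is only valid under an additional assumption on the initial observations (roughly, that $y_{i,1}-y_{i,0}$ is uncorrelated with $\alpha_i$, as under mean stationarity), so treat this as a sketch rather than the papers' exact statement.

$$
\begin{aligned}
E\left[y_{i,t-s}\,(\epsilon_{it}-\epsilon_{i,t-1})\right] &= 0, \qquad t=2,\dots,T,\quad s=2,\dots,t \\
E\left[(y_{i,t-1}-y_{i,t-2})\,(\alpha_i+\epsilon_{it})\right] &= 0, \qquad t=2,\dots,T
\end{aligned}
$$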