Solved – Inclusion of lagged dependent variable in regression

lags, misspecification, regression

I'm very confused about whether it's legitimate to include a lagged dependent variable (LDV) in a regression model. My understanding is that if the model focuses on the relationship between the change in Y and other independent variables (IVs), then adding a lagged dependent variable on the right-hand side ensures that the coefficients on the other IVs are estimated independently of the previous value of Y.

Some say that including an LDV will bias the coefficients of the other IVs downward. Others say that including an LDV can reduce serial correlation.

I know this question is pretty general in terms of the kind of regression. But my statistical knowledge is limited, and I'm having a hard time figuring out whether I should include a lagged dependent variable in a regression model when the focus is the change of Y over time.

Are there other approaches for handling the influence of the Xs on the change of Y over time? I tried different change scores as the DV as well, but the R-squared in that case is very low.

Best Answer

The decision to include a lagged dependent variable (DV) in your model is really a theoretical question. It makes sense to include a lagged DV if you expect the current level of the DV to be heavily determined by its past level. In that case, leaving out the lagged DV will lead to omitted variable bias, and your results might be unreliable. In such a scenario, including the lagged DV will absorb a lot of the variance in your outcome and is likely to make your other IVs' effects less significant (which can mean both smaller $\beta$s and larger standard errors). What it does allow you to say, however, is that the IVs that still influence your outcome have an effect controlling for the past value of the DV. An alternative approach is to use the difference between your outcome variable at periods $t$ and $t-1$ as your DV for period $t$.
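To make the two specifications concrete, here is a minimal sketch in plain Python that fits both to simulated data. Everything in it is an assumption for illustration: the data-generating process, the values `rho = 0.6` and `beta = 1.5`, and the small `ols` helper (a hypothetical normal-equations solver, not a library function). Note that the change-score model is the lagged-DV model with the coefficient on $Y_{t-1}$ fixed at 1, so the two only agree when that restriction is roughly appropriate.

```python
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Simulated data (assumed process): y_t = rho * y_{t-1} + beta * x_t + noise
random.seed(0)
n, rho, beta = 2000, 0.6, 1.5
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.0]
for t in range(1, n):
    y.append(rho * y[t - 1] + beta * x[t] + random.gauss(0, 1))

# Lagged-DV model: y_t = a + rho * y_{t-1} + beta * x_t + e_t
X_ldv = [[1.0, y[t - 1], x[t]] for t in range(1, n)]
a_hat, rho_hat, beta_hat = ols(X_ldv, y[1:])

# Change-score model: (y_t - y_{t-1}) = a + beta * x_t + e_t
X_chg = [[1.0, x[t]] for t in range(1, n)]
dy = [y[t] - y[t - 1] for t in range(1, n)]
a_chg, beta_chg = ols(X_chg, dy)

print(f"lagged-DV model:    rho = {rho_hat:.2f}, beta = {beta_hat:.2f}")
print(f"change-score model: beta = {beta_chg:.2f}")
```

In this toy setup `x` is drawn independently of past `y`, so both models should recover a similar `beta`. With real data where the IVs are correlated with the past level of the outcome, the two estimates can diverge, which is why the choice is ultimately a theoretical one.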

However, either approach requires answering an important question: what is the right lag structure for your DV? You can get some information about this by looking at the correlation of your outcome variable with itself at different lags (e.g. the correlation between $Y_t$ and $Y_{t-1}$, between $Y_t$ and $Y_{t-2}$, etc.).
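The lag correlations described above can be computed directly. The following is a rough sketch in plain Python; the `autocorr` helper and the simulated series (an AR(1) process with an assumed coefficient of 0.7) are illustrative stand-ins for your own outcome variable.

```python
import random

def autocorr(series, lag):
    """Sample autocorrelation of `series` at a positive integer lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return cov / var

# Stand-in outcome series (assumed): y_t = 0.7 * y_{t-1} + noise
random.seed(1)
y = [0.0]
for _ in range(999):
    y.append(0.7 * y[-1] + random.gauss(0, 1))

# Correlation between Y_t and Y_{t-k} for k = 1..5
acf = {lag: autocorr(y, lag) for lag in range(1, 6)}
for lag, r in acf.items():
    print(f"lag {lag}: {r:.2f}")
```

For a series like this one, the correlations shrink gradually as the lag grows even though only one lag enters the process, so a slowly decaying pattern does not by itself mean several lags are needed. Partial autocorrelations, which control for the intermediate lags, are usually a more direct guide to the lag order.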
