Solved – Choice between static and dynamic panel regression

dynamic-regression, fixed-effects-model, generalized-moments, panel-data, random-effects-model

I have a panel dataset with countries as the individuals, observed annually. My analysis is a macroeconomic study, and if I am not mistaken such datasets are commonly called "macro panels" or "wide panels". I have few time observations per country (about 15-24 years), and there is evidence of individual heterogeneity, i.e., unobserved individual effects.

For static panel models, the literature suggests fixed-effects models in such cases, but in the models I am analyzing the Hausman test tells me the opposite, favoring random-effects models. I don't understand why I get this result. I am starting to think it is due to the presence of variables that change little or not at all over time, which causes problems in fixed-effects models. However, I consider these regressors important for my analysis, so I would like to know whether it is sensible to choose random effects and how I can justify this choice.

My models also suffer from both cross-sectional dependence and serial correlation. I don't know how serious a problem this is; the only way I am dealing with them is by using robust standard errors. On the other hand, I have found research related to my work, and most of it is based on dynamic models estimated by the generalized method of moments (GMM) with a lagged dependent variable as a regressor. Unfortunately, I don't know the theory behind GMM well, but I have learned of its versatility, especially in situations like this, which has made it increasingly popular in recent years.
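In R, what I am doing looks roughly like this (a minimal sketch; mydata, y, x1, and x2 are placeholders for my actual country-year data):

```r
library(plm)     # panel estimators and diagnostics
library(lmtest)  # coeftest() for re-testing coefficients with a custom vcov

pdata <- pdata.frame(mydata, index = c("country", "year"))
fe <- plm(y ~ x1 + x2, data = pdata, model = "within")

pcdtest(fe)  # Pesaran CD test for cross-sectional dependence
pbgtest(fe)  # Breusch-Godfrey test for serial correlation

# Driscoll-Kraay standard errors, robust to both problems
coeftest(fe, vcov = vcovSCC(fe))
```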

Therefore, given the diagnostic problems I am having with static models, I wonder whether it is more appropriate to use dynamic models or to stay with fixed/random-effects models while taking the necessary precautions, and on what basis I can evaluate this choice when comparing the results obtained with the two approaches (static vs. dynamic).

I hope I was clear.

Best Answer

My analysis is a macroeconomic study, and if I am not mistaken such datasets are commonly called "macro panels" or "wide panels".

I have never heard of the term "wide panel" before. I assume you are referring to a dataset with a large $N$ and small $T$, thus giving the appearance of a wider data frame. I would argue "macro panel" is a more apt description, where you have a reasonably large number of countries over many years (e.g., 20 years or more). The term "macro panel" is used quite frequently in Chapter 12 of Badi Baltagi's Econometric Analysis of Panel Data.

For static panel models, the literature suggests fixed-effects models in such cases, but in the models I am analyzing the Hausman test tells me the opposite, favoring random-effects models. I don't understand why I get this result. I am starting to think it is due to the presence of variables that change little or not at all over time, which causes problems in fixed-effects models.

Results from your Hausman test suggest random effects; in short, the test finds no evidence that your unique errors are correlated with your regressors. You might favor a random effects estimator if some of these time-invariant or "slow moving" regressors are of substantive interest. In a fixed effects model, all time-constant variables included in your model will be collinear with the country-specific effect and summarily dropped; this is because a fixed effects estimator only uses the time-series variation within each country. A random effects estimator, on the other hand, will exploit some "between unit" (i.e., cross-country) variation, and thus any time-constant variables may remain. In fact, the random effects estimator is akin to a weighted average of the "within" and "between" estimators.
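To make this concrete, here is a minimal sketch in R with the plm package; mydata, y, x1, and the time-constant legal_origin are hypothetical names, not from your question:

```r
library(plm)

# Hypothetical country-year data; legal_origin is constant within each country
pdata <- pdata.frame(mydata, index = c("country", "year"))

fe <- plm(y ~ x1 + legal_origin, data = pdata, model = "within")
re <- plm(y ~ x1 + legal_origin, data = pdata, model = "random")

coef(fe)  # legal_origin is dropped: no within-country variation to exploit
coef(re)  # legal_origin survives: RE also uses between-country variation
```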

You also state that the presence of omitted, time-constant confounders "causes problems in fixed-effects models." I disagree with this statement. The attractiveness of fixed effects estimators is that they 'partial out' the effects of all time-constant variables, even those you have not explicitly measured (or even thought of). If you proceed with a random effects model, then your country-specific effect is treated as random and is assumed to be uncorrelated with your explanatory variables. Is this a reasonable assumption in your setting? In practice, it is often not. It is unlikely that the true correlation between the unit (i.e., country) effects and your covariates is exactly zero. Quoting from Clark & Linzer (2015):

[I]f the Hausman test fails to reject the null hypothesis of orthogonality, it is most likely not because the true correlation is zero....Rather, it is likely that the test has insufficient statistical power to reliably distinguish a small correlation from zero correlation....Of course, in many cases, a biased (random-effects) estimator can be preferable to an unbiased (fixed-effects) estimator if the former provides sufficient variance reduction over the latter. The Hausman test does not help evaluate this trade-off.
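For reference, the test itself is a one-liner in R with plm, assuming fe and re are the "within" and random-effects fits from the sketch above:

```r
# H0: both estimators are consistent (and RE is efficient);
# rejection of H0 favors fixed effects
phtest(fe, re)
```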

I should also note that the advice you receive from others might be discipline-specific. I once conferred with an epidemiologist on a project and he was more than happy to disregard the results from a Hausman test. Applied econometricians, on the other hand, might be more predisposed to let the results of a Hausman test guide their approach. I'm an expert in neither field, so I shouldn't speak for entire disciplines. Maybe someone from one of them will jump in and set me straight on that one.

It is difficult to offer guidance without more detail about the theoretical model under consideration. In the comments I drew your attention to the notion of generalizability outside of your sample of countries. It might be more appropriate to treat the unit-specific effect as "random" if you're sampling a subset of units from a larger, unobserved population. The units can be individuals, hospitals, precincts, counties, et cetera. Sometimes we are interested in units outside of our sample, and other times we only care about the sample at hand. I think this is important to consider in your situation. Suppose you wish to survey the attitudes, beliefs, and perceptions of college students about some important social issue over time across a diverse range of campuses in the United States. Scarce funding for the project limits your ability to obtain repeated measures over time at all universities, so you decide to observe a subset of campuses instead. If your aim is to generalize to all campuses in the United States (or any broader campus population outside of your sample), then treating the "campus" effect as random might be preferred. Or, maybe you already sampled all observations from the relevant population. There are many examples of this in the real world.

For example, suppose I was hired to evaluate a new policy implemented by a large metropolitan police department. A subset of law enforcement districts implement the policy and others do not. Now suppose police officials want to know whether the new policy/directive was effective at lowering the crime rate. To begin, I might survey all districts that comprise that agency before and after the policy exposure period. In this case, I'm sampling the entire population of law enforcement units, which includes all treated and untreated districts, and I only care about the districts comprising this one large metropolitan agency. If I only wish to make statements about that particular agency, then I might proceed with a fixed effects estimator, or some derivative of it. Now suppose I sample a much broader array of police districts in a particular geographic region of the United States and I want to make statements about all districts in the entire country. In this setting, I might want to treat the district as a random effect.

In your setting, it appears you're only interested in European countries. I don't presume your results will guide policy or be applicable to Asian markets—or maybe they will. As I see it, you're only sampling a subset of the 44 European countries (though to be precise, it might be 49 countries if you're considering the Eurasian Caucasus region as part of the European continent). As you already noted, your results shouldn't change much if you obtained the full population of European countries. If you're only interested in the subset of European markets, then maybe modeling the country effect as 'fixed' is the better way to go.

Therefore, given the diagnostic problems I am having with static models, I wonder whether it is more appropriate to use dynamic models or to stay with fixed/random-effects models while taking the necessary precautions, and on what basis I can evaluate this choice when comparing the results obtained with the two approaches (static vs. dynamic).

What diagnostic problems did you run into? Just because the results of a Hausman test suggest random effects doesn't mean you have a problem. Running a "static" random effects model is perfectly acceptable in my estimation.

It is also reasonable for you to explore a dynamic model. A very good predictor of behavior or economic activity at time $t$ is its value at time $t-1$. But a dynamic model introduces a new set of problems. To be clear, by "dynamic" I mean including one or more lagged dependent variables on the right-hand side of your equation. I would caution you against modeling "country" as either fixed or random while also including a lagged dependent variable as a predictor. The lagged version of your outcome will be correlated with the unit effect that is part of your error term.
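To see the problem explicitly, write the dynamic model in standard notation (my notation, not yours):

$$y_{it} = \rho\, y_{i,t-1} + \mathbf{x}_{it}'\boldsymbol{\beta} + \alpha_i + \varepsilon_{it}$$

Because $y_{i,t-1}$ contains the same country effect $\alpha_i$, it is necessarily correlated with the composite error $\alpha_i + \varepsilon_{it}$ under random effects; and after the within (demeaning) transformation used by fixed effects, $y_{i,t-1} - \bar{y}_{i,-1}$ remains correlated with $\varepsilon_{it} - \bar{\varepsilon}_{i}$. This is the well-known Nickell (1981) bias, which shrinks only at rate $1/T$, a real concern with 15-24 time periods.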

I also encourage you to read Paul Allison's blog post on the problems associated with using a lagged outcome as a predictor in panel data models. The discussion is quite interesting. In sum, there is no single correct answer to your question. I wouldn't stray too far from other applied work addressing your specific research question. A recent paper by Leszczensky & Wolbring (2019) offers some alternative approaches to addressing reverse causality in panel data contexts.

I have found research related to my work, and most of it is based on dynamic models estimated by the generalized method of moments (GMM) with a lagged dependent variable as a regressor.

You didn't cite any empirical work in your question, but there is a substantial literature showing that the generalized method of moments (GMM) estimator extends to panel data with spatially and temporally correlated error components. I'm not sure what software you're working with, but you could run a dynamic panel model using GMM in R fairly easily. The GMM estimator is provided by the pgmm() function from the plm package. Its main argument is a formula describing the variables of the model, their lag structure, and the instruments (see the plm package documentation for more information). I believe this function mirrors xtabond2 from Stata. I can't speak for all software packages, but I believe you could implement such a model fairly easily based on the associated documentation.
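As a concrete starting point, here is (near verbatim) the Arellano-Bond employment example from the plm documentation, which uses the EmplUK data shipped with the package; you would adapt the formula to your own country-year panel:

```r
library(plm)

# Arellano-Bond (1991) employment equation; EmplUK ships with plm
data("EmplUK", package = "plm")

# Two-part formula: regressors (incl. the lagged outcome) | GMM instruments
ab <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1)
             + log(capital) + lag(log(output), 0:1)
             | lag(log(emp), 2:99),
           data = EmplUK, effect = "twoways", model = "twosteps")

summary(ab, robust = TRUE)  # reports Sargan and AR(1)/AR(2) diagnostics
```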

I hope this helps!