Solved – Suggestions for running panel data with small sample size

panel-data, small-sample

Okay, say I have a dataset containing unemployment rates, wages, oil prices faced by each country, and incidence of civil conflict for about 30 countries over a period of 10 years. So this is panel data with a small number of countries over a small number of time periods. Say I am interested in running a regression of voter turnout in national elections in each year on all the variables I outlined above. A very simple regression I can run is

$Y_{it} = \alpha_{i} + \delta_{t} + \beta X_{it} + \varepsilon_{it}$

where $\alpha_{i}$ is a country fixed effect, $\delta_{t}$ is a year fixed effect, and $X_{it}$ is the vector of relevant independent variables. Now I am wondering whether this regression is valid for such a small sample of countries, especially using these fixed effects, since I may lose a lot of precision there. Are there any suggestions on how I could exploit the panel nature of this data without losing too much precision?

Best Answer

As described, this panel data regression is theoretically valid. The single biggest advantage of panel data is that it "pools" information across units, thereby shrinking the error; with more information the errors would be smaller still. With 30 cross-sections and ten years of annual data, it sounds like a balanced panel. I wouldn't even call that "small." Ten observations per country is, however, not enough to fit the more traditional, univariate time-series approaches such as Box-Jenkins ARIMA or ARCH models, which is precisely why pooling helps here.
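To make the pooling concrete, here is a minimal sketch of the two-way within (fixed-effects) estimator on a synthetic balanced 30×10 panel. All the data and the coefficient value are made up for illustration; for a balanced panel, demeaning by country and by year and then running OLS on the demeaned data reproduces the model with $\alpha_i$ and $\delta_t$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 30, 10  # countries, years (matches the question's dimensions)

# Hypothetical synthetic panel with a known slope
beta_true = 0.5
alpha = rng.normal(size=N)           # country fixed effects
delta = rng.normal(size=T)           # year fixed effects
X = rng.normal(size=(N, T))
Y = alpha[:, None] + delta[None, :] + beta_true * X + 0.1 * rng.normal(size=(N, T))

def within_transform(Z):
    # Two-way demeaning, exact for a balanced panel:
    # z_it - zbar_i. - zbar_.t + zbar_..
    return Z - Z.mean(axis=1, keepdims=True) - Z.mean(axis=0, keepdims=True) + Z.mean()

# OLS on the demeaned data sweeps out both sets of fixed effects
y_w = within_transform(Y).ravel()
x_w = within_transform(X).ravel()
beta_hat = (x_w @ y_w) / (x_w @ x_w)
```

With 300 pooled observations, `beta_hat` lands very close to the true slope even though each country contributes only ten data points, which is the "pooling" advantage in action.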

Are there any other complications? Missing values requiring imputation? Mixed frequencies, where the predictors and the dependent variable are observed at different intervals, e.g., annual vs. quarterly? If the latter, then Ghysels's MIDAS (MIxed DAta Sampling) approach might be helpful; Ghysels has many papers on it.
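For a flavor of MIDAS, here is a sketch of the exponential Almon lag polynomial that the approach uses to weight high-frequency observations into a low-frequency regressor. The parameter values and the toy quarterly series are hypothetical; in practice the two theta parameters are estimated jointly with the regression.

```python
import numpy as np

def exp_almon_weights(theta1, theta2, m):
    # Exponential Almon lag polynomial used in MIDAS regressions:
    # w_j proportional to exp(theta1*j + theta2*j^2), j = 1..m, normalized
    j = np.arange(1, m + 1)
    w = np.exp(theta1 * j + theta2 * j ** 2)
    return w / w.sum()

# Toy example: collapse 4 quarterly observations per year into one
# annual regressor via the parametric weights
quarterly = np.arange(1.0, 41.0).reshape(10, 4)  # 10 years x 4 quarters
w = exp_almon_weights(0.1, -0.05, 4)             # hypothetical parameter values
annual = quarterly @ w                           # one weighted value per year
```

The point of the parametric weights is parsimony: instead of estimating four free coefficients per quarterly predictor (costly with only ten years), you estimate two shape parameters.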

You probably also want to explore lead-lag relationships that are agnostic with respect to causal flow, i.e., that relax the typical directional causal assumptions. Some useful work has been done on this by Sornette in his papers on TOPS (thermal optimal paths). His papers are technically sophisticated, but the core ideas are not; you can readily develop "brute force" workarounds for them.
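One such brute-force workaround is simply to scan the cross-correlation between two series over a grid of leads and lags and see where it peaks. This is a crude, fixed-lag stand-in for the TOPS idea (which recovers a time-varying lag), and the data below are synthetic:

```python
import numpy as np

def lead_lag_corr(x, y, max_lag):
    """Correlation of x_t with y_{t+k} for k in [-max_lag, max_lag].

    A peak at positive k suggests x leads y; at negative k, y leads x."""
    out = {}
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            a, b = x[: len(x) - k], y[k:]
        else:
            a, b = x[-k:], y[: len(y) + k]
        out[k] = np.corrcoef(a, b)[0, 1]
    return out

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = np.roll(x, 2) + 0.1 * rng.normal(size=200)  # y follows x with a 2-period lag
corrs = lead_lag_corr(x, y, 5)
best = max(corrs, key=corrs.get)                # lag with the strongest correlation
```

With only ten time periods per country, you would run such a scan on the pooled (demeaned) panel rather than country by country, and keep `max_lag` very small.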

Another consideration would be to explore Pesaran's CD test for weak cross-sectional dependence. The test starts from the realistic premise that at least some dependence between cross-sections is likely present and is, therefore, less stringent than alternatives.
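The CD statistic itself is straightforward to compute from the residuals of your fixed-effects regression: it is a scaled sum of all pairwise cross-country correlations, approximately standard normal under cross-sectional independence. A minimal sketch on simulated residuals (the data here are synthetic):

```python
import numpy as np

def pesaran_cd(resid):
    """Pesaran CD statistic from an N x T matrix of regression residuals.

    CD = sqrt(2T / (N(N-1))) * sum over i<j of rho_ij, where rho_ij is the
    pairwise correlation of residuals across time. Approximately N(0,1)
    under the null of cross-sectional independence."""
    N, T = resid.shape
    rho = np.corrcoef(resid)            # N x N pairwise correlations
    iu = np.triu_indices(N, k=1)        # each pair counted once
    return np.sqrt(2.0 * T / (N * (N - 1))) * rho[iu].sum()

rng = np.random.default_rng(2)
cd_indep = pesaran_cd(rng.normal(size=(30, 10)))            # no dependence
common = rng.normal(size=10)                                # shared shock per year
cd_dep = pesaran_cd(rng.normal(size=(30, 10)) + common)     # strong dependence
```

For your data, a large CD value would suggest common shocks (e.g., global oil prices) linking the countries, in which case you might consider Driscoll-Kraay standard errors or a common-factor specification.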

Multivariate tests for unit roots, autocorrelation, and so on are at an early stage of development with respect to panel models. The only references I'm aware of that explicitly address them are from SAS, as related to its PROC PANEL procedure. Here are some links:

http://go.documentation.sas.com/?docsetId=etsug&docsetVersion=14.2&docsetTarget=etsug_panel_details50.htm&locale=en

http://go.documentation.sas.com/?docsetId=etsug&docsetVersion=14.2&docsetTarget=etsug_panel_details68.htm&locale=en

These are probably implementable in other software, but it would be incumbent on you to port them.

Beyond that, I don't have too many more suggestions. The balanced nature of your data actually simplifies the challenges a great deal. Consider yourself lucky as it could be a lot worse.