Solved – Proportion dependent variable (0-1 range) in an unbalanced, non-dynamic panel

methodology, panel data, proportion

I am a PhD student currently performing data analysis for my publication and have the following problem:

Dataset:
An unbalanced panel of around 80 firms over a 20-year period (because firms enter and leave the industry, a balanced panel would be too small to be useful).

Dependent variable:
A diversity measure (range 0-1). 0 and 1 are technically possible values but do not appear among the roughly 500 observations in my data (and I think it is unlikely that a firm would have a diversity of exactly 0 or 1).

Independent variables:
Various continuous and dummy variables

My understanding of the existing options (might be wrong though):
I understand that for non-panel data there are a variety of options, e.g. beta regression; however, it seems they can't be used for panels. One reviewer of a previous version of my paper pointed me to Papke and Wooldridge (2008), but as I understand it, their method only works for balanced panels. I also found the paper by Elsas and Florysiak (2013). Their approach with the DPF estimator seems quite understandable, and they explain how to implement it in Stata. However, it seems their approach, like many approaches for financial models, is for dynamic panel datasets, i.e. it includes the dependent variable's value from the preceding time period as a predictor.

My questions:

1) Is there a method that I have overlooked (ideally easy to implement in Stata) for my problem?
2) Does the fact that I could live with not modelling 0 and 1 help? Is there maybe a simpler solution, such as an application of beta regression to unbalanced panels?
3) If I use the DPF estimator methodology suggested by Elsas & Florysiak, can I drop the y(t-1) term and use it as a non-dynamic panel? Or do I need to keep this term even if I would normally not include it in my model? (I would prefer not to drop any more data points because of the need for lagged variables.)

Thank you!

References:
Papke, L. E., & Wooldridge, J. M. (2008). Panel data methods for fractional response variables with an application to test pass rates. Journal of Econometrics, 145(1), 121-133.
http://people.stern.nyu.edu/wgreene/Econometrics/Papke-Wooldridge-FractionalResponse.pdf

Elsas, R., & Florysiak, D. (2013). Dynamic capital structure adjustment and the impact of fractional dependent variables. Journal of Financial and Quantitative Analysis (JFQA), Forthcoming.
https://www.researchgate.net/profile/Ralf_Elsas/publication/228252749_Dynamic_Capital_Structure_Adjustment_and_the_Impact_of_Fractional_Dependent_Variables/links/5477a71b0cf205d1687c57e0.pdf

Best Answer

It's not very clear what it is you want to implement. But I think you want a dynamic panel model that includes an exogenous variable $x \in (0,1)$. I think your $x$ is a continuous variable (as opposed to a binary variable), is that correct?

If that's correct, then you're contemplating a fixed effects model like, $$ y_{it} = \alpha y_{i(t-1)} + \beta x_{it} + \eta_i + \epsilon_{it} $$ where $\eta_i$ is the time-invariant fixed effect and $\epsilon_{it}$ is the i.i.d. error.
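To make the notation concrete, here is a minimal sketch that simulates data from the model above on an unbalanced panel. It uses Python purely for illustration (the thread itself is about Stata), and every name and value in it (`simulate_panel`, the entry and exit rule, `alpha=0.5`, `beta=1.0`, the error variances) is an assumption made for the example, not something taken from the question.

```python
import random

def simulate_panel(n_firms=80, max_years=20, alpha=0.5, beta=1.0, seed=0):
    """Simulate y_it = alpha*y_i(t-1) + beta*x_it + eta_i + eps_it.

    Returns rows of (firm, year, y, y_lag, x). Firms enter and leave at
    random years, so the panel is unbalanced. All values are illustrative.
    """
    rng = random.Random(seed)
    rows = []
    for firm in range(n_firms):
        eta = rng.gauss(0.0, 1.0)                    # time-invariant fixed effect
        start = rng.randint(0, max_years - 3)        # year the firm enters
        end = rng.randint(start + 2, max_years - 1)  # year the firm leaves
        y_prev = eta / (1.0 - alpha)                 # start near the firm's long-run level
        for year in range(start, end + 1):
            x = rng.random()                         # exogenous regressor in [0, 1)
            y = alpha * y_prev + beta * x + eta + rng.gauss(0.0, 0.1)
            rows.append((firm, year, y, y_prev, x))
            y_prev = y
    return rows

rows = simulate_panel()
```

Each firm contributes between 3 and 20 consecutive years, which mirrors the entry-and-exit pattern described in the question.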

If I've understood you correctly then, in answer to your specific questions:

1) Yes, dynamic panels are easy to implement in Stata with Roodman's xtabond2, which is documented in his Stata Journal paper "How to do xtabond2: An introduction to difference and system GMM in Stata".

2) A continuous variable bounded between 0 and 1 is no problem at all; in fact, it's quite common (e.g., employment rate data).

3) Of course you can drop the $y_{i(t-1)}$ term and just fit a non-dynamic panel. But whether you should depends on a number of things, including which relationship you are interested in analyzing, which (if any) theoretical model you are operationalizing, and whether your data have a dynamic / autoregressive structure (which requires pretesting and diagnostic testing, which you should always be doing anyway).
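For point 3, dropping the lagged term leaves the static fixed-effects model $y_{it} = \beta x_{it} + \eta_i + \epsilon_{it}$. The sketch below shows the within (demeaning) estimator for a single regressor, which is the kind of estimate a fixed-effects command such as Stata's `xtreg, fe` reports. It is written in Python only for illustration; the function name `within_estimator` and the toy data are hypothetical. Because each firm is demeaned over its own observations, nothing in the calculation requires a balanced panel.

```python
from collections import defaultdict

def within_estimator(rows):
    """Fixed-effects (within) slope for y_it = beta*x_it + eta_i + eps_it.

    rows: iterable of (firm, y, x). Firms may have different numbers of
    observations, so unbalanced panels are handled without any change.
    """
    by_firm = defaultdict(list)
    for firm, y, x in rows:
        by_firm[firm].append((y, x))
    num = den = 0.0
    for obs in by_firm.values():
        y_bar = sum(y for y, _ in obs) / len(obs)  # firm mean of y
        x_bar = sum(x for _, x in obs) / len(obs)  # firm mean of x
        for y, x in obs:
            num += (x - x_bar) * (y - y_bar)       # demeaning removes eta_i
            den += (x - x_bar) ** 2
    return num / den

# Toy unbalanced panel: firm "A" has 3 years, firm "B" has 2.
# y is built as 2*x plus a firm-specific constant, with no noise.
panel = [("A", 2 * x + 5.0, x) for x in (0.1, 0.2, 0.4)]
panel += [("B", 2 * x - 1.0, x) for x in (0.3, 0.9)]
beta_hat = within_estimator(panel)  # recovers 2 up to floating-point error
```

Note that this static version uses every observation, whereas the dynamic specification loses each firm's first year to the lag.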

A short and sweet overview is here.
