Solved – Validity of pseudo-panel data constructed from repeated cross sectional data as a panel data

cross-section, panel data, validation

I am looking at survey data from the Federal Reserve, which contains both panel data and repeated cross-sectional (RCS) data at different time points. For example, 2007-2009 is a panel, while 2010 is a cross-sectional data set. Everything before 2007 is repeated cross-sections as well, until you get back to the 1983-1989 period, which is also a panel. I want to use recent data sets, such as 2001-2009, of which only the last two years will be true panel data.

RCS data are generally considered inferior to true panel data because the same individuals are not followed over time, so individual histories cannot be included in a model. However, several authors, such as Deaton (1985) and Moffitt (1990, 1993), showed that RCS data can be used to estimate a few commonly used models, such as the fixed effects model or the linear dynamic model. These methods group "similar" individuals into cohorts and treat the cohort averages as observations from a pseudo-panel. Note that all of these prior studies were conducted on repeated cross-sections without a panel component.
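The cohort-averaging step described above can be sketched as follows. This is only a minimal illustration, assuming cohorts are defined by ten-year birth bands; the field names (`year`, `birth_year`, `x`, `y`) and the toy records are hypothetical and are not taken from any Federal Reserve file.

```python
from collections import defaultdict
from statistics import mean


def cohort_of(birth_year, width=10):
    """Map a birth year to the first year of its band (an assumed cohort definition)."""
    return birth_year - birth_year % width


def build_pseudo_panel(records, width=10):
    """Collapse repeated cross-sections into cohort-by-year means.

    records: iterable of dicts with hypothetical keys
    'year', 'birth_year', 'x', 'y'.
    """
    cells = defaultdict(list)
    for r in records:
        cells[(cohort_of(r["birth_year"], width), r["year"])].append(r)

    panel = []
    for (cohort, year), rows in sorted(cells.items()):
        panel.append({
            "cohort": cohort,
            "year": year,
            "n": len(rows),  # cell size, worth keeping for later weighting
            "x_bar": mean(r["x"] for r in rows),
            "y_bar": mean(r["y"] for r in rows),
        })
    return panel


# Toy repeated cross-sections: different individuals are sampled in each year.
records = [
    {"year": 2001, "birth_year": 1952, "x": 1.0, "y": 2.0},
    {"year": 2001, "birth_year": 1958, "x": 3.0, "y": 4.0},
    {"year": 2001, "birth_year": 1963, "x": 2.0, "y": 2.5},
    {"year": 2004, "birth_year": 1955, "x": 2.0, "y": 5.0},
    {"year": 2004, "birth_year": 1961, "x": 4.0, "y": 6.0},
    {"year": 2004, "birth_year": 1967, "x": 6.0, "y": 9.0},
]
pseudo_panel = build_pseudo_panel(records)
for row in pseudo_panel:
    print(row)
```

Each cohort then plays the role of one "individual" observed in every survey year. With small cells the cohort means are noisy estimates of the true cohort means, which is why the cell size `n` is kept alongside the averages.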

My first question is: is there any known method to compare pseudo-panel data to genuine panel data? My idea is to fit a model to the synthetic panel data and estimate the parameters, then fit the same model to the genuine panel data and compare the estimation accuracy. Does this sound correct? I would also like to know how much of this is doable. (Please note that I have limited experience managing data sets as large as those available on the Federal Reserve website.)

Best Answer

I do not know whether there are established methods for comparing panel data to repeated cross-sectional data. I would add, though, that true panel data are not always superior to repeated cross-sectional data. Attrition or learning effects, for example, can be a problem in panel data but not in repeated cross-sectional data. I do not know whether these problems are present in your case, but if they are, the second and later waves of your panel may be less reliable than a fresh cross-section drawn at the same time. You should keep this in mind.

In general, I think what you want to do sounds doable, and it could reveal new information compared with an analysis of the cross-sectional data alone (although I do not know your research question).
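As a rough illustration of the comparison, the same fixed-effects (within) estimator can be applied once to the genuine panel, with households as units, and once to the pseudo-panel, with cohorts as units. The sketch below is an assumption-laden toy: the data are invented, the field names are hypothetical, and it ignores both cell-size weighting and the sampling error in the cohort means, which a real pseudo-panel analysis would have to address.

```python
from collections import defaultdict


def within_slope(rows, unit_key, x_key, y_key):
    """Within estimator for one regressor: demean x and y by unit, then OLS through the origin."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[unit_key]].append(r)

    sxy = sxx = 0.0
    for g in groups.values():
        mx = sum(r[x_key] for r in g) / len(g)
        my = sum(r[y_key] for r in g) / len(g)
        for r in g:
            sxy += (r[x_key] - mx) * (r[y_key] - my)
            sxx += (r[x_key] - mx) ** 2
    return sxy / sxx


# Invented genuine panel: the same two households observed in both waves.
panel = [
    {"id": "A", "year": 2007, "x": 1.0, "y": 7.0},
    {"id": "A", "year": 2009, "x": 3.0, "y": 11.0},
    {"id": "B", "year": 2007, "x": 2.0, "y": 5.0},
    {"id": "B", "year": 2009, "x": 6.0, "y": 13.0},
]

# Invented pseudo-panel: cohort means taken from independent cross-sections.
pseudo_panel = [
    {"cohort": 1950, "year": 2007, "x_bar": 2.0, "y_bar": 3.0},
    {"cohort": 1950, "year": 2009, "x_bar": 4.0, "y_bar": 7.5},
    {"cohort": 1960, "year": 2007, "x_bar": 1.0, "y_bar": 2.0},
    {"cohort": 1960, "year": 2009, "x_bar": 3.0, "y_bar": 5.0},
]

panel_slope = within_slope(panel, "id", "x", "y")
pseudo_slope = within_slope(pseudo_panel, "cohort", "x_bar", "y_bar")
print("panel slope:", panel_slope)
print("pseudo-panel slope:", pseudo_slope)
print("difference:", pseudo_slope - panel_slope)
```

A raw difference between the two slopes is only a starting point. Judging whether it is meaningful requires standard errors for both estimates, and it only makes sense when the pseudo-panel and the genuine panel cover the same period and population.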

If the estimates differ between the two analyses, I would look for the reasons by weighing the advantages and disadvantages of each type of data set. There are several papers on this topic that might help you, such as:

Deaton (1985)

Verbeek & Nijman (1992)

Frees (2004)

Lee & Niemeier (1996)

Hsiao (2007)