Stata – Propensity Score Matching with Panel Data: Methods and Applications

Tags: panel-data, propensity-scores, stata

I have a longitudinal data set of individuals, some of whom were subject to a treatment while others were not. All individuals are in the sample from birth until age 18, and the treatment happens at some age within that range. The age at treatment may differ across cases. Using propensity score matching, I would like to match treated and control units in pairs, with exact matching on year of birth, so that I can track each pair from their birth year until age 18. In total there are about 150 treated and 4000 untreated individuals. After the matching, the idea is to use a difference-in-differences strategy to estimate the effect of the treatment.

The problem I face at the moment is how to do the matching with panel data. I am using Stata's psmatch2 command, and I match on household and individual characteristics using propensity score matching. In general, with panel data the optimal match can differ at each age. For example: if A is treated, B and C are controls, and all of them were born in 1980, then A and B may be matched in 1980 at age 0, whilst A and C are matched in 1981 at age 1, and so on. A may also be matched with its own pre-treatment values from previous years.

To get around this issue, I took the average of all time-varying variables, so that the matching can identify individuals who are on average the most similar over the duration of the sample, and I do the matching separately for each age group from 0 to 18. Unfortunately, this still matches a different control unit to each treated unit in each age group.

If someone could direct me towards a method to do pairwise matching with panel data in Stata this would be very much appreciated.

Best Answer

You basically have to create a wide-format dataset with all the characteristics that are relevant for the matching procedure, perform the matching on this cross-sectional dataset, and then use the ID to identify the matched pairs in the panel dataset. Here are some more details:

  1. Use reshape to create a wide-format dataset. Format the pre-treatment variables in the way you want to use them in the matching procedure. If you have multiple observations per individual, you can simply take the average of each variable, but other approaches work too (for example, you can keep multiple observations of the same variable, such as health1 and health2, and use all of them in the matching). The goal is to have a dataset with one observation per individual.

  2. Using this dataset, perform the matching procedure with psmatch2.

  3. Merge the information about the matched cases back into the original dataset and drop the cases that were not matched. I am not sure about the details here because I don't really know Stata and psmatch2, but I think you get the idea.
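
As a rough illustration, the three steps might look as follows in Stata. This is only a sketch under assumptions: the variable names id, age, birthyear, treated (1 if the individual is ever treated, constant within id), hhincome and health are hypothetical, the panel is in long format with one row per individual and age, and psmatch2 has been installed from SSC. For brevity the covariates are averaged over all ages with collapse rather than reshaped; restricting them to pre-treatment ages would need an extra step that is not shown.

```stata
* Hypothetical variable names: id, age, birthyear, treated, hhincome, health.
* psmatch2 is user-written; install it once with: ssc install psmatch2

* Step 1: collapse the long panel to one observation per individual
preserve
collapse (mean) hhincome health (first) birthyear treated, by(id)

* Step 2: 1-to-1 nearest-neighbour matching on the propensity score,
* without replacement, after putting the data in random order
set seed 12345
generate double u = runiform()
sort u
psmatch2 treated hhincome health birthyear, logit neighbor(1) noreplacement

* Step 3a: recover the original id of each treated unit's matched control
* (_n1 holds the _id of the nearest neighbour; after sorting on _id,
* the observation number equals _id)
sort _id
generate long id2 = id[_n1] if _treated == 1
keep if !missing(id2)
rename id id1
generate long pair = id1
keep pair id1 id2

* One row per pair member: role 1 = treated, role 2 = control
reshape long id, i(pair) j(role)
keep pair id
tempfile pairs
save `pairs'

* Step 3b: attach the pair identifier to the panel, keeping matched ids only
restore
merge m:1 id using `pairs', keep(match) nogenerate
```

Because noreplacement uses each control at most once, id is unique in the pairs file and the m:1 merge is valid. After the merge, every remaining row carries pair, which is constant over age for both members of a pair. Note that including birthyear as a covariate only makes matches on birth year more likely; it does not force them to be exact.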

Using these steps, you can match cases based on all pre-treatment information, and you get only one match per treated unit.
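
The question also asks for exact matching on year of birth, which a single psmatch2 call on the pooled data does not enforce. One possible workaround is to estimate the propensity score once and then run the matching separately within each birth cohort. The sketch below assumes the one-observation-per-individual dataset is in memory and uses hypothetical variable names (id, birthyear, treated, hhincome, health); it is an illustration under those assumptions, not a tested recipe.

```stata
* Assumes the collapsed data (one row per id) are in memory.
* Hypothetical variable names: id, birthyear, treated, hhincome, health.

* Estimate the propensity score once on the pooled sample
logit treated hhincome health
predict double ps, pr

tempfile wide allpairs
save `wide'

levelsof birthyear, local(cohorts)
local first = 1
foreach y of local cohorts {
    use `wide' if birthyear == `y', clear

    * Skip cohorts that lack either treated or untreated individuals
    count if treated == 1
    local n1 = r(N)
    count if treated == 0
    local n0 = r(N)
    if `n1' == 0 | `n0' == 0 {
        continue
    }

    * Match within the cohort on the pre-estimated score
    psmatch2 treated, pscore(ps) neighbor(1) noreplacement

    * Build one row per pair member, as in the pooled case
    sort _id
    generate long id2 = id[_n1] if _treated == 1
    keep if !missing(id2)
    rename id id1
    generate long pair = id1
    keep pair id1 id2
    reshape long id, i(pair) j(role)

    * Stack the pairs found so far
    if `first' == 0 {
        append using `allpairs'
    }
    save `allpairs', replace
    local first = 0
}

use `allpairs', clear
```

The resulting file has one row per pair member (pair, role, id), with role 1 for the treated unit and role 2 for its control, and can be merged onto the panel by id. Small cohorts may leave some treated units with poor matches, so checking the score distance within pairs is advisable.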