Solved – Data format (wide or long) for propensity score matching using psmatch2

matching, propensity-scores, stata

I have panel data on the sales rank of apps on the App Store over a period of 11 months. Some of the apps were promoted during this period and form the treatment group (treated=1), while the others form the control group. There are 180 treated apps and close to 21k control apps. I wish to find the nearest neighbors of the treated apps among the control apps during the pre-treatment period, so that I am making an apples-to-apples comparison when estimating treatment effects. I am using the psmatch2 package in Stata, but I am unsure whether it requires long or wide data format: I have found contradictory examples on the internet, and the psmatch2 documentation does not say anything specific.

The answer to "Propensity score matching with panel data" suggests using wide format in general; however, the author of that answer is not familiar with psmatch2.

So, what format should I store my data in, wide or long?

Any help would be greatly appreciated.

Thanks!

Best Answer

You will want the data in wide format, with one row per app. The idea is to match on the pre-treatment outcome in each month as if each were just another X variable.
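As a rough illustration of the wide-format approach, here is a minimal sketch. The variable names (app_id, month, rank, treated) are hypothetical, and the sketch assumes, purely for illustration, that every treated app is promoted after month 6, so months 1 to 6 are pre-treatment for everyone and treated is constant within each app. The matching options are one reasonable choice, not a recommendation from the psmatch2 documentation.

```stata
* Sketch only. Assumed (hypothetical) variables in a long panel:
*   app_id   app identifier
*   month    1..11
*   rank     sales rank in that month
*   treated  1 if the app was ever promoted, 0 otherwise (constant within app)
* Assumption: promotion starts after month 6 for every treated app.

* ssc install psmatch2              // uncomment if psmatch2 is not installed

keep app_id month rank treated
keep if month <= 6                  // keep pre-treatment months only

* Long -> wide: one row per app, with rank1 ... rank6 as separate columns
reshape wide rank, i(app_id) j(month)

* One-nearest-neighbor matching on a logit propensity score,
* restricted to the region of common support
psmatch2 treated rank1-rank6, neighbor(1) common logit
```

After this runs, psmatch2 leaves variables such as _pscore, _weight and _support in the data, which can be used to pick out the matched sample for the outcome analysis.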

However, it sounds like apps get treated at different times, so this data structure will not work very well: the number of pre-treatment periods, and therefore the number of matching variables, will differ from app to app. psmatch2 cannot handle that easily.

Also, you might worry about violations of the Stable Unit Treatment Value Assumption (SUTVA). If an app gets treated, that alters the potential outcomes of the control apps for which it is a substitute or a complement, since fewer or more of them will be downloaded. In this setting, matching (and every other partial equilibrium estimator that relies on SUTVA) may perform poorly.