Solved – Propensity Score Matching – Unbalanced Sample

heckman, least squares, propensity-scores, regression

I have a question regarding PSM. I'm just starting to dive into this topic but I reached a point where I think external help is necessary.

In my regression analysis (OLS), I have an independent variable that is a dummy for whether a firm discloses an annual report or not (1/0). I want to study the effect of being in the treated group (1) on my dependent variable. I know that this dummy is not exogenous in my regression, since there are published papers analysing its determinants, so I found PSM to be a helpful solution to this problem. Here comes my concern:

I have a rather small panel data set with 170 observations. Furthermore, it is unbalanced: 100 companies that do disclose (=1) and 70 that do not disclose (=0). Matching (stata command: psmmatch) now offers me a solution with 140 observations left. That means it matches a treated firm to every non-treated firm in my sample. From how I understood PSM, that is the wrong way around. In addition, with this procedure I lose 30 observations from my treated group, which potentially have an influence on the final regression. In my opinion this is a serious concern, and PSM can't be used in this special case where I have more treated observations than non-treated ones.

I hope my thoughts were clearly expressed and somebody has a hint for me on how I can proceed, what literature I can look at, or whether this is even a major problem.

A friend recommended the Heckman procedure, which I think is not appropriate since it only controls for unobservable characteristics. In my case I know the determinants of my dummy from previous literature.

I am looking forward to your replies. Please do not hesitate to ask if you need further information.

Kind regards

Grassi

Best Answer

Yes, if you choose a 1:1 matching ratio, you can have at most 70 matched pairs (140 observations), because there are only 70 non-treated firms to match to. You may end up with even fewer if you exclude poorly matched pairs based on the digit/caliper.
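To make the counting concrete, here is a minimal Python sketch of greedy 1:1 nearest-neighbour matching without replacement. It is not what the Stata command does internally; the propensity scores are simulated, and the function name `greedy_match`, the processing order, and the caliper value of 0.05 are all illustrative assumptions.

```python
import numpy as np

def greedy_match(ps_treated, ps_control, caliper=None):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Returns a list of (treated_index, control_index) pairs.
    """
    available = set(range(len(ps_control)))
    pairs = []
    # Processing order is an arbitrary choice (ascending propensity score).
    for t in np.argsort(ps_treated):
        if not available:
            break  # every control has already been used
        candidates = np.array(sorted(available))
        distances = np.abs(ps_control[candidates] - ps_treated[t])
        best = int(np.argmin(distances))
        if caliper is not None and distances[best] > caliper:
            continue  # no acceptable control for this treated unit
        c = int(candidates[best])
        pairs.append((int(t), c))
        available.remove(c)
    return pairs

# Simulated propensity scores (assumption): 100 treated, 70 controls.
rng = np.random.default_rng(0)
ps_treated = rng.beta(3, 2, size=100)
ps_control = rng.beta(2, 3, size=70)

pairs_all = greedy_match(ps_treated, ps_control)
pairs_cal = greedy_match(ps_treated, ps_control, caliper=0.05)
print("pairs without caliper:", len(pairs_all))
print("pairs with caliper:   ", len(pairs_cal))
```

Without a caliper the number of pairs is capped by the smaller group, so at least 30 treated firms are always left unmatched. With a caliper the number of pairs can only stay the same or shrink.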

Regarding your concern about the sample loss, this paper discusses the potential impact: https://www.ncbi.nlm.nih.gov/pubmed/28376195

You may need to look at the characteristics of the 30 excluded observations and compare them with the 70 included observations.
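One simple way to do such a comparison is a standardized mean difference for each covariate. The Python sketch below is only an illustration: the covariate values are simulated (think of something like log firm size), and the helper name `standardized_mean_difference` is made up for this example.

```python
import numpy as np

def standardized_mean_difference(a, b):
    """Difference in means divided by the pooled standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2.0)
    return (a.mean() - b.mean()) / pooled_sd

# Simulated covariate for the treated firms (assumption, not real data).
rng = np.random.default_rng(1)
retained = rng.normal(loc=5.0, scale=1.0, size=70)  # matched treated firms
dropped = rng.normal(loc=5.8, scale=1.0, size=30)   # unmatched treated firms

smd = standardized_mean_difference(dropped, retained)
print(f"SMD (dropped vs retained): {smd:.2f}")
```

A commonly cited rule of thumb treats absolute values above roughly 0.1 as a sign of meaningful imbalance. If the dropped firms differ clearly from the retained ones, the matched estimate applies only to the kind of treated firms that could be matched.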
