Solved – How to do PSM with panel data using PanelMatch

Tags: difference-in-difference, matching, propensity-scores, r

I would greatly appreciate it if you could tell me how to use PanelMatch for my dataset. Unfortunately, I couldn't find its manual, so I don't know how to find which firms are matched, how to extract the coefficients of the estimated models, how to report bias before and after matching, and so on.

  1. First, I need to do PSM using these variables:

switch = big4 + lnasset + leverage + loss

  2. Then, I should do difference-in-differences on the matched sample:

decost = switch + post_switch + switch*post_switch + lnaudten + big4 + altmanz + lnasset + lnage + markettobook + leverage + profit + tangible + cashvol

I also read this document in Stata. However, in my dataset the treatment dates differ across firms, and a firm can be treated more than once. Therefore, I don't know how to define "post_switch".

id date lnaudten big4 altmanz lnasset lnage    mtob     lev    prof   tang   cavol  switch decost los
1  86  .693147    0   18.4373 12.4689 2.48491 3.69137 .051575 .44427  .999581 .195047  0 .205964  0
1  87  1.09861    0   12.5244 12.7628 2.56495 2.69891 .043572 .559291 .999688 .128583  0 .107817  0
1  88  1.38629    0   14.7922 13.3187 2.63906 3.55144 .037377 .901665 .99897  .045367  0 .085176  0
1  89  1.60944    0   21.6806 13.5282 2.70805 4.4521  .090386 1.00277 .998904 .034365  0 .059932  0
1  90  1.79176    0   16.6034 13.7204 2.77259 3.16585 .077934 1.21371 .999292 .032229  0 .064589  0
1  91  0          0   9.32285 14.0652 2.83321 1.87682 .038984 1.61792 .999376 .019715  1 .086323  0
1  92  .693147    0   29.1306 14.3805 2.89037 3.83173 .030874 3.42558 .999687 .117503  0 .148985  0
1  93  1.09861    0   23.7929 14.5855 2.94444 3.08877 .01225  4.19413 .999862 .171374  0 .181363  0
2  86  1.94591    1   2.67142 13.5351 1.60944 .90438  .031392 .284566 .997711 .172729  0 .116186  0
2  87  2.07944    1   1.85554 13.6068 1.79176 .783169 .037099 .28575  .997862 .055812  0 .137087  0
2  88  2.19723    1   3.25227 13.6162 1.94591 .857463 .046493 .264266 .99788  .052991  0 .174771  0
2  89  2.30258    1   2.46358 13.8247 2.07944 1.00449 .045589 .246997 .998208 .064097  0 .168786  0
2  90  2.3979     1   1.43551 13.8304 2.19723 .791431 .060575 .171494 .998218 .062911  0 .240464  0
2  91  0          0   1.10687 13.7423 2.30258 .532189 .071249 .164944 .998054 .093181  1 .351773  0
2  92  .693147    0   3.39252 13.8668 2.3979  1.80869 .121138 .177533 .998281 .090341  0 .282046  0
2  93  1.09861    0   3.95825 14.0244 2.48491 1.41083 .094626 .162305 .99847  .134091  0 .188627  0
3  86  .693147    0   5.01935 13.0392 3.49651 1.08849 .008833 .275658 .995814 .165765  0 .12684   0
3  87  1.09861    0   8.51978 13.0429 3.52636 .794968 .010574 .349996 .995351 .276396  0 2.49701  0
3  88  1.38629    0   13.1943 13.2777 3.55535 1.36713 .043884 .409195 .996392 .079824  0 .033575  0
3  89  1.60944    0   18.7427 13.4562 3.58352 1.89782 .010373 .42366  .997045 .049833  0 .057621  0
3  90  1.79176    0   20.2185 13.4667 3.61092 1.69264 .016154 .339384 .997148 .133837  0 .133177  0
3  91  0          0   11.1153 13.9098 3.63759 1.50931 .010464 .935899 .998216 .12095   1 .089572  0
3  92  .693147    0   25.7134 14.1341 3.66356 2.41058 .004609 1.06214 .99856  .13175   0 .171943  0
3  93  1.09861    0   29.8983 14.162  3.68888 2.29729 .003891 .902802 .997648 .146949  0 .823985  0

Best Answer

This is how I would do it. Please see the questions and comment I left above.

From the question, PanelMatch (a newer panel-matching library, installed from GitHub in the code below) is an interesting choice, but it targets time-series-specific uses of matching and seems to need information that is not in your question.

It sounds like you are in the more general case, where a plain PSM/matching package such as Matching or FastMatch is enough. If that assumption is wrong, please let me know and provide more detail on what you need.

Ok so first, load the libraries and data:

#devtools::install_github("insongkim/PanelMatch", dependencies=TRUE)

if ( !require(pacman) ) install.packages("pacman");require(pacman)
p_load(Matching,speedglm) # PanelMatch

data <- read.table(text="id date lnaudten big4 altmanz lnasset lnage    mtob     lev    prof   tang   cavol  switch decost los
1  86  .693147    0   18.4373 12.4689 2.48491 3.69137 .051575 .44427  .999581 .195047  0 .205964  0
                   1  87  1.09861    0   12.5244 12.7628 2.56495 2.69891 .043572 .559291 .999688 .128583  0 .107817  0
                   1  88  1.38629    0   14.7922 13.3187 2.63906 3.55144 .037377 .901665 .99897  .045367  0 .085176  0
                   1  89  1.60944    0   21.6806 13.5282 2.70805 4.4521  .090386 1.00277 .998904 .034365  0 .059932  0
                   1  90  1.79176    0   16.6034 13.7204 2.77259 3.16585 .077934 1.21371 .999292 .032229  0 .064589  0
                   1  91  0          0   9.32285 14.0652 2.83321 1.87682 .038984 1.61792 .999376 .019715  1 .086323  0
                   1  92  .693147    0   29.1306 14.3805 2.89037 3.83173 .030874 3.42558 .999687 .117503  0 .148985  0
                   1  93  1.09861    0   23.7929 14.5855 2.94444 3.08877 .01225  4.19413 .999862 .171374  0 .181363  0
                   2  86  1.94591    1   2.67142 13.5351 1.60944 .90438  .031392 .284566 .997711 .172729  0 .116186  0
                   2  87  2.07944    1   1.85554 13.6068 1.79176 .783169 .037099 .28575  .997862 .055812  0 .137087  0
                   2  88  2.19723    1   3.25227 13.6162 1.94591 .857463 .046493 .264266 .99788  .052991  0 .174771  0
                   2  89  2.30258    1   2.46358 13.8247 2.07944 1.00449 .045589 .246997 .998208 .064097  0 .168786  0
                   2  90  2.3979     1   1.43551 13.8304 2.19723 .791431 .060575 .171494 .998218 .062911  0 .240464  0
                   2  91  0          0   1.10687 13.7423 2.30258 .532189 .071249 .164944 .998054 .093181  1 .351773  0
                   2  92  .693147    0   3.39252 13.8668 2.3979  1.80869 .121138 .177533 .998281 .090341  0 .282046  0
                   2  93  1.09861    0   3.95825 14.0244 2.48491 1.41083 .094626 .162305 .99847  .134091  0 .188627  0
                   3  86  .693147    0   5.01935 13.0392 3.49651 1.08849 .008833 .275658 .995814 .165765  0 .12684   0
                   3  87  1.09861    0   8.51978 13.0429 3.52636 .794968 .010574 .349996 .995351 .276396  0 2.49701  0
                   3  88  1.38629    0   13.1943 13.2777 3.55535 1.36713 .043884 .409195 .996392 .079824  0 .033575  0
                   3  89  1.60944    0   18.7427 13.4562 3.58352 1.89782 .010373 .42366  .997045 .049833  0 .057621  0
                   3  90  1.79176    0   20.2185 13.4667 3.61092 1.69264 .016154 .339384 .997148 .133837  0 .133177  0
                   3  91  0          0   11.1153 13.9098 3.63759 1.50931 .010464 .935899 .998216 .12095   1 .089572  0
                   3  92  .693147    0   25.7134 14.1341 3.66356 2.41058 .004609 1.06214 .99856  .13175   0 .171943  0
                   3  93  1.09861    0   29.8983 14.162  3.68888 2.29729 .003891 .902802 .997648 .146949  0 .823985  0",
                   header = T)

head(data)

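I am taking the PS equation from your question, but normally I use the MatchBalance() function and its statistical tests to decide on the PS model specification.

Your equation mentions leverage and loss. The sample data has no columns with those names (lev and los look like abbreviated versions, and los is 0 in every row shown, so its coefficient could not be estimated here anyway), so I exclude them below.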

Here's the propensity score (PS) model:

form <- as.formula("switch ~ big4 + lnasset")

mod  <- speedglm::speedglm(
  form,
  family=binomial(),
  fitted=T,
  data = data
)
summary(mod) # note poor fit, but I will ignore this for the example

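OK, now extract the propensity scores. By default predict() returns the linear predictor (log-odds), so ask for type = "response" to get probabilities:

data$fitted.values <- predict(mod, type = "response")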
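The question also asked how to extract the coefficients of the estimated model. Here is a minimal sketch of that step, assuming base R's glm() in place of speedglm and a simulated toy data frame (the values are made up; only the column names mirror the question). It also shows the difference between the link scale and the response scale.

```r
# Toy data: column names mirror the question, values are simulated (assumption).
set.seed(42)
n   <- 200
toy <- data.frame(
  big4    = rbinom(n, 1, 0.4),
  lnasset = rnorm(n, mean = 13.5, sd = 0.6)
)
# Simulated treatment whose probability depends on both covariates
toy$switch <- rbinom(n, 1, plogis(-1 + 0.8 * toy$big4 - 0.5 * (toy$lnasset - 13.5)))

# Propensity score model fitted with base glm() instead of speedglm
ps_mod <- glm(switch ~ big4 + lnasset, family = binomial(), data = toy)

coefs    <- coef(ps_mod)                 # named vector of estimates
coef_tab <- summary(ps_mod)$coefficients # estimates, std. errors, z values, p-values

lin_pred <- predict(ps_mod, type = "link")     # log-odds scale
ps       <- predict(ps_mod, type = "response") # probabilities between 0 and 1

print(round(coef_tab, 3))
```

Matching on either scale gives the same nearest neighbours in most cases, because one is a monotone transformation of the other, but only the response scale can be read as a probability of treatment.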

Now do the matching and estimate a quasi-experimental quantity, such as the average treatment effect on the treated (ATT) or the average treatment effect (ATE):

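set.seed(1) # ties are broken at random when ties = FALSE, so fix the seed
atta <- Match(  Y        = data$decost, # I assume this is the outcome
                Tr       = data$switch, # treatment (1) / control (0) indicator
                X        = data$fitted.values, # propensity scores
                estimand = "ATT", # quantity to estimate
                M        = 1, # number of matches per treated unit (1 = one-to-one)
                ties     = FALSE, # TRUE = VERY SLOW but higher quality
                replace  = TRUE, # a control can be matched to more than one treated unit
                exact    = FALSE, # exact matching on a continuous score rarely finds any match
                version  = "fast" )
summary(atta) # ATT estimate and number of matched observations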
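To see which units were matched, the object returned by Match() carries the row numbers of the matched treated and control observations in index.treated and index.control. The sketch below uses simulated toy data (an assumption: one row per unit with a made-up id column), so in your panel each index would point to a firm-year row rather than a firm.

```r
library(Matching)

# Toy data: simulated values, column names borrowed from the question (assumption).
set.seed(1)
n   <- 200
toy <- data.frame(
  id      = seq_len(n),
  big4    = rbinom(n, 1, 0.4),
  lnasset = rnorm(n, 13.5, 0.6)
)
toy$switch <- rbinom(n, 1, plogis(-1 + 0.8 * toy$big4 - 0.5 * (toy$lnasset - 13.5)))
toy$decost <- 0.15 + 0.05 * toy$switch + rnorm(n, sd = 0.05)

# Propensity scores on the probability scale
ps <- fitted(glm(switch ~ big4 + lnasset, family = binomial(), data = toy))

m <- Match(Y = toy$decost, Tr = toy$switch, X = ps,
           estimand = "ATT", M = 1, ties = FALSE, replace = TRUE)

# One row per matched pair: who was matched to whom, and how close the scores are
pairs <- data.frame(
  treated_id = toy$id[m$index.treated],
  control_id = toy$id[m$index.control],
  treated_ps = ps[m$index.treated],
  control_ps = ps[m$index.control]
)
head(pairs)

# Matched sample for a follow-up regression (each row kept once)
matched <- toy[unique(c(m$index.treated, m$index.control)), ]
```

Because replace = TRUE, the same control can show up in several pairs. unique() keeps it once in the matched sample, so a regression on that sample would need weights (or the repeated rows) to reflect the reuse.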

That gives you your result. You should also run post-matching balance tests (for example with MatchBalance()) to make sure that the treatment and control groups are NOT significantly different on any of the covariates.
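A rough sketch of such a balance check, again on simulated toy data (an assumption, not your firms): MatchBalance() reports each covariate before and after matching, which is one way to present the "bias before and after matching" from the question. I set nboots = 0 to skip the bootstrap so the example runs quickly. For real work the package documentation recommends bootstrapping.

```r
library(Matching)

# Toy data: simulated values, column names borrowed from the question (assumption).
set.seed(1)
n   <- 200
toy <- data.frame(
  big4    = rbinom(n, 1, 0.4),
  lnasset = rnorm(n, 13.5, 0.6)
)
toy$switch <- rbinom(n, 1, plogis(-1 + 0.8 * toy$big4 - 0.5 * (toy$lnasset - 13.5)))
toy$decost <- 0.15 + 0.05 * toy$switch + rnorm(n, sd = 0.05)

ps <- fitted(glm(switch ~ big4 + lnasset, family = binomial(), data = toy))
m  <- Match(Y = toy$decost, Tr = toy$switch, X = ps,
            estimand = "ATT", M = 1, ties = FALSE, replace = TRUE)

# Balance on each covariate before and after matching
bal <- MatchBalance(switch ~ big4 + lnasset, data = toy,
                    match.out = m, nboots = 0, print.level = 0)

# Compact table: standardized difference (x100) before/after, t-test p-value after
balance_tab <- data.frame(
  variable     = c("big4", "lnasset"),
  sdiff_before = sapply(bal$BeforeMatching, function(b) b$sdiff),
  sdiff_after  = sapply(bal$AfterMatching,  function(b) b$sdiff),
  p_after      = sapply(bal$AfterMatching,  function(b) b$tt$p.value)
)
print(balance_tab)
```

If a covariate is still clearly unbalanced after matching, the usual response is to revise the PS model (add terms or interactions) and match again.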