Solved – Manipulating data for propensity score matching following multiple imputation with mice package

matching, mice, multiple-imputation, propensity-scores, r

I've completed multiple imputation of my dataset for the first time using the mice package in R. I'm familiar with using the MatchIt package for propensity score matching on an ordinary data frame. However, I'm not at all familiar with how to manipulate the mids object that mice outputs after imputation. I looked over a couple of relevant threads, but the questions answered there are broader in scope (I suppose my question is more about the code and functions used).

I don't know whether MatchIt or Matching can make use of the mids object directly, and I'm not sure how to bridge the divide. I could use with.mids() to fit a model for the propensity score, but I'm uncertain how I would calculate the actual propensity scores from the resulting mira object and move on from there. Alternatively, I could output a long-form data frame with the imputed data sets stacked on top of one another using complete(imp, "long"), calculate propensity scores for each, average them, and then do the matching. However, my outcome variable is imputed, so it's not clear that would be beneficial.

Best Answer

Your second method, using complete(imp, "long"), is what you need to do. You've probably read the Mitra & Reiter paper distinguishing between the "Within" and "Across" methods, since you mentioned wanting to average propensity scores. With imputed outcome data, you need to use the "Within" approach, and then combine and adjust your estimates using Rubin's rules. So, for each imputed data set, you will run matchit() and then run your outcome regression model on the matched data set. Once you have done this for every imputed data set, you will recombine the model fits using as.mira() and then pool and summarize your mira output. Here is some sample code:

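library(mice)     # complete(), as.mira(), pool()
library(MatchIt)  # matchit(), match.data()

# Stack the imputed data sets; the .imp column identifies the imputation
imp.data <- complete(imp, "long")
fit.list <- setNames(vector("list", length(unique(imp.data$.imp))),
                     unique(imp.data$.imp))
for (i in unique(imp.data$.imp)) {
  # Match within imputed data set i, then fit the outcome model on the matched sample
  m.out <- matchit(t ~ v1 + v2 + v3, data = imp.data[imp.data$.imp == i,])
  fit.list[[i]] <- glm(y ~ t, data = match.data(m.out))
}
fit.list.mira <- as.mira(fit.list) # combines into mira object for pool()
summary(pool(fit.list.mira))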
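
To make the "combine and adjust using Rubin's rules" step less of a black box, here is a minimal base-R sketch of the pooling arithmetic for a single coefficient. The estimates and standard errors are made-up numbers standing in for the treatment effect from five matched data sets, and pool_rubin() is a hypothetical helper written for illustration, not a function from mice. pool() also derives degrees of freedom for the reference t distribution, which this sketch leaves out.

```r
# Made-up treatment-effect estimates and standard errors, one per imputed
# (and matched) data set. These are illustrative values, not real results.
est <- c(1.8, 2.1, 1.9, 2.3, 2.0)
se  <- c(0.50, 0.55, 0.48, 0.60, 0.52)

# Hypothetical helper: pool one coefficient across m imputations
# using Rubin's rules.
pool_rubin <- function(est, se) {
  m     <- length(est)
  q_bar <- mean(est)            # pooled point estimate
  w     <- mean(se^2)           # within-imputation variance
  b     <- var(est)             # between-imputation variance
  t_var <- w + (1 + 1 / m) * b  # total variance
  list(estimate = q_bar, se = sqrt(t_var), within = w, between = b)
}

pooled <- pool_rubin(est, se)
print(pooled)
```

The pooled standard error is larger than the average per-imputation standard error whenever the estimates differ across imputations. That extra between-imputation term is what accounts for the uncertainty due to the missing data.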

Hope this helps!