Solved – Categorical variable as control variable in MatchIt

categorical-data, matching, propensity-scores, r

I'm fairly new to R and am trying to run propensity score matching with the MatchIt package.
Some of my control variables are continuous, but some are categorical. For example, I have a "Currency" variable that contains multiple currencies. I passed all the variables as controls when calling matchit(), but I'm not sure that was right.
The summary of the match shows the following differences for the Currency variable:
                  Means Treated  Means Control  SD Control  Mean Diff  eQQ Med  eQQ Mean  eQQ Max
CurrencyCAD              0.0000         0.0000      0.0000     0.0000   0.0000    0.0000   0.0000
CurrencyEUR              0.0000         0.0000      0.0000     0.0000   0.0000    0.0000   0.0000
CurrencyGBP              0.1509         0.1509      0.3614     0.0000   0.0000    0.0000   0.0000
CurrencyNZD              0.0000         0.0000      0.0000     0.0000   0.0000    0.0000   0.0000
CurrencyUSD              0.8302         0.8302      0.3791     0.0000   0.0000    0.0000   0.0000
For another categorical variable ("Hobby"), it showed values other than 0 or 100. What does that mean?

HobbyPhotography         0.0000         0.0000      0.0000     0.0000   0.0000    0.0000   0.0000
HobbyDance               0.0189         0.0000      0.0000     0.0189   0.0000    0.0189   1.0000
HobbyTechnology          0.0189         0.0377      0.1924    -0.0189   0.0000    0.0189   1.0000

In addition, I have some binary 0/1 variables (e.g. HadParticipated) that I also included as match controls. Is that right?
The difference I got for one of them is as follows:
HadParticipated          0.8868         0.8679      0.3418     0.0189   0.0000    0.0189   1.0000

I'm not sure what the best way is to include those variables as controls in the match. Any help? Thanks!

Best Answer

These numbers may make sense given your dataset. The zeros in the treated and control means for CurrencyCAD, CurrencyEUR, CurrencyNZD, and HobbyPhotography should just mean that those levels are not present in the matched cohort.

From your post, I'm guessing matchit is creating the dummy-coded level variables for you (like you did manually for HadParticipated). Is there a level of Currency and a level of Hobby that are not shown in your post? Each mean should be the proportion of units in that category for the matched cohort in a given arm, so the means need to sum to one over all the categories of a variable. For Currency, the means you show sum to 0.1509 + 0.8302 = 0.9811 in each arm, which suggests that about 1.9% of each arm falls in a level that is not listed.
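
To see why the per-level means behave like proportions, here is a minimal base R sketch. The data frame `df`, its values, and the extra "AUD" level are made up for illustration and are not taken from the question. It uses `model.matrix()` to build 0/1 indicator columns from a factor and then averages them within each arm; whether MatchIt builds its summary in exactly this way is an assumption on my part.

```r
# Hypothetical toy data; the values are invented for illustration only.
df <- data.frame(
  treat    = c(1, 1, 1, 1, 0, 0, 0, 0),
  Currency = factor(c("USD", "USD", "GBP", "AUD",
                      "USD", "USD", "GBP", "AUD"))
)

# One 0/1 indicator column per level. The "- 1" drops the intercept so
# that every level gets a column, including the reference level.
dummies <- model.matrix(~ Currency - 1, data = df)

# The mean of an indicator within an arm is the proportion of that arm
# falling in the corresponding level.
props <- aggregate(dummies, by = list(treat = df$treat), FUN = mean)
print(props)

# Within each arm, the proportions over all levels sum to one.
level_sums <- rowSums(props[, -1])
print(level_sums)

# With MatchIt itself, the factor would go straight into the formula,
# for example (not run here):
# m.out <- MatchIt::matchit(treat ~ Currency, data = df)
# summary(m.out)
```

If the means listed for a variable add up to less than one, as they do for Currency in the question, the remainder belongs to a level that is not shown, typically the reference level dropped during dummy coding.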
