I ran full matching for a propensity score matching (PSM) analysis, with the balance output summarized below:
Call:
matchit(formula = buyout_flag ~ tsale + sfincs_avg + logprox +
tenure + age + percent_ethwhite_origin + percent_poverty_origin +
percent_hs_origin + percent_owner_origin + house_medval_origin +
percap_origin, data = hcad_floodp, method = "full", distance = "glm",
link = "probit", caliper = 0.1)
Summary of Balance for Matched Data:
Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max Std. Pair Dist.
distance 0.365 0.365 0.000 0.999 0.002 0.023 0.013
tsale 1.881 1.862 0.034 0.595 0.082 0.222 1.316
sfincs_avg 6.275 6.487 -0.059 0.842 0.022 0.064 0.735
logprox 3.923 3.886 0.041 0.892 0.017 0.068 0.958
tenure 13.054 13.026 0.003 0.983 0.015 0.049 1.068
age 60.878 60.423 0.030 0.971 0.017 0.043 1.148
percent_ethwhite_origin 41.500 39.986 0.062 0.953 0.024 0.054 0.947
percent_poverty_origin 15.090 15.590 -0.049 1.035 0.026 0.073 1.003
percent_hs_origin 81.379 80.997 0.021 1.083 0.026 0.088 0.941
percent_owner_origin 57.988 56.722 0.062 0.968 0.021 0.062 1.069
house_medval_origin 146647.059 145763.842 0.015 1.241 0.022 0.074 0.935
percap_origin 29349.670 28864.395 0.043 0.954 0.026 0.069 1.010
I am trying to interrogate the balance a bit more thoroughly (working through this paper/guide) using bal.tab(., m.threshold = 0.1) (which turned out fine) and bal.tab(., v.threshold = 1), where I'm a bit confused by the results.
Call
matchit(formula = buyout_flag ~ tsale + sfincs_avg + logprox +
tenure + age + percent_ethwhite_origin + percent_poverty_origin +
percent_hs_origin + percent_owner_origin + house_medval_origin +
percap_origin, data = hcad_floodp, method = "full", distance = "glm",
link = "probit", caliper = 0.1)
Balance Measures
Type Diff.Adj V.Ratio.Adj V.Threshold
distance Distance 0.000 0.999
tsale Contin. 0.034 0.595 Not Balanced, >1
sfincs_avg Contin. -0.059 0.842 Not Balanced, >1
logprox Contin. 0.041 0.892 Not Balanced, >1
tenure Contin. 0.003 0.983 Not Balanced, >1
age Contin. 0.030 0.971 Not Balanced, >1
percent_ethwhite_origin Contin. 0.062 0.953 Not Balanced, >1
percent_poverty_origin Contin. -0.049 1.035 Not Balanced, >1
percent_hs_origin Contin. 0.021 1.083 Not Balanced, >1
percent_owner_origin Contin. 0.062 0.968 Not Balanced, >1
house_medval_origin Contin. 0.015 1.241 Not Balanced, >1
percap_origin Contin. 0.043 0.954 Not Balanced, >1
To me, it looks like the V.Ratio is under 1 for everything except percent_poverty_origin, percent_hs_origin, and house_medval_origin, yet every covariate is marked "Not Balanced, >1". What am I missing?
Best Answer
The variance ratio ranges from 0 to infinity. Perfect balance on the variance means the variance ratio is equal to 1. A variance ratio of .5 means the same thing as a variance ratio of 2; they are just in opposite directions. So when you supply a threshold to bal.tab() for variance ratios, it works in both directions; that is, setting thresholds = c(v = 2) will flag any variance ratio that is greater than 2 or less than .5.
A variance ratio of 1 means the variances are exactly equal; setting a threshold of 1 means that any variance ratio greater than 1 or less than 1 will be flagged, so any covariate that is not exactly balanced on the variances will be flagged. That is why every covariate in your output is marked as not balanced, including those with ratios below 1. This would be like setting a threshold on the standardized mean difference (SMD) of 0; any SMD greater than or less than 0 will be flagged. This can be helpful if you are trying to detect departures from perfect balance, but in most cases, small imbalances are okay, and the thresholds should be set slightly away from the value that indicates perfect balance. Using a variance ratio threshold of, for example, 1.25 (which flags ratios above 1.25 or below 0.8) will detect even moderate departures from perfect balance.
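To make the two-sided rule concrete, here is a minimal base R sketch that applies it to the adjusted variance ratios reported in the question. The helper flag_v_ratio() is hypothetical and written only for illustration; it mimics the rule described above (a ratio is flagged when it or its reciprocal exceeds the threshold) and is not cobalt's internal code.

```r
# Adjusted variance ratios copied from the bal.tab() output in the question.
v_ratio <- c(
  tsale = 0.595, sfincs_avg = 0.842, logprox = 0.892, tenure = 0.983,
  age = 0.971, percent_ethwhite_origin = 0.953,
  percent_poverty_origin = 1.035, percent_hs_origin = 1.083,
  percent_owner_origin = 0.968, house_medval_origin = 1.241,
  percap_origin = 0.954
)

# Hypothetical helper (not part of cobalt): a ratio r is flagged when it
# falls outside [1/threshold, threshold], i.e. when max(r, 1/r) > threshold.
flag_v_ratio <- function(ratio, threshold) {
  pmax(ratio, 1 / ratio) > threshold
}

# Threshold of 1: anything not exactly equal to 1 is flagged.
flagged_at_1 <- names(v_ratio)[flag_v_ratio(v_ratio, 1)]

# Threshold of 1.25: flags ratios above 1.25 or below 0.8.
flagged_at_1.25 <- names(v_ratio)[flag_v_ratio(v_ratio, 1.25)]

# Threshold of 2: flags ratios above 2 or below 0.5.
flagged_at_2 <- names(v_ratio)[flag_v_ratio(v_ratio, 2)]

print(flagged_at_1)
print(flagged_at_1.25)
print(flagged_at_2)
```

Under this rule, a threshold of 1 flags all eleven covariates, a threshold of 1.25 flags only tsale (0.595 is below 0.8), and a threshold of 2 flags none. In cobalt itself, thresholds on both statistics can be requested in one call, for example bal.tab(m.out, thresholds = c(m = .1, v = 2)), where m.out is an assumed name for the matchit object.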