R Propensity Scores: Discrepancy in Calculating SMDs Between the tableone (CreateTableOne) and cobalt R Packages

Tags: matching, propensity-scores, r

I was trying to calculate standardized mean differences (SMDs) after propensity score matching to verify that balance was achieved, but I ran into discrepancies between packages.

I assessed balance in two ways, using CreateTableOne() from the tableone package and using the cobalt package, and got different results. Does anyone have suggestions about what might be causing the discrepancy?

Here is the balance table BEFORE any matching, produced with CreateTableOne():

> print(CreateTableOne(vars = variables_for_table, strata = "treated", data = working_data), quote = FALSE, noSpaces = TRUE, smd = TRUE)

                                       Stratified by treated
                                          No            Yes           p     test SMD  
  n                                       198           41                            
  age_at_diagnosis (mean (SD))            64.37 (10.63) 62.40 (14.87) 0.316      0.153
  charlson_score (mean (SD))              1.93 (2.19)   3.10 (3.81)   0.008      0.374
  median_wbc_6mo (mean (SD))              9.31 (3.98)   8.62 (4.23)   0.317      0.169
  chemotherapy = Yes (%)                  113 (57.1)    29 (70.7)     0.148      0.287
  radiation = Yes (%)                     117 (59.1)    26 (63.4)     0.735      0.089
  smoking = Yes (%)                       70 (35.4)     22 (53.7)     0.044      0.375
  alcohol = Yes (%)                       6 (3.0)       2 (4.9)       0.903      0.095
  myocardial_infarction = Yes (%)         3 (1.5)       2 (4.9)       0.441      0.192
  congestive_heart_failure = Yes (%)      5 (2.5)       2 (4.9)       0.761      0.125
  peripheral_vascular_disease = Yes (%)   8 (4.0)       2 (4.9)       1.000      0.041
  cerebrovascular_disease = Yes (%)       21 (10.6)     8 (19.5)      0.185      0.251
  dementia = Yes (%)                      1 (0.5)       1 (2.4)       0.768      0.161
  chronic_pulmonary_disease = Yes (%)     18 (9.1)      4 (9.8)       1.000      0.023
  rheumatic_disease = Yes (%)             8 (4.0)       3 (7.3)       0.616      0.142
  mild_liver_disease = Yes (%)            4 (2.0)       4 (9.8)       0.042      0.333
  diabetes_without_complication = Yes (%) 16 (8.1)      4 (9.8)       0.966      0.059
  diabetes_with_complication = Yes (%)    1 (0.5)       1 (2.4)       0.768      0.161
  hemiplegia_or_paraplegia = Yes (%)      5 (2.5)       1 (2.4)       1.000      0.006
  renal_disease = Yes (%)                 5 (2.5)       3 (7.3)       0.282      0.223
  malignancy = Yes (%)                    17 (8.6)      3 (7.3)       1.000      0.047
  metastatic_cancer = Yes (%)             8 (4.0)       6 (14.6)      0.024      0.370
  hiv_or_aids = Yes (%)                   0 (0.0)       1 (2.4)       0.383      0.224

Here it is AFTER matching with the MatchIt package, with balance assessed again using CreateTableOne():

> formula = treated~ age_at_diagnosis + charlson_score + median_wbc_6mo + chemotherapy + radiation + smoking + alcohol + myocardial_infarction + congestive_heart_failure + peripheral_vascular_disease + cerebrovascular_disease + dementia + chronic_pulmonary_disease + rheumatic_disease + mild_liver_disease + diabetes_without_complication + diabetes_with_complication + hemiplegia_or_paraplegia + renal_disease + malignancy  + metastatic_cancer + hiv_or_aids
> matched_data = matchit(formula, data = working_data, distance = "glm", method = "nearest", replace = FALSE, ratio = 4, caliper = 0.3)
> matched_df = match.data(matched_data)

> print(CreateTableOne(vars = variables_for_table, strata = "treated", data = matched_df), quote = FALSE, noSpaces = TRUE, smd = TRUE)

                                        Stratified by treated
                                          No            Yes           p     test SMD   
  n                                       106           34                             
  age_at_diagnosis (mean (SD))            62.58 (11.14) 62.17 (15.04) 0.865      0.031 
  charlson_score (mean (SD))              1.88 (1.98)   1.91 (1.68)   0.927      0.019 
  median_wbc_6mo (mean (SD))              9.23 (4.08)   9.00 (4.46)   0.775      0.055 
  chemotherapy = Yes (%)                  69 (65.1)     23 (67.6)     0.948      0.054 
  radiation = Yes (%)                     64 (60.4)     21 (61.8)     1.000      0.028 
  smoking = Yes (%)                       47 (44.3)     17 (50.0)     0.705      0.114 
  alcohol = Yes (%)                       4 (3.8)       2 (5.9)       0.967      0.098 
  myocardial_infarction = Yes (%)         3 (2.8)       2 (5.9)       0.762      0.150 
  congestive_heart_failure = Yes (%)      2 (1.9)       1 (2.9)       1.000      0.069 
  peripheral_vascular_disease = Yes (%)   6 (5.7)       2 (5.9)       1.000      0.010 
  cerebrovascular_disease = Yes (%)       14 (13.2)     6 (17.6)      0.717      0.123 
  dementia = Yes (%)                      0 (0.0)       0 (0.0)       NaN        <0.001
  chronic_pulmonary_disease = Yes (%)     12 (11.3)     3 (8.8)       0.927      0.083 
  rheumatic_disease = Yes (%)             3 (2.8)       1 (2.9)       1.000      0.007 
  mild_liver_disease = Yes (%)            3 (2.8)       1 (2.9)       1.000      0.007 
  diabetes_without_complication = Yes (%) 8 (7.5)       3 (8.8)       1.000      0.047 
  diabetes_with_complication = Yes (%)    0 (0.0)       0 (0.0)       NaN        <0.001
  hemiplegia_or_paraplegia = Yes (%)      1 (0.9)       1 (2.9)       0.981      0.145 
  renal_disease = Yes (%)                 4 (3.8)       1 (2.9)       1.000      0.046 
  malignancy = Yes (%)                    7 (6.6)       1 (2.9)       0.707      0.172 
  metastatic_cancer = Yes (%)             3 (2.8)       1 (2.9)       1.000      0.007 
  hiv_or_aids = No (%)                    106 (100.0)   34 (100.0)    NA         <0.001

Note the variable "malignancy" (third row from the bottom). After matching, the prevalence of malignancy is 2.9% in the treated group versus 6.6% in the untreated group, giving an SMD of 0.172 according to CreateTableOne(). If, however, I assess balance by passing the matchit object directly to cobalt's bal.tab(), as below, I get a different SMD for malignancy:

> bal.tab(matched_data, un=TRUE, addl = addl, binary = "std", m.threshold = 0.1)

Call
 matchit(formula = formula, data = working_data, method = "nearest", 
    distance = "glm", replace = FALSE, caliper = 0.3, ratio = 4)

Balance Measures
                                      Type Diff.Un Diff.Adj        M.Threshold
distance                          Distance  0.6876   0.0271     Balanced, <0.1
age_at_diagnosis                   Contin. -0.1329   0.0006     Balanced, <0.1
charlson_score                     Contin.  0.3051  -0.0540     Balanced, <0.1
median_wbc_6mo                     Contin. -0.1637  -0.0555     Balanced, <0.1
chemotherapy_Yes                    Binary  0.3002   0.0269     Balanced, <0.1
radiation_Yes                       Binary  0.0898   0.0051     Balanced, <0.1
smoking_Yes                         Binary  0.3671  -0.0147     Balanced, <0.1
alcohol_Yes                         Binary  0.0858   0.1252 Not Balanced, >0.1
myocardial_infarction_Yes           Binary  0.1561  -0.0683     Balanced, <0.1
congestive_heart_failure_Yes        Binary  0.1092   0.0683     Balanced, <0.1
peripheral_vascular_disease_Yes     Binary  0.0389   0.0228     Balanced, <0.1
cerebrovascular_disease_Yes         Binary  0.2247   0.0742     Balanced, <0.1
dementia_Yes                        Binary  0.1254   0.0000     Balanced, <0.1
chronic_pulmonary_disease_Yes       Binary  0.0224  -0.1569 Not Balanced, >0.1
rheumatic_disease_Yes               Binary  0.1258   0.0000     Balanced, <0.1
mild_liver_disease_Yes              Binary  0.2607  -0.1239 Not Balanced, >0.1
diabetes_without_complication_Yes   Binary  0.0565  -0.0661     Balanced, <0.1
diabetes_with_complication_Yes      Binary  0.1254   0.0000     Balanced, <0.1
hemiplegia_or_paraplegia_Yes        Binary -0.0056   0.1430 Not Balanced, >0.1
renal_disease_Yes                   Binary  0.1840  -0.0565     Balanced, <0.1
malignancy_Yes                      Binary -0.0487  -0.0941     Balanced, <0.1
metastatic_cancer_Yes               Binary  0.2997  -0.0485     Balanced, <0.1
hiv_or_aids_Yes                     Binary  0.1581   0.0000     Balanced, <0.1

Balance tally for mean differences
                   count
Balanced, <0.1        19
Not Balanced, >0.1     4

Variable with the greatest mean difference
                      Variable Diff.Adj        M.Threshold
 chronic_pulmonary_disease_Yes  -0.1569 Not Balanced, >0.1

Sample sizes
                     Control Treated
All                   198.        41
Matched (ESS)          83.07      34
Matched (Unweighted)  106.        34
Unmatched              92.         7

As shown above, cobalt gives an SMD of -0.0941 for malignancy after matching. Its absolute value is below the commonly used threshold of 0.1, so the covariate would be considered balanced. CreateTableOne(), however, reports an SMD of 0.172, which is above the threshold. Given that the prevalence of malignancy is only 2.9% in the treated group versus 6.6% in the untreated group, I suspect the covariate may not actually be balanced, as the CreateTableOne() SMD suggests.

A similar discrepancy appears for "mild_liver_disease": after matching, its prevalence is 2.9% in the treated group vs. 2.8% in the untreated group, giving an SMD of 0.007 according to CreateTableOne() (indicating balance) but -0.1239 according to cobalt (indicating imbalance).
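The CreateTableOne() SMDs quoted above can be reproduced by hand from the matched counts in the second table, using the pooled denominator and counting every matched unit once (no matching weights). This is a minimal base R sketch; `smd_pooled()` is a hypothetical helper, and taking the variance of a binary covariate as p(1 - p) is an assumption on my part, although it reproduces the printed values.

```r
# Unweighted SMD with the pooled denominator, as in the CreateTableOne() output.
# Assumption: the variance of a binary covariate is taken as p * (1 - p).
smd_pooled <- function(p1, p0) {
  (p1 - p0) / sqrt((p1 * (1 - p1) + p0 * (1 - p0)) / 2)
}

# Matched-sample counts from the second table (treated n = 34, control n = 106)
malignancy <- smd_pooled(1 / 34, 7 / 106)
mild_liver <- smd_pooled(1 / 34, 3 / 106)

# Absolute values, rounded as in the table: 0.172 and 0.007
round(abs(c(malignancy = malignancy, mild_liver = mild_liver)), 3)
```

Because nothing in this calculation depends on which treated unit each control was matched to, it ignores the matching weights entirely.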

  • What could be causing these discrepancies and which package is better
    to trust when assessing balance after matching?
  • Am I missing something in the interpretation?
  • Any insight or suggestions would be extremely appreciated!

Best Answer

There are two reasons why these values differ. The pre-matching values differ because cobalt and tableone compute the denominator of the standardized mean difference differently. tableone uses $\sqrt{\frac{s_1^2 + s_0^2}{2}}$ in the denominator of the SMD, whereas cobalt uses $s_1$ (where $s_1$ and $s_0$ are the standard deviations of the covariate in the treated and control groups, respectively). This can be changed in cobalt: set s.d.denom = "pooled" to use the tableone version. cobalt chooses the default standardization factor based on the estimand used in matchit(), which here is the ATT (the default). The ATT makes the treated group the target population, so the standardization factor should reflect that group. See my answer here for some information on that choice. In practice it doesn't matter much: the results usually won't differ noticeably unless the variances in the two groups are very different.
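A minimal base R sketch of the two denominators, assuming both packages take the variance of a binary covariate as p(1 - p) (my assumption, though it reproduces the printed values). `smd_binary()` is a hypothetical helper, not a function from either package. Applied to the pre-matching malignancy counts (3/41 treated, 17/198 control), the pooled denominator gives tableone's 0.047 and the treated-group denominator gives cobalt's Diff.Un of -0.0487:

```r
# SMD of a binary covariate from group proportions.
# Assumption: variance of a binary covariate is p * (1 - p).
smd_binary <- function(p1, p0, denom = c("pooled", "treated")) {
  denom <- match.arg(denom)
  v1 <- p1 * (1 - p1)  # treated-group variance
  v0 <- p0 * (1 - p0)  # control-group variance
  s <- switch(denom,
              pooled  = sqrt((v1 + v0) / 2),  # tableone's denominator
              treated = sqrt(v1))             # cobalt's default for the ATT
  (p1 - p0) / s
}

# Pre-matching malignancy proportions from the first table
p1 <- 3 / 41
p0 <- 17 / 198

round(smd_binary(p1, p0, "pooled"), 3)   # -0.047; tableone prints the absolute value
round(smd_binary(p1, p0, "treated"), 4)  # -0.0487, matching cobalt's Diff.Un
```

The same helper reproduces the hiv_or_aids row (0.224 pooled vs. 0.1581 treated), where the gap is larger because the control group has zero variance.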

After matching, the values differ because you did not include the matching weights when computing balance statistics with tableone. Because you did 4:1 matching with a caliper, not all treated units received 4 matches: some received 3, some 2, some 1, and some none at all. As a result, matched control units receive different weights depending on how many control units were matched to the same treated unit. For example, if a treated unit received only one matched control (because all others were outside the caliper or had already been matched), that control would receive a weight of 1, whereas if a treated unit received four matched controls, each would receive a weight of 1/4. These weights are necessary both for assessing balance and for estimating the treatment effect. cobalt automatically extracts the weights from the matchit object and uses them in computing the SMDs; tableone does not, unless you supply the weights manually using svyCreateTableOne(). Even with svyCreateTableOne(), the SMDs will not be computed correctly, because the calculation will use the weighted variance, which is inappropriate. See my answer here for more detail about that.
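The weighting scheme can be illustrated with a hypothetical toy matched sample in base R (the data and variable names are made up for illustration). Treated units get weight 1, and each control gets 1 divided by the number of controls in its matched set. The weighted control mean can differ sharply from the unweighted one and can even reverse the sign of the difference. Both differences are standardized by the same treated-group SD so that only the numerator changes; as far as I know, this mirrors cobalt's default of computing the denominator in the unmatched sample.

```r
# Hypothetical toy matched sample from variable-ratio matching:
# matched set 1 has 1 control, matched set 2 has 4 controls.
subclass <- c(1, 1, 2, 2, 2, 2, 2)  # matched set of each unit
treat    <- c(1, 0, 1, 0, 0, 0, 0)  # 1 = treated, 0 = control
x        <- c(0, 1, 1, 0, 0, 0, 1)  # a binary covariate

# Number of controls in each unit's matched set
n_ctrl <- ave(1 - treat, subclass, FUN = sum)

# ATT matching weights: treated units get 1, each control gets 1 / n_ctrl
w <- ifelse(treat == 1, 1, 1 / n_ctrl)

mean_t       <- mean(x[treat == 1])                          # 0.5
mean_c_unwtd <- mean(x[treat == 0])                          # 0.4, what an unweighted table shows
mean_c_wtd   <- weighted.mean(x[treat == 0], w[treat == 0])  # 0.625, what the weights imply

# Standardize both differences by the same treated-group SD, sqrt(p * (1 - p))
s1 <- sqrt(mean_t * (1 - mean_t))
c(unweighted = (mean_t - mean_c_unwtd) / s1,
  weighted   = (mean_t - mean_c_wtd) / s1)
```

Multiplying all control weights by the same constant leaves the weighted mean unchanged, so only the relative weights matter for the SMD.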

You should use cobalt for assessing balance. tableone is great for making nice tables, but less care has gone into ensuring that its balance statistics are computed correctly and consistently across a variety of circumstances, because balance assessment is not what the package was designed for. cobalt, by contrast, was designed specifically for assessing balance after using MatchIt and other packages.
