R Propensity Scores – Discrepancy in Calculating SMD Between CreateTableOne and Cobalt R Packages


I was attempting to calculate standardized mean differences (SMDs) after performing propensity score matching to verify that balance was achieved, but I ran into some discrepancies between packages.

I tried two approaches to assess balance, using CreateTableOne() from the tableone package and the cobalt package in R. Because the two gave different results, I wanted to ask the community whether anyone has suggestions on what may be causing the discrepancy.

Here is the balance table BEFORE any matching, produced with CreateTableOne():

> print(CreateTableOne(vars = variables_for_table, strata = "treated", data = working_data), quote = FALSE, noSpaces = TRUE, smd = TRUE)

                                       Stratified by treated
                                          No            Yes           p     test SMD  
  n                                       198           41                            
  age_at_diagnosis (mean (SD))            64.37 (10.63) 62.40 (14.87) 0.316      0.153
  charlson_score (mean (SD))              1.93 (2.19)   3.10 (3.81)   0.008      0.374
  median_wbc_6mo (mean (SD))              9.31 (3.98)   8.62 (4.23)   0.317      0.169
  chemotherapy = Yes (%)                  113 (57.1)    29 (70.7)     0.148      0.287
  radiation = Yes (%)                     117 (59.1)    26 (63.4)     0.735      0.089
  smoking = Yes (%)                       70 (35.4)     22 (53.7)     0.044      0.375
  alcohol = Yes (%)                       6 (3.0)       2 (4.9)       0.903      0.095
  myocardial_infarction = Yes (%)         3 (1.5)       2 (4.9)       0.441      0.192
  congestive_heart_failure = Yes (%)      5 (2.5)       2 (4.9)       0.761      0.125
  peripheral_vascular_disease = Yes (%)   8 (4.0)       2 (4.9)       1.000      0.041
  cerebrovascular_disease = Yes (%)       21 (10.6)     8 (19.5)      0.185      0.251
  dementia = Yes (%)                      1 (0.5)       1 (2.4)       0.768      0.161
  chronic_pulmonary_disease = Yes (%)     18 (9.1)      4 (9.8)       1.000      0.023
  rheumatic_disease = Yes (%)             8 (4.0)       3 (7.3)       0.616      0.142
  mild_liver_disease = Yes (%)            4 (2.0)       4 (9.8)       0.042      0.333
  diabetes_without_complication = Yes (%) 16 (8.1)      4 (9.8)       0.966      0.059
  diabetes_with_complication = Yes (%)    1 (0.5)       1 (2.4)       0.768      0.161
  hemiplegia_or_paraplegia = Yes (%)      5 (2.5)       1 (2.4)       1.000      0.006
  renal_disease = Yes (%)                 5 (2.5)       3 (7.3)       0.282      0.223
  malignancy = Yes (%)                    17 (8.6)      3 (7.3)       1.000      0.047
  metastatic_cancer = Yes (%)             8 (4.0)       6 (14.6)      0.024      0.370
  hiv_or_aids = Yes (%)                   0 (0.0)       1 (2.4)       0.383      0.224

Here it is AFTER performing matching using the MatchIt package and assessing balance again with CreateTableOne:

> formula = treated~ age_at_diagnosis + charlson_score + median_wbc_6mo + chemotherapy + radiation + smoking + alcohol + myocardial_infarction + congestive_heart_failure + peripheral_vascular_disease + cerebrovascular_disease + dementia + chronic_pulmonary_disease + rheumatic_disease + mild_liver_disease + diabetes_without_complication + diabetes_with_complication + hemiplegia_or_paraplegia + renal_disease + malignancy  + metastatic_cancer + hiv_or_aids
> matched_data = matchit(formula, data = working_data, distance = "glm", method = "nearest", replace = FALSE, ratio = 4, caliper = 0.3)
> matched_df = match.data(matched_data)

> print(CreateTableOne(vars = variables_for_table, strata = "treated", data = matched_df), quote = FALSE, noSpaces = TRUE, smd = TRUE)

                                        Stratified by treated
                                          No            Yes           p     test SMD   
  n                                       106           34                             
  age_at_diagnosis (mean (SD))            62.58 (11.14) 62.17 (15.04) 0.865      0.031 
  charlson_score (mean (SD))              1.88 (1.98)   1.91 (1.68)   0.927      0.019 
  median_wbc_6mo (mean (SD))              9.23 (4.08)   9.00 (4.46)   0.775      0.055 
  chemotherapy = Yes (%)                  69 (65.1)     23 (67.6)     0.948      0.054 
  radiation = Yes (%)                     64 (60.4)     21 (61.8)     1.000      0.028 
  smoking = Yes (%)                       47 (44.3)     17 (50.0)     0.705      0.114 
  alcohol = Yes (%)                       4 (3.8)       2 (5.9)       0.967      0.098 
  myocardial_infarction = Yes (%)         3 (2.8)       2 (5.9)       0.762      0.150 
  congestive_heart_failure = Yes (%)      2 (1.9)       1 (2.9)       1.000      0.069 
  peripheral_vascular_disease = Yes (%)   6 (5.7)       2 (5.9)       1.000      0.010 
  cerebrovascular_disease = Yes (%)       14 (13.2)     6 (17.6)      0.717      0.123 
  dementia = Yes (%)                      0 (0.0)       0 (0.0)       NaN        <0.001
  chronic_pulmonary_disease = Yes (%)     12 (11.3)     3 (8.8)       0.927      0.083 
  rheumatic_disease = Yes (%)             3 (2.8)       1 (2.9)       1.000      0.007 
  mild_liver_disease = Yes (%)            3 (2.8)       1 (2.9)       1.000      0.007 
  diabetes_without_complication = Yes (%) 8 (7.5)       3 (8.8)       1.000      0.047 
  diabetes_with_complication = Yes (%)    0 (0.0)       0 (0.0)       NaN        <0.001
  hemiplegia_or_paraplegia = Yes (%)      1 (0.9)       1 (2.9)       0.981      0.145 
  renal_disease = Yes (%)                 4 (3.8)       1 (2.9)       1.000      0.046 
  malignancy = Yes (%)                    7 (6.6)       1 (2.9)       0.707      0.172 
  metastatic_cancer = Yes (%)             3 (2.8)       1 (2.9)       1.000      0.007 
  hiv_or_aids = No (%)                    106 (100.0)   34 (100.0)    NA         <0.001

I would like to draw your attention to the variable "malignancy" (third row from the bottom). After matching, the prevalence of malignancy is 2.9% in the treated group compared with 6.6% in the untreated group, giving an SMD of 0.172 per CreateTableOne. If, however, we assess balance by passing the MatchIt object directly to cobalt, as below, we get a different SMD for malignancy.

> bal.tab(matched_data, un=TRUE, addl = addl, binary = "std", m.threshold = 0.1)

Call
 matchit(formula = formula, data = working_data, method = "nearest", 
    distance = "glm", replace = FALSE, caliper = 0.3, ratio = 4)

Balance Measures
                                      Type Diff.Un Diff.Adj        M.Threshold
distance                          Distance  0.6876   0.0271     Balanced, <0.1
age_at_diagnosis                   Contin. -0.1329   0.0006     Balanced, <0.1
charlson_score                     Contin.  0.3051  -0.0540     Balanced, <0.1
median_wbc_6mo                     Contin. -0.1637  -0.0555     Balanced, <0.1
chemotherapy_Yes                    Binary  0.3002   0.0269     Balanced, <0.1
radiation_Yes                       Binary  0.0898   0.0051     Balanced, <0.1
smoking_Yes                         Binary  0.3671  -0.0147     Balanced, <0.1
alcohol_Yes                         Binary  0.0858   0.1252 Not Balanced, >0.1
myocardial_infarction_Yes           Binary  0.1561  -0.0683     Balanced, <0.1
congestive_heart_failure_Yes        Binary  0.1092   0.0683     Balanced, <0.1
peripheral_vascular_disease_Yes     Binary  0.0389   0.0228     Balanced, <0.1
cerebrovascular_disease_Yes         Binary  0.2247   0.0742     Balanced, <0.1
dementia_Yes                        Binary  0.1254   0.0000     Balanced, <0.1
chronic_pulmonary_disease_Yes       Binary  0.0224  -0.1569 Not Balanced, >0.1
rheumatic_disease_Yes               Binary  0.1258   0.0000     Balanced, <0.1
mild_liver_disease_Yes              Binary  0.2607  -0.1239 Not Balanced, >0.1
diabetes_without_complication_Yes   Binary  0.0565  -0.0661     Balanced, <0.1
diabetes_with_complication_Yes      Binary  0.1254   0.0000     Balanced, <0.1
hemiplegia_or_paraplegia_Yes        Binary -0.0056   0.1430 Not Balanced, >0.1
renal_disease_Yes                   Binary  0.1840  -0.0565     Balanced, <0.1
malignancy_Yes                      Binary -0.0487  -0.0941     Balanced, <0.1
metastatic_cancer_Yes               Binary  0.2997  -0.0485     Balanced, <0.1
hiv_or_aids_Yes                     Binary  0.1581   0.0000     Balanced, <0.1

Balance tally for mean differences
                   count
Balanced, <0.1        19
Not Balanced, >0.1     4

Variable with the greatest mean difference
                      Variable Diff.Adj        M.Threshold
 chronic_pulmonary_disease_Yes  -0.1569 Not Balanced, >0.1

Sample sizes
                     Control Treated
All                   198.        41
Matched (ESS)          83.07      34
Matched (Unweighted)  106.        34
Unmatched              92.         7

As shown above, cobalt reports an adjusted SMD of -0.0941 for malignancy (absolute value 0.0941), which is below the commonly accepted threshold of 0.1 and would therefore be considered balanced. However, CreateTableOne reports an SMD of 0.172, which is above the threshold and thus indicates imbalance. Looking at the data, the prevalence of malignancy is only 2.9% in the treated group vs. 6.6% in the untreated group, which makes me think the covariate may indeed be imbalanced, as the SMD from CreateTableOne suggests.

A similar discrepancy appears for "mild_liver_disease": after matching, its prevalence is 2.9% in the treated group vs. 2.8% in the untreated group, giving an SMD of 0.007 per CreateTableOne (indicating balance) but an SMD of -0.1239 in cobalt (absolute value above 0.1, indicating imbalance).
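As a sanity check, the post-matching CreateTableOne SMDs can be reproduced from the counts in the matched table using unweighted proportions and the pooled denominator. This is a minimal base-R sketch; pooled_binary_smd() is a hypothetical helper written for illustration, not a tableone function.

```r
# Absolute SMD for a binary covariate with the pooled denominator
# sqrt((p1 * (1 - p1) + p0 * (1 - p0)) / 2), using unweighted proportions.
pooled_binary_smd <- function(p1, p0) {
  abs(p1 - p0) / sqrt((p1 * (1 - p1) + p0 * (1 - p0)) / 2)
}

# Counts from the matched CreateTableOne table:
#   malignancy:         1/34 treated vs 7/106 control
#   mild_liver_disease: 1/34 treated vs 3/106 control
malignancy_smd <- pooled_binary_smd(1 / 34, 7 / 106)
mild_liver_smd <- pooled_binary_smd(1 / 34, 3 / 106)

round(c(malignancy = malignancy_smd, mild_liver_disease = mild_liver_smd), 3)
```

Both values match the printed 0.172 and 0.007, which is consistent with CreateTableOne treating every matched unit as if it had a weight of 1.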

What could be causing these discrepancies, and which package is better to trust when assessing balance after matching?
Am I missing something in the interpretation?
Any insight or suggestions would be greatly appreciated!

Best Answer

There are two reasons why these values differ. The pre-matching values differ because cobalt and tableone compute the denominator of the standardized mean difference differently. tableone uses $\sqrt{\frac{s_1^2 + s_0^2}{2}}$ in the denominator of the SMD, whereas cobalt uses $s_1$ (where $s_1$ and $s_0$ are the standard deviations of the covariate in the treated and control groups, respectively). This can be changed in cobalt: setting s.d.denom = "pooled" gives the tableone version. cobalt chooses the default standardization factor based on the estimand supplied to matchit(). Here that is the ATT, for which the treated group is the target population, so the standardization factor should reflect the treated group. See my answer here for more information on that choice. In the end, the choice usually doesn't matter much; results won't differ substantially unless the variances of the two groups are very different.
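The pre-matching numbers in the question can be reproduced (up to rounding) from the printed summaries with each of the two denominators. This is a minimal base-R sketch; smd_from_summary() and binary_sd() are hypothetical helpers, not functions from tableone or cobalt, and binary_sd() assumes the $\bar x(1-\bar x)$ variance for proportions.

```r
# SMD from summary statistics with either denominator:
#   "pooled"  -> sqrt((s1^2 + s0^2) / 2)   (tableone)
#   "treated" -> s1                        (cobalt's default for the ATT)
smd_from_summary <- function(m1, m0, s1, s0, denom = c("pooled", "treated")) {
  denom <- match.arg(denom)
  sd_denom <- switch(denom,
                     pooled  = sqrt((s1^2 + s0^2) / 2),
                     treated = s1)
  (m1 - m0) / sd_denom
}

# SD of a binary variable with proportion p, taken as sqrt(p * (1 - p))
binary_sd <- function(p) sqrt(p * (1 - p))

# age_at_diagnosis (pre-matching): treated 62.40 (14.87), control 64.37 (10.63)
age_pooled  <- smd_from_summary(62.40, 64.37, 14.87, 10.63, "pooled")
age_treated <- smd_from_summary(62.40, 64.37, 14.87, 10.63, "treated")

# chemotherapy = Yes (pre-matching): 29/41 treated, 113/198 control
p1 <- 29 / 41
p0 <- 113 / 198
chemo_pooled  <- smd_from_summary(p1, p0, binary_sd(p1), binary_sd(p0), "pooled")
chemo_treated <- smd_from_summary(p1, p0, binary_sd(p1), binary_sd(p0), "treated")

round(c(age_pooled = age_pooled, age_treated = age_treated,
        chemo_pooled = chemo_pooled, chemo_treated = chemo_treated), 3)
```

The pooled values agree with tableone's 0.153 and 0.287 up to rounding of the printed inputs (tableone reports absolute values), and the treated-SD values agree with cobalt's Diff.Un of -0.1329 and 0.3002.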

The two results differ after matching because you did not include the matching weights in the balance statistics computed by tableone. Because you did 4:1 matching with a caliper, not all treated units received 4 matches: some received 3, some 2, some 1, and some none at all. In this case, matched control units receive different weights depending on how many other control units were matched to the same treated unit. For example, if a treated unit received only one matched control unit (because all others were outside the caliper or had already been matched), that control unit would receive a weight of 1, but if a treated unit received four matched control units, each of them would receive a weight of 1/4. The weights are necessary both for assessing balance and for estimating the treatment effect. cobalt automatically extracts the weights from the matchit object and uses them in computing the SMD; tableone does not, unless you supply the weights yourself (via a survey design object) using svyCreateTableOne(). Even then, the SMDs will not be calculated correctly, because they will use the weighted variance in the denominator, which is inappropriate. See my answer here for more detail about that.
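To make the role of the weights concrete, here is a toy base-R sketch. The data are made up, and the weights are computed by hand from the 1/k rule described above rather than taken from MatchIt. It shows how the unweighted and weighted control means can diverge once treated units have different numbers of matches.

```r
# Hypothetical toy data: three treated units matched to 1, 4, and 2 controls.
# Treated units get weight 1; each control gets 1/k, where k is the number of
# controls matched to the same treated unit (its subclass).
matches <- data.frame(
  subclass = c(1, 1,   2, 2, 2, 2, 2,   3, 3, 3),
  treat    = c(1, 0,   1, 0, 0, 0, 0,   1, 0, 0),
  x        = c(1, 1,   0, 0, 0, 0, 1,   0, 0, 0)   # a binary covariate
)

# Number of matched controls in each subclass
n_controls <- tapply(matches$treat == 0, matches$subclass, sum)

matches$w <- ifelse(matches$treat == 1, 1,
                    1 / as.numeric(n_controls[as.character(matches$subclass)]))

ctrl <- matches$treat == 0
treated_mean         <- mean(matches$x[matches$treat == 1])
unweighted_ctrl_mean <- mean(matches$x[ctrl])          # what an unweighted table shows
weighted_ctrl_mean   <- weighted.mean(matches$x[ctrl], matches$w[ctrl])

c(treated = treated_mean,
  control_unweighted = unweighted_ctrl_mean,
  control_weighted = weighted_ctrl_mean)
```

Each treated unit's matched set contributes a total control weight of 1, so the weighted control mean (1.25/3) is the average of the per-set control means, while the unweighted mean (2/7) over-represents the large matched set. If the matching software also rescales the control weights (for example, so they sum to the number of matched controls), the weighted mean is unchanged, because a common multiplier cancels in weighted.mean().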

You should use cobalt for assessing balance. tableone is great for making nice tables, but there has not been as much care put into making sure balance statistics are computed correctly and consistently for a variety of circumstances because that is not what the package was designed for, whereas cobalt was designed specifically for assessing balance after using MatchIt and other packages.


Standardized Mean Difference – Discrepancy Between Standardized Mean Difference in Cobalt and SMD Packages

Author of cobalt here. By default, when the estimand is the ATT, cobalt uses the standard deviation of the variable in the treated group in the denominator of the SMD. It is unclear how smd calculates the denominator of the SMD. The documentation is vague, and attempting to replicate the function's results using the formula in the documentation fails to recover the expected result. I have no idea what specific formula that package is using, and I can't figure it out despite my best efforts.

The formulas cobalt uses are transparent: if you compute the SMDs yourself, you will get the exact answer bal.tab() reports. For example, for age, we have

> (25.8162 - 28.0303) / 7.1550
[1] -0.3094479

The smd documentation claims to use the formula $d = \frac{\bar x_1 - \bar x_0}{\sqrt{\frac{s_1^2 + s_0^2}{2}}}$. Calculate that yourself using the values in the table:

> (25.8162 - 28.0303) / sqrt((7.1550^2 + 10.7867^2)/2)
[1] -0.2419045

It's still not what smd() reports. If you set s.d.denom = "pooled" in bal.tab(), you will find that the SMD is computed exactly as we computed it by hand above (up to rounding in the sixth decimal place).

You can arbitrarily flip the signs of the SMD; often people report the absolute SMD so the sign isn't an issue. If you want to know which group has a higher mean, use the means themselves instead of trying to interpret the sign of the SMD.

Don't use the smd package for balance assessment, before or after weighting, unless you can accurately explain what it's doing (I can't). Just use cobalt. It was specifically designed for balance assessment and uses the best practices in the propensity score analysis literature. A lot of thought went into every decision and the defaults reflect those considerations. The documentation and formulas are transparent and the results are what you would expect if you calculated the quantities by hand using best practices.

Following up on the comments:

The formula in cobalt for SMDs for the ATE is the same as the formula in the smd documentation, which I posted above in my answer. Again, cobalt actually uses this formula; compare the result of bal.tab() when using s.d.denom = "pooled" to the result of the hand calculation I did above. I can't say what formula smd actually uses.

For categorical variables, cobalt splits them into dummies and then uses the same formula with a slight modification, which is that the variances are computed as $s_a^2=\bar x_a(1-\bar x_a)$. This is exactly the formula recommended by Austin (2009).

But please note, as explained in the documentation, that cobalt reports unstandardized mean differences for binary variables by default. To request SMDs, set binary = "std" in the call to bal.tab(). See also this answer, in which I also discuss the differences between smd and cobalt. smd uses a particular formula to compute a single "SMD" for a categorical variable, which cobalt doesn't do (cobalt computes a balance statistic for each category, not for the variable as a whole). I explain why I don't like the statistic smd calculates in this answer.

Please read the cobalt documentation closely, as all of this is explained. Every choice and my motivation for it is explained either in the main vignette or on the documentation page for the function of interest.

Here is the result we get after weighting (a simplified version of the table you requested):

> bal.tab(W.out, disp = c("means", "sds"), un = TRUE,
          binary = "std", s.d.denom = "pooled")
Balance Measures
                Type    M.0.Un   SD.0.Un    M.1.Un   SD.1.Un Diff.Un   M.0.Adj  SD.0.Adj   M.1.Adj  SD.1.Adj Diff.Adj
prop.score  Distance    0.1822    0.2295    0.5774    0.2203  1.7569    0.5820    0.2168    0.5774    0.2203  -0.0201
age          Contin.   28.0303   10.7867   25.8162    7.1550 -0.2419   24.9658   10.5754   25.8162    7.1550   0.0929
educ         Contin.   10.2354    2.8552   10.3459    2.0107  0.0448   10.4031    2.4681   10.3459    2.0107  -0.0231
race_black    Binary    0.2028    0.4021    0.8432    0.3636  1.6708    0.8455    0.3614    0.8432    0.3636  -0.0058
race_hispan   Binary    0.1422    0.3492    0.0595    0.2365 -0.2774    0.0593    0.2362    0.0595    0.2365   0.0006
race_white    Binary    0.6550    0.4754    0.0973    0.2964 -1.4080    0.0952    0.2935    0.0973    0.2964   0.0052
married       Binary    0.5128    0.4998    0.1892    0.3917 -0.7208    0.1706    0.3761    0.1892    0.3917   0.0414
nodegree      Binary    0.5967    0.4906    0.7081    0.4546  0.2355    0.6897    0.4626    0.7081    0.4546   0.0390
re74         Contin. 5619.2365 6788.7508 2095.5737 4886.6204 -0.5958 2106.0448 4252.2469 2095.5737 4886.6204  -0.0018
re75         Contin. 2466.4844 3291.9962 1532.0553 3219.2509 -0.2870 1496.5412 2726.7838 1532.0553 3219.2509   0.0109

Effective sample sizes
           Control Treated
Unadjusted  429.       185
Adjusted     99.82     185

Note that I requested standardized mean differences for binary variables. Let's look at married to see how the SMD is calculated. Austin's formula is $$ d = \frac{\bar x_1 - \bar x_0}{\sqrt{\frac{s^2_1+s^2_0}{2}}} $$ where, for a binary variable, $s^2_a=\bar x_a(1-\bar x_a)$. Note that this is requested when we set s.d.denom = "pooled", which is not the default for the ATT. For the ATT, we use $\sqrt{s^2_1}$ in the denominator, which can be requested manually by setting s.d.denom = "treated".
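As a quick check of both the variance formula and the denominator choice, the SD columns and the unadjusted SMDs for the binary rows can be recomputed from the printed means alone. This is a base-R sketch; the data frame bin and its column names are made up for illustration, and because the inputs are rounded to four decimals the agreement is approximate. It also computes the treated-SD version (s.d.denom = "treated") for comparison; those values are not shown in the table above.

```r
# Unadjusted means, SDs, and SMDs for the binary rows, copied from the
# bal.tab() output above (rounded to 4 decimals).
bin <- data.frame(
  var     = c("race_black", "race_hispan", "race_white", "married", "nodegree"),
  M.0     = c(0.2028, 0.1422, 0.6550, 0.5128, 0.5967),
  SD.0    = c(0.4021, 0.3492, 0.4754, 0.4998, 0.4906),
  M.1     = c(0.8432, 0.0595, 0.0973, 0.1892, 0.7081),
  SD.1    = c(0.3636, 0.2365, 0.2964, 0.3917, 0.4546),
  Diff.Un = c(1.6708, -0.2774, -1.4080, -0.7208, 0.2355)
)

# Austin's variance for a binary variable: s^2 = p * (1 - p)
s2_0 <- bin$M.0 * (1 - bin$M.0)
s2_1 <- bin$M.1 * (1 - bin$M.1)

bin$SD.0.check   <- sqrt(s2_0)
bin$SD.1.check   <- sqrt(s2_1)
bin$pooled.check <- (bin$M.1 - bin$M.0) / sqrt((s2_1 + s2_0) / 2)  # s.d.denom = "pooled"
bin$treated.smd  <- (bin$M.1 - bin$M.0) / sqrt(s2_1)               # s.d.denom = "treated"

print(bin[, c("var", "SD.0", "SD.0.check", "SD.1", "SD.1.check",
              "Diff.Un", "pooled.check", "treated.smd")], digits = 4)
```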

Looking at the unweighted statistics, we can calculate this by hand.

> x <- lalonde$married
> t <- lalonde$treat
> 
> m_1 <- mean(x[t==1])
> m_0 <- mean(x[t==0])
> 
> s2_1 <- m_1 * (1 - m_1)
> s2_0 <- m_0 * (1 - m_0)
> 
> (m_1 - m_0) / sqrt((s2_1 + s2_0) / 2)
[1] -0.7207554

This is exactly the SMD for married in the unweighted sample. We can calculate the SMD for married in the weighted sample using the estimated weights. Remember that the denominator doesn't change; we always use the unweighted denominator. So all we need to do is replace the means with the weighted means.

> w <- W.out$weights
> m_1 <- weighted.mean(x[t==1], w[t==1])
> m_0 <- weighted.mean(x[t==0], w[t==0])
> 
> (m_1 - m_0) / sqrt((s2_1 + s2_0) / 2)
[1] 0.04144392

That is exactly the value reported under Diff.Adj for married.
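The same recipe generalizes to any covariate and any set of weights. Below is a hedged base-R sketch; smd_sketch() is a hypothetical helper written from the description in this answer, not cobalt's col_w_smd(), and it has not been checked against cobalt beyond the binary case shown above. It puts weighted means in the numerator and unweighted group variances in the denominator. For matching, keeping unmatched units in the data with weight 0 means that, in this sketch, they still contribute to the denominator.

```r
# Hypothetical helper (not part of cobalt):
#   x      covariate (numeric; 0/1 for binary)
#   treat  0/1 treatment indicator
#   w      weights (all 1 gives the unadjusted SMD)
#   denom  "pooled"  -> sqrt((s1^2 + s0^2) / 2)
#          "treated" -> s1
smd_sketch <- function(x, treat, w = rep(1, length(x)),
                       denom = c("pooled", "treated"), binary = NULL) {
  denom <- match.arg(denom)
  if (is.null(binary)) binary <- all(x %in% c(0, 1))
  x1 <- x[treat == 1]
  x0 <- x[treat == 0]
  # Unweighted variances: p * (1 - p) for binary variables,
  # otherwise the sample variance from var() (n - 1 denominator)
  if (binary) {
    s2_1 <- mean(x1) * (1 - mean(x1))
    s2_0 <- mean(x0) * (1 - mean(x0))
  } else {
    s2_1 <- var(x1)
    s2_0 <- var(x0)
  }
  sd_denom <- if (denom == "pooled") sqrt((s2_1 + s2_0) / 2) else sqrt(s2_1)
  # Weighted group means in the numerator
  m1 <- weighted.mean(x1, w[treat == 1])
  m0 <- weighted.mean(x0, w[treat == 0])
  (m1 - m0) / sd_denom
}
```

Called as smd_sketch(lalonde$married, lalonde$treat, W.out$weights), it performs the same arithmetic as the hand calculation above, so it should reproduce the 0.0414 reported under Diff.Adj.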

If you are seeing discrepancies, you may not be setting the options correctly. The default for binary variables is the unstandardized difference in means; to get the standardized difference, set binary = "std". When the estimand is the ATT, the default denominator of the SMD is $\sqrt{s^2_1}$; to use the pooled standard deviation, which is what Austin uses, set s.d.denom = "pooled". Also remember that the SMD always uses the unweighted denominator.

All of this is explained in the documentation for bal.tab(). The documentation for using bal.tab() with weightit objects specifically explains how s.d.denom is set by default. The documentation for col_w_smd() (which is the underlying function that calculates the SMD) explains what each s.d.denom means.

If you're still confused about how a specific number is computed, let me know and I'll explain.
