Solved – Adjusting Sample with Propensity Score Weighting and ATT

observational-study, propensity-scores, python, scipy, statsmodels

I have a retrospective sample that contains a treatment and a non-treatment group, with >10 covariates comprising both categorical and continuous variables. I used chi-squared and Mann-Whitney U tests (most of the continuous covariates were not normally distributed) to compare the treatment and non-treatment groups. There were significant differences between the groups on some of the covariates, so I am using propensity scores (PS) to estimate the average treatment effect on the treated (ATT), weighting the non-treated samples so that the two groups become more similar.

I’m relatively new to PS adjustments, but I’m able to calculate the PS using logistic regression and then compute the ATT weight for each observation using

$$ATT_i = T_i + (1 - T_i) \cdot \frac{PS_i}{1 - PS_i}$$
where i corresponds to one observation.
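As a minimal sketch, the formula above could be computed with numpy as follows. The function name `att_weights` and the toy arrays are hypothetical, and the propensity scores are assumed to have been estimated already (for example, from a fitted logistic regression).

```python
import numpy as np

def att_weights(treat, ps):
    """ATT weights: 1 for treated units, ps / (1 - ps) for untreated units."""
    treat = np.asarray(treat, dtype=float)
    ps = np.asarray(ps, dtype=float)
    # Scores of exactly 0 or 1 would give zero or infinite weights.
    if np.any((ps <= 0) | (ps >= 1)):
        raise ValueError("propensity scores must lie strictly between 0 and 1")
    return treat + (1.0 - treat) * ps / (1.0 - ps)

# Hypothetical data: treatment indicator and estimated propensity scores.
treat = np.array([1, 1, 0, 0, 0])
ps = np.array([0.8, 0.6, 0.5, 0.2, 0.75])
w = att_weights(treat, ps)
```

Treated units keep a weight of 1, while untreated units with a high propensity score (those that resemble the treated) are weighted up.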
My question is what is the proper way to weight the samples to reperform the group analysis to show they are more similar?

For the categorical variables this seemed straightforward as multiplying the weights by the value for the category (0 or 1) and then the group size would be the sum of all weights. Is this the correct way of going about this weighting?

I’m not entirely sure how to adjust the continuous variables with the weights and then recalculate the difference score. To my knowledge, there’s no weighted Mann-Whitney U test to accomplish this, and simple multiplication by weights (or normalized weights) doesn’t seem correct. What is the proper way to reperform this analysis?

Apologies if this question is basic or if I am incorrect in my assumptions so far.

For the analysis I'm using python with scipy, numpy, and statsmodels (with Jupyter).

Thanks for your help!

Best Answer

First, many methodologists agree that hypothesis testing is inappropriate for assessing balance. That is, you should not be using chi-square and MWU tests to determine whether your groups differ in their covariate distributions. The argument is that balance is a property of your sample, not of a hypothetical population. With this in mind, sample statistics like standardized mean differences, differences in proportion, variance ratios, and KS statistics are more appropriate. I would recommend computing these both before and after weighting. All of these statistics have weighted versions that can be applied to your weighted data.
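One way to get a weighted KS statistic is to take the largest absolute gap between the weighted empirical CDFs of the two groups. The sketch below assumes that definition; the helper names `weighted_ecdf` and `weighted_ks` and the toy data are made up for illustration.

```python
import numpy as np

def weighted_ecdf(x, w, grid):
    """Weighted empirical CDF of x, evaluated at each point of grid."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(x)
    x_sorted = x[order]
    # Cumulative share of total weight, with a leading 0 for "no values yet".
    cum_w = np.concatenate(([0.0], np.cumsum(w[order]) / w.sum()))
    # Count of sorted values <= each grid point.
    idx = np.searchsorted(x_sorted, grid, side="right")
    return cum_w[idx]

def weighted_ks(x, treat, w):
    """Maximum absolute difference between the two weighted ECDFs."""
    x = np.asarray(x, dtype=float)
    treat = np.asarray(treat, dtype=bool)
    w = np.asarray(w, dtype=float)
    grid = np.unique(x)
    f1 = weighted_ecdf(x[treat], w[treat], grid)
    f0 = weighted_ecdf(x[~treat], w[~treat], grid)
    return float(np.max(np.abs(f1 - f0)))

# Hypothetical covariate, treatment indicator, and ATT weights.
x = np.array([1.0, 3.0, 2.0, 4.0])
treat = np.array([1, 1, 0, 0])
w = np.array([1.0, 1.0, 3.0, 1.0])
ks_after = weighted_ks(x, treat, w)
ks_before = weighted_ks(x, treat, np.ones_like(w))
```

Passing a vector of ones as the weights gives the ordinary (unweighted) statistic, so the same function covers the before and after comparison.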

The weighted mean is the sum of the product of the weight and the value for each unit, divided by the sum of the weights: $\bar{x}_w = \sum_i w_i x_i / \sum_i w_i$. This can be used for the standardized mean difference (by dividing the difference in weighted means by the unweighted standard deviation of the treatment group) and for the difference in proportion, since the weighted mean of a 0/1 variable is a weighted proportion. Weighted variance can be computed using the "reliability weights" formula. Typically you want standardized mean differences and differences in proportion to be less than 0.1 in absolute value (i.e., close to 0) after weighting, and variance ratios close to 1 after weighting.
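A minimal numpy sketch of these statistics follows. The function names and toy data are hypothetical, and the variance uses the reliability-weights denominator $V_1 - V_2 / V_1$ with $V_1 = \sum_i w_i$ and $V_2 = \sum_i w_i^2$.

```python
import numpy as np

def weighted_mean(x, w):
    """Sum of weight * value divided by the sum of the weights."""
    return np.sum(w * x) / np.sum(w)

def weighted_var(x, w):
    """Weighted variance using the reliability-weights correction."""
    v1 = np.sum(w)
    v2 = np.sum(w ** 2)
    m = weighted_mean(x, w)
    return np.sum(w * (x - m) ** 2) / (v1 - v2 / v1)

def balance_stats(x, treat, w):
    """Balance statistics for one covariate, treated minus untreated."""
    x = np.asarray(x, dtype=float)
    treat = np.asarray(treat, dtype=bool)
    w = np.asarray(w, dtype=float)
    m1 = weighted_mean(x[treat], w[treat])
    m0 = weighted_mean(x[~treat], w[~treat])
    # Standardize by the UNWEIGHTED SD of the treated group.
    sd_treated = np.std(x[treat], ddof=1)
    return {
        "mean_diff": m1 - m0,  # difference in proportion if x is 0/1
        "smd": (m1 - m0) / sd_treated,
        "var_ratio": weighted_var(x[treat], w[treat])
                     / weighted_var(x[~treat], w[~treat]),
    }

# Hypothetical covariate, treatment indicator, and ATT weights.
x = np.array([2.0, 4.0, 1.0, 3.0, 5.0])
treat = np.array([1, 1, 0, 0, 0])
w = np.array([1.0, 1.0, 1.0, 1.0, 2.0])
stats = balance_stats(x, treat, w)
```

For a binary (0/1) covariate, report `mean_diff` as the difference in proportion; the standardized version and the variance ratio are mainly of interest for continuous covariates. Calling the function with all weights equal to 1 gives the pre-weighting values for comparison.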