Performing the PS estimation separately within each of the groups defined by the exact matching variables is equivalent to estimating a propensity score in the whole sample with the exact matching variables and their interactions with all other covariates included in the PS model and then splitting your data based on the exact matching variables. Remember that the goal of PS matching is covariate balance; use whichever method yields good covariate balance without requiring you to discard too many treated units.
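The equivalence can be checked numerically. Below is a minimal sketch, assuming a single binary exact matching variable `g`, a single covariate `x`, and simulated data with made-up coefficients; `solve` and `fit_logit` are hypothetical helpers written for this illustration, not functions from any matching package.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def fit_logit(rows, y, iters=25):
    """Fit a logistic regression by Newton-Raphson and return fitted probabilities."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        p = [1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, r)))) for r in rows]
        grad = [sum(r[j] * (yi - pi) for r, yi, pi in zip(rows, y, p)) for j in range(k)]
        hess = [[sum(r[i] * r[j] * pi * (1 - pi) for r, pi in zip(rows, p))
                 for j in range(k)] for i in range(k)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return [1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, r)))) for r in rows]

# Simulated data (illustrative coefficients): g = exact matching variable,
# x = another covariate, a = treatment indicator.
random.seed(1)
data = []
for _ in range(400):
    g = random.randint(0, 1)
    x = random.gauss(0, 1)
    p_true = 1 / (1 + math.exp(-(-0.5 + 0.8 * g + (0.6 - 1.0 * g) * x)))
    a = 1 if random.random() < p_true else 0
    data.append((g, x, a))

# Approach 1: a separate PS model (intercept + x) within each level of g.
ps_separate = [None] * len(data)
for level in (0, 1):
    idx = [i for i, (g, _, _) in enumerate(data) if g == level]
    fitted = fit_logit([[1.0, data[i][1]] for i in idx], [data[i][2] for i in idx])
    for i, p in zip(idx, fitted):
        ps_separate[i] = p

# Approach 2: one PS model in the whole sample with g, x, and the g-by-x interaction.
ps_joint = fit_logit([[1.0, g, x, g * x] for g, x, _ in data], [a for _, _, a in data])

max_diff = max(abs(s - j) for s, j in zip(ps_separate, ps_joint))
print(max_diff)  # difference in fitted propensity scores between the two approaches
```

The two sets of fitted propensity scores should agree up to numerical error, because the fully interacted model gives each level of `g` its own intercept and slope.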
As I mentioned in this post, omitting an important covariate in a propensity score model, regardless of whether you exactly match on that covariate, can yield poorly performing propensity scores that are far from the true propensity scores (because they are estimated with an incorrect, underspecified model). Although with exact matching you'll achieve perfect balance on the exact matching variables, poorly estimated propensity scores may fail to balance your other covariates. But this is an empirical question that depends on your dataset at hand; you should try various ways of estimating the propensity score and matching and pick the one that yields the best balance and remaining sample size.
This is explained in Stuart (2008) and in the cobalt vignette. The problem is that when comparing balance before and after matching, the SMD will be affected not only by changes in balance but also by changes in the standard deviation of the covariate when the standard deviation of the matched sample is used as the standardization factor after matching. This muddles two things together when we only care about one. Holding the standard deviation constant prevents this, isolating the effect of matching on balance alone.
Consider the following example. Let's say the mean of a covariate X (e.g., age) in the treated group is 44 and the mean in the control group is 46, and the pooled standard deviation is 9. Let's say that after matching, the control group mean is now 45 and the pooled standard deviation is now 4. Was there better balance before matching or after matching?
It should be clear that the covariate means are closer together, which indicates an improvement in balance and therefore a reduction in bias. Which method of computing the SMD reflects this?
Prior to matching, the SMD is (46-44)/9 = .22. By the standard criterion of SMDs less than .1, this would be considered imbalanced.
Using the formula for the SMD that uses the standard deviation in the unmatched sample, the matched SMD is (45-44)/9 = .11, indicating better balance.
Using the formula for the SMD that uses the standard deviation in the matched sample, the matched SMD is (45-44)/4 = .25, indicating that balance got worse after matching!
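The arithmetic of this example can be reproduced in a few lines. This is only a sketch using the numbers above; the function name `smd` is a hypothetical helper, and the sign convention (control minus treated) follows the calculations in this example.

```python
def smd(mean_treated, mean_control, sd):
    """Standardized mean difference with an explicitly supplied standardization factor."""
    return (mean_control - mean_treated) / sd

sd_unmatched = 9  # pooled SD before matching
sd_matched = 4    # pooled SD after matching

smd_before = smd(44, 46, sd_unmatched)
smd_after_fixed_sd = smd(44, 45, sd_unmatched)   # same SD before and after
smd_after_matched_sd = smd(44, 45, sd_matched)   # SD recomputed in the matched sample

print(round(smd_before, 2))            # 0.22
print(round(smd_after_fixed_sd, 2))    # 0.11
print(round(smd_after_matched_sd, 2))  # 0.25
```

Only the version that holds the standardization factor fixed reports the halving of the mean difference as an improvement.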
Remember that the bias of the effect estimate is a function of the mean differences, and standardizing them to produce the SMD is just a way to simplify balance statistics for users. It's all arbitrary anyway, but at least using the unmatched standard deviation correctly isolates changes in balance from changes in variability, the latter of which is not related to bias.
The positivity assumption required for identifying the ATT is indeed less strict than the positivity assumption required for the ATE. Stated in "overlap" terms, positivity for the ATT requires that the distribution of covariates for the treated group is contained within the distribution of covariates for the control group. That is, there is no place in the covariate distribution where there are only treated units but no control units. However, there can be areas of the covariate distribution in the controls that are not occupied by the treated. This is not true of the positivity requirement for the ATE. Intuitively, you can see that such control units would simply be dropped in a matching analysis or down-weighted in a weighting analysis; such units would have a propensity score of zero.
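To make the down-weighting concrete, here is a minimal sketch using the standard ATT weighting formula (weight 1 for treated units, odds of treatment for control units). The formula is not stated above, so treat it as added context; `att_weight` is a hypothetical helper name.

```python
def att_weight(treated, ps):
    """ATT weight: 1 for a treated unit, ps / (1 - ps) for a control unit."""
    if treated:
        return 1.0
    if ps >= 1.0:
        # No control unit can be weighted where treatment is certain.
        raise ValueError("propensity score of 1 violates ATT positivity")
    return ps / (1.0 - ps)

# A control unit in a region with no treated units has a propensity score of 0
# and receives zero weight, so it drops out of the ATT analysis.
print(att_weight(False, 0.0))   # 0.0
print(att_weight(False, 0.5))   # 1.0
print(att_weight(False, 0.75))  # 3.0
print(att_weight(True, 0.75))   # 1.0
```

Control weights grow without bound as the propensity score approaches 1, which is the side of the overlap condition that the ATT still requires.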
More formally, the positivity requirement for the ATT stated in the usual "probability of treatment" terms is $$ P(A=1|X=x) < 1 $$ for all $x$. That is, there are no areas of the covariate distribution where only the treatment is possible; in all areas of the covariate distribution, it is possible to receive control. This is generally a much easier criterion to satisfy than the positivity requirement for the ATE.
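For comparison, the two requirements can be written side by side. The ATE condition shown here is the usual two-sided form; it is not written out above, so take it as added context:

$$ \text{ATE: } 0 < P(A=1|X=x) < 1 \text{ for all } x \qquad \text{ATT: } P(A=1|X=x) < 1 \text{ for all } x $$

The ATT drops the lower bound, which is why regions containing only control units, where $P(A=1|X=x) = 0$, are allowed.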