Logistic Regression – Deriving the GMM Estimator for Covariate Balancing Propensity Score

causality, econometrics, generalized-moments, logistic, propensity-scores

The covariate balancing propensity score (CBPS) of Imai and Ratkovic (2014) fits a logistic regression for the propensity score $\pi_\beta(\mathbf{X}) = P(T = 1\vert\mathbf{X})$ by the generalized method of moments (GMM), augmenting the usual logistic regression moment conditions with covariate balancing moment conditions. Because there are twice as many moment conditions as parameters to be estimated, the system is over-identified.

Here is how Imai and Ratkovic define the method (here targeting the average treatment effect [ATE]):

The logistic regression score conditions for each coefficient indexed by $p$ are
$$s_{\beta_p}(T, \mathbf{X}) = (T - \pi_\beta(\mathbf{X})) X_p$$
And the balancing moment conditions are
$$w_{\beta_p}(T, \mathbf{X}) = \frac{T - \pi_\beta(\mathbf{X})}{\pi_\beta(\mathbf{X})(1 - \pi_\beta(\mathbf{X}))} X_p$$
When we stack these, we get the GMM moment conditions:
$$g_\mathbf{\beta}(T, \mathbf{X}) = \begin{pmatrix} s_{\beta_p}(T, \mathbf{X}) \\ w_{\beta_p}(T, \mathbf{X}) \end{pmatrix}$$
with
$$\bar{g}_\mathbf{\beta}(T, \mathbf{X}) = \frac{1}{N}\sum_{i = 1}^N{g_\mathbf{\beta}(T_i, \mathbf{X}_i)}$$
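
To make the notation concrete, here is a minimal numpy sketch of these moment conditions (my own illustration with hypothetical helper names, not code from the paper or from any CBPS implementation):

```python
# Sketch of the stacked CBPS moment conditions (hypothetical helper names).
import numpy as np

def pi_beta(X, beta):
    """Logistic propensity score pi_beta(X_i) = 1 / (1 + exp(-X_i' beta))."""
    return 1.0 / (1.0 + np.exp(-X @ beta))

def moment_conditions(T, X, beta):
    """N x 2p matrix whose i-th row is g_beta(T_i, X_i):
    the logistic score conditions stacked on the balancing conditions."""
    p = pi_beta(X, beta)                          # shape (N,)
    resid = T - p                                 # T_i - pi_beta(X_i)
    s = resid[:, None] * X                        # score:   (T - pi) X
    w = (resid / (p * (1.0 - p)))[:, None] * X    # balance: (T - pi) X / (pi (1 - pi))
    return np.hstack([s, w])                      # shape (N, 2p)

def g_bar(T, X, beta):
    """Sample average of the moment conditions, g-bar_beta(T, X)."""
    return moment_conditions(T, X, beta).mean(axis=0)
```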

They estimate $\hat{\beta}$ as
$$\hat{\beta} = \mathop{\arg \min}\limits_{\beta} \bar{g}_\mathbf{\beta}(T, \mathbf{X})^T \Sigma_\beta (T, \mathbf{X})^{-1} \bar{g}_\mathbf{\beta}(T, \mathbf{X})$$

They use
$$
\Sigma_\beta (T, \mathbf{X}) = \frac{1}{N}\sum_{i = 1}^N \begin{pmatrix}
\pi_\beta(\mathbf{X}_i)(1-\pi_\beta(\mathbf{X}_i))\,X_iX_i^T & X_i X_i^T \\
X_i X_i^T & \dfrac{X_i X_i^T}{\pi_\beta(\mathbf{X}_i)(1-\pi_\beta(\mathbf{X}_i))}
\end{pmatrix}
$$
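
For concreteness, here is a sketch of how this weighting matrix and the resulting over-identified GMM objective could be computed (again my own illustration with hypothetical names; note that $\Sigma_\beta$ depends on $\beta$, so the objective as written is re-weighted at each candidate $\beta$):

```python
# Sketch of the paper's weighting matrix and the resulting GMM objective.
import numpy as np

def pi_beta(X, beta):
    return 1.0 / (1.0 + np.exp(-X @ beta))

def sigma_beta(X, beta):
    """1/N sum_i of the 2x2 block matrix above, built from X_i X_i^T."""
    p = pi_beta(X, beta)
    v = p * (1.0 - p)                              # pi(1 - pi) per observation
    XXt = np.einsum('ni,nj->nij', X, X)            # stacked outer products X_i X_i^T
    return np.block([
        [(v[:, None, None] * XXt).mean(0), XXt.mean(0)],
        [XXt.mean(0), (XXt / v[:, None, None]).mean(0)],
    ])

def gmm_objective(beta, T, X):
    """gbar' Sigma_beta^{-1} gbar, the criterion minimized over beta."""
    p = pi_beta(X, beta)
    resid = T - p
    g = np.hstack([resid[:, None] * X, (resid / (p * (1 - p)))[:, None] * X])
    gbar = g.mean(axis=0)
    return gbar @ np.linalg.solve(sigma_beta(X, beta), gbar)

# beta_hat could then be obtained with a generic optimizer, e.g.
# scipy.optimize.minimize(gmm_objective, beta_start, args=(T, X)).
```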

My question is how was this expression for $\Sigma_\beta (T, \mathbf{X})$ derived? They claim

We find that this covariance estimator outperforms the sample covariance of moment conditions because the latter does not penalize large weights.

This seems to differ from the usual efficient GMM weight matrix, which I thought would be
$$
\Sigma_\beta (T, \mathbf{X}) = \frac{1}{N}\sum_{i = 1}^Ng_\mathbf{\beta}(T_i, \mathbf{X}_i)g_\mathbf{\beta}(T_i, \mathbf{X}_i)^T
$$

but I have found these two are not equal to each other. Where, then, does this formula come from?
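
As a quick numerical illustration of that difference, here is a simulated-data comparison of the two matrices (a sketch of my own, not taken from the paper):

```python
# Simulated-data check that the sample covariance of the moment conditions
# and the paper's Sigma_beta are different matrices.
import numpy as np

rng = np.random.default_rng(0)
N, k = 500, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, k - 1))])
beta = rng.normal(size=k)
p = 1.0 / (1.0 + np.exp(-X @ beta))
T = rng.binomial(1, p)

resid = T - p
g = np.hstack([resid[:, None] * X, (resid / (p * (1 - p)))[:, None] * X])   # N x 2k

sample_cov = g.T @ g / N                               # usual 1/N sum_i g_i g_i^T

v = p * (1 - p)
XXt = np.einsum('ni,nj->nij', X, X)
paper_sigma = np.block([
    [(v[:, None, None] * XXt).mean(0), XXt.mean(0)],
    [XXt.mean(0), (XXt / v[:, None, None]).mean(0)],
])

print(np.allclose(sample_cov, paper_sigma))            # False in general
```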

Update

I have found through experimentation that
$$
\Sigma_\beta (T, \mathbf{X}) = -\frac{1}{N}\sum_{i = 1}^N g_\mathbf{\beta}(1, \mathbf{X}_i)\,g_\mathbf{\beta}(0, \mathbf{X}_i)^T,
$$
which is a clue! Also, following up on @Pusto's comment, the authors state that the derived expression is obtained by "integrat[ing] out the treatment variable $T_i$ conditional on the pretreatment covariates $X_i$". It remains unclear to me how this integration is supposed to work, but I think it may be possible to derive their estimator from that description.
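
A minimal numerical check of the experimental relation above (illustrative only; the names are hypothetical):

```python
# For a single observation, -g_beta(1, X_i) g_beta(0, X_i)^T equals the
# i-th summand of the paper's Sigma_beta.
import numpy as np

rng = np.random.default_rng(1)
k = 3
x = rng.normal(size=k)                         # one covariate vector X_i
beta = rng.normal(size=k)
p = 1.0 / (1.0 + np.exp(-x @ beta))            # pi_beta(X_i)

def g(t):
    """Stacked score and balance conditions evaluated at T_i = t."""
    return np.concatenate([(t - p) * x, (t - p) / (p * (1 - p)) * x])

lhs = -np.outer(g(1), g(0))

xxT = np.outer(x, x)
rhs = np.block([[p * (1 - p) * xxT, xxT],
                [xxT, xxT / (p * (1 - p))]])

print(np.allclose(lhs, rhs))                   # True
```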

Best Answer

Let $$\begin{align*}\Sigma(\boldsymbol T,\boldsymbol X) &= N^{-1}\sum_{i=1}^N\begin{pmatrix} s_{\beta}(T_i, X_i)s_{\beta}(T_i,X_i)^\top & s_{\beta}(T_i, X_i)w_{\beta}(T_i,X_i)^\top \\ w_{\beta}(T_i, X_i)s_{\beta}(T_i,X_i)^\top & w_{\beta}(T_i,X_i)w_{\beta}(T_i,X_i)^\top\end{pmatrix} \\ &= N^{-1}\sum_{i=1}^N\begin{pmatrix} \big(T_i - \pi_\beta(X_i)\big)^2X_iX_i^\top & \big(T_i-\pi_\beta(X_i)\big)^2\Big(\pi_\beta(X_i)\big(1 - \pi_\beta(X_i)\big)\Big)^{-1}X_iX_i^\top \\ \big(T_i-\pi_\beta(X_i)\big)^2\Big(\pi_\beta(X_i)\big(1 - \pi_\beta(X_i)\big)\Big)^{-1}X_iX_i^\top & \big(T_i - \pi_\beta(X_i)\big)^2\Big(\pi_\beta(X_i)\big(1 - \pi_\beta(X_i)\big)\Big)^{-2}X_iX_i^\top \end{pmatrix}\end{align*}$$ and assume for a moment that we are in a fixed-factor design (i.e., there are $p$ fixed factors, so that $\boldsymbol X$ is not a random variable; only $\boldsymbol T$ is). By considering the conditional expectation, all that follows can be restated for non-fixed-factor designs (in that case, $\operatorname E$ should be understood as the conditional expectation operator, i.e., $\operatorname E[\cdot] = \mathbb E[\,\cdot\mid\boldsymbol X]$).

Now apply $\operatorname E$ to $\Sigma(\boldsymbol T,\boldsymbol X)$:

$$\begin{align*}\operatorname E[\Sigma(\boldsymbol T,\boldsymbol X)] &= N^{-1}\sum_{i=1}^N\begin{pmatrix} \operatorname E\left[\big(T_i - \pi_\beta(X_i)\big)^2\right]X_iX_i^\top & \operatorname E\left[\big(T_i-\pi_\beta(X_i)\big)^2\right]\Big(\pi_\beta(X_i)\big(1 - \pi_\beta(X_i)\big)\Big)^{-1}X_iX_i^\top \\ \operatorname E\left[\big(T_i-\pi_\beta(X_i)\big)^2\right]\Big(\pi_\beta(X_i)\big(1 - \pi_\beta(X_i)\big)\Big)^{-1}X_iX_i^\top & \operatorname E\left[\big(T_i - \pi_\beta(X_i)\big)^2\right]\Big(\pi_\beta(X_i)\big(1 - \pi_\beta(X_i)\big)\Big)^{-2}X_iX_i^\top \end{pmatrix} \\ &= N^{-1}\sum_{i=1}^N\begin{pmatrix} \pi_\beta(X_i)\big(1 - \pi_\beta(X_i)\big)X_iX_i^\top & X_iX_i^\top \\ X_iX_i^\top & \Big(\pi_\beta(X_i)\big(1 - \pi_\beta(X_i)\big)\Big)^{-1}X_iX_i^\top \end{pmatrix}\end{align*}$$ since $T_i$ is Bernoulli with success probability $\pi_\beta(X_i)$, so that $\operatorname E\left[\big(T_i - \pi_\beta(X_i)\big)^2\right] = \operatorname{Var}(T_i) = \pi_\beta(X_i)\big(1-\pi_\beta(X_i)\big)$, and this factor cancels against one power of $\pi_\beta(X_i)\big(1-\pi_\beta(X_i)\big)$ in the off-diagonal and lower-right blocks.

See Eq. 15 (and Eq. 16) in the paper.
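
To make the "integrating out" step concrete: conditional on $X_i$, $T_i$ equals $1$ with probability $\pi_\beta(X_i)$ and $0$ otherwise, so $\operatorname E[g_\beta(T_i,X_i)g_\beta(T_i,X_i)^\top\mid X_i] = \pi_\beta(X_i)\,g_\beta(1,X_i)g_\beta(1,X_i)^\top + \big(1-\pi_\beta(X_i)\big)\,g_\beta(0,X_i)g_\beta(0,X_i)^\top$. A short numerical sketch of this identity (my own, not from the paper):

```python
# Averaging g g^T over the two possible treatments, weighted by pi and 1 - pi,
# recovers the i-th summand of the paper's Sigma_beta.
import numpy as np

rng = np.random.default_rng(2)
k = 3
x = rng.normal(size=k)                         # one covariate vector X_i
beta = rng.normal(size=k)
p = 1.0 / (1.0 + np.exp(-x @ beta))            # pi_beta(X_i)

def g(t):
    """Stacked score and balance conditions evaluated at T_i = t."""
    return np.concatenate([(t - p) * x, (t - p) / (p * (1 - p)) * x])

integrated = p * np.outer(g(1), g(1)) + (1 - p) * np.outer(g(0), g(0))

xxT = np.outer(x, x)
paper_term = np.block([[p * (1 - p) * xxT, xxT],
                       [xxT, xxT / (p * (1 - p))]])

print(np.allclose(integrated, paper_term))     # True
```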

You may now ask why one would use the expected value of this covariance estimator as an estimator in its own right. There is a (numerical) motivation for this approach.
