Statistical Inference – How to Show Neyman Orthogonality of the Score

Tags: estimation, inference, self-study

In Section 4.1 of this paper, the authors discuss the partially linear regression (PLR) model:

$$\begin{aligned}
&Y=D \theta_{0}+g_{0}(X)+U, \quad E_{P}[U \mid X, D]=0 \\
&D=m_{0}(X)+V, \quad E_{P}[V \mid X]=0
\end{aligned}$$

where the parameter of interest is the regression coefficient $\theta_0$. In equation (4.4), the authors provide the score function

$$\psi(W ; \theta, \eta):=\{Y-\ell(X)-\theta(D-m(X))\}(D-m(X)), \quad \eta=(\ell, m)$$ where $W = (Y, D, X)$
and state that "it is easy to see that $\theta_0$ satisfies the orthogonality condition $\partial_{\eta} E_{P} \psi(W ; \theta_{0}, \eta_{0})[\eta-\eta_{0}]=0$, for $\eta_{0}=(\ell_{0}, m_{0})$, where $\ell_{0}(X)=E_{P}[Y \mid X]$."

I have two questions:

  1. How did the authors arrive at the score function $\psi(W;\theta,\eta)$?
  2. How can one show that the orthogonality condition is satisfied?

Best Answer

The first question is somewhat difficult to answer, because it requires speculating about the authors' reasoning, but let me at least try to give some intuition for why it is a sensible score. Recall that what it means for $\psi(W;\theta,\eta)$ to be a score is that at the true values $\theta_0$ and $\eta_0 = (\ell_0,m_0)$, we have the moment condition

$$E[\psi(W;\theta_0,\eta_0)] = 0$$

Let us now write out what that entails given the $\psi$ defined above. We end up with the equation

$$E[(Y - \ell_0(X) - \theta_0(D-m_0(X)))\cdot(D-m_0(X))] = 0$$

Using the linearity of expectation and rearranging, we get a closed-form expression for $\theta_0$:

$$\theta_0 = \frac{E[(Y-\ell_0(X))(D-m_0(X))]}{E[(D-m_0(X))^2]} = \frac{E[\mathrm{Cov}(Y,D\mid X)]}{E[\mathrm{Var}(D\mid X)]}$$
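If you want to convince yourself numerically, here is a minimal simulation sketch (not from the paper; the data-generating process with $g_0(x)=\sin(2x)$, $m_0(x)=0.8x$ and $\theta_0=0.5$ is made up purely for illustration) showing that the sample analogue of the expression above recovers $\theta_0$ when the true nuisance functions are plugged in:

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta0 = 200_000, 0.5

# Hypothetical nuisance functions, invented for this illustration
def g0(x):
    return np.sin(2 * x)

def m0(x):            # m_0(X) = E[D | X]
    return 0.8 * x

def ell0(x):          # l_0(X) = E[Y | X] = theta_0 * m_0(X) + g_0(X)
    return theta0 * m0(x) + g0(x)

X = rng.uniform(-1, 1, n)
D = m0(X) + rng.normal(size=n)                 # V with E[V | X] = 0
Y = theta0 * D + g0(X) + rng.normal(size=n)    # U with E[U | X, D] = 0

# Sample analogue of theta_0 = E[(Y - l_0)(D - m_0)] / E[(D - m_0)^2]
theta_hat = np.mean((Y - ell0(X)) * (D - m0(X))) / np.mean((D - m0(X)) ** 2)
print(theta_hat)   # approximately 0.5, up to Monte Carlo error
```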

I like to think about this expression for $\theta_0$ in terms of the Frisch-Waugh-Lovell (FWL) theorem. Recall that this theorem states that in the linear model $Y = \beta D + \gamma X + \varepsilon$, $\beta$ is numerically identical to the coefficient on $r_D$ in the regression $r_Y = \beta r_D + \delta$, where $r_Y = Y - \mathrm{L}(Y\mid X)$ and $r_D = D - \mathrm{L}(D\mid X)$ are respectively the residuals from predicting $Y$ and $D$ using the $X$'s (here $\mathrm{L}(A\mid B)$ denotes the best linear predictor of $A$ given $B$). Recall additionally that when the regressor is scalar, the OLS coefficient is just the covariance of the outcome and the regressor divided by the variance of the regressor, i.e. $\beta = \frac{\mathrm{Cov}(r_Y,r_D)}{\mathrm{Var}(r_D)}$. The expression for $\theta_0$ can thus be thought of as the "nonparametric" analogue of the FWL theorem: rather than taking residuals from the best linear predictor, we take residuals from the best predictor, i.e. the conditional expectation function.
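As a sanity check on the FWL analogy, here is a small simulated example (again made up, plain OLS on Gaussian data) verifying that the coefficient on $D$ from the full regression equals the residual-on-residual coefficient:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
X = rng.normal(size=(n, 3))
D = X @ np.array([0.5, -0.2, 0.1]) + rng.normal(size=n)
Y = 2.0 * D + X @ np.array([1.0, 0.3, -0.7]) + rng.normal(size=n)

def ols(y, Z):
    """OLS coefficients of y on Z, with an intercept prepended."""
    Z1 = np.column_stack([np.ones(len(y)), Z])
    return np.linalg.lstsq(Z1, y, rcond=None)[0]

beta_full = ols(Y, np.column_stack([D, X]))[1]        # coefficient on D

# Partial X out of both Y and D, then regress residual on residual
rY = Y - np.column_stack([np.ones(n), X]) @ ols(Y, X)
rD = D - np.column_stack([np.ones(n), X]) @ ols(D, X)
beta_fwl = np.mean(rY * rD) / np.mean(rD ** 2)        # Cov(rY, rD) / Var(rD)

print(beta_full, beta_fwl)   # numerically identical, both close to 2.0
```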

Now, addressing your second question, let $\delta_\ell(X),\delta_m(X)$ be two test functions, perturbing $\ell$ and $m$ respectively. Then Neyman orthogonality states that for any such choice of $\delta_\ell$ and $\delta_m$, we have

$$\frac{\mathrm d E[\psi(W;\theta,\eta_0 + r(\delta_\ell,\delta_m))]}{\mathrm d r} = 0$$ where the derivative is evaluated at $r=0$. To prove this, we simply expand out the definition of $\psi$ to obtain

$$\begin{aligned}E[\psi(W;\theta,\eta_0 + r(\delta_\ell,\delta_m))] &= E[(Y - \ell_0(X)-r \delta_\ell(X))(D - m_0(X) - r\delta_m(X))]\\ &\quad- \theta E[(D-m_0(X)-r\delta_m(X))^2] \end{aligned}$$

Let us first check that the derivative of the first term is zero. Differentiating under the expectation sign and evaluating at $r=0$, we have

$$\begin{aligned} \frac{\mathrm d}{\mathrm dr} E[(Y - \ell_0(X)-r \delta_\ell(X))(D - m_0(X) - r\delta_m(X))] &= E\left[\frac{\mathrm d}{\mathrm dr} (Y - \ell_0(X)-r \delta_\ell(X))(D - m_0(X) - r\delta_m(X))\right]\\ &= -E[(Y-\ell_0(X))\delta_m(X) + \delta_\ell(X)(D- m_0(X))]\\ &= -E\left[\underbrace{E[Y-\ell_0(X)\mid X]}_{=0}\,\delta_m(X)\right] - E\left[\delta_\ell(X)\underbrace{E[D-m_0(X)\mid X]}_{=0}\right] = 0 \end{aligned}$$

The last step uses the law of iterated expectations, and both terms vanish because $\ell_0$ and $m_0$ are by definition the conditional expectation functions of $Y$ and $D$ given $X$.

In light of the above, all that remains to be checked is that the derivative of the second term is also zero at $r=0$. Specifically, we must show

$$\theta\,\frac{\mathrm d E[(D-m_0(X)-r\delta_m(X))^2]}{\mathrm dr} = 0$$

Differentiating under the expectation sign again, we have

$$\begin{aligned}\theta\,\frac{\mathrm d E[(D-m_0(X)-r\delta_m(X))^2]}{\mathrm dr} &= \theta E\left[\frac{\mathrm d}{\mathrm dr}(D-m_0(X)-r\delta_m(X))^2\right] \\&= -2 \theta E[(D-m_0(X))\delta_m(X)]\\ &= -2\theta E\big[E[(D-m_0(X))\delta_m(X) \mid X]\big]\\ &= -2\theta E\big[\underbrace{E[D-m_0(X)\mid X]}_{=0}\,\delta_m(X)\big]\\ &= 0\end{aligned}$$

So once again, after applying the law of iterated expectations, this term vanishes because of the definition of $m_0$ as the conditional expectation function of $D$ given $X$.
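Finally, if you prefer a numerical confirmation of the orthogonality condition itself, the following sketch (same invented DGP as in the first snippet, with arbitrary test functions $\delta_\ell(x)=\cos x$ and $\delta_m(x)=x^2$) approximates $r \mapsto E[\psi(W;\theta_0,\eta_0+r(\delta_\ell,\delta_m))]$ by Monte Carlo and checks that its finite-difference derivative at $r=0$ is zero up to sampling noise:

```python
import numpy as np

rng = np.random.default_rng(2)
n, theta0 = 500_000, 0.5

# Same invented nuisance functions as before
def g0(x):
    return np.sin(2 * x)

def m0(x):
    return 0.8 * x

def ell0(x):
    return theta0 * m0(x) + g0(x)

X = rng.uniform(-1, 1, n)
D = m0(X) + rng.normal(size=n)
Y = theta0 * D + g0(X) + rng.normal(size=n)

# Arbitrary test functions perturbing l_0 and m_0
def delta_l(x):
    return np.cos(x)

def delta_m(x):
    return x ** 2

def mean_psi(r):
    """Monte Carlo estimate of E[psi(W; theta_0, eta_0 + r*(delta_l, delta_m))]."""
    ell, m = ell0(X) + r * delta_l(X), m0(X) + r * delta_m(X)
    return np.mean((Y - ell - theta0 * (D - m)) * (D - m))

eps = 1e-3
derivative_at_0 = (mean_psi(eps) - mean_psi(-eps)) / (2 * eps)
print(derivative_at_0)   # ~ 0 up to sampling noise: the orthogonality condition
```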