Minimiser of risk for linear-exponential error loss

conditional-expectation, optimization, robust-statistics, statistical-inference, statistics

Question: Solve the following optimisation problem:

$$\arg\min_{f} \mathbb{E} \left( \exp (-(Y- f(X))) + (Y - f(X)) - 1 \right)$$


Context: The linear-exponential loss function (LINEX loss for short) is given by $$L(\theta, \hat{\theta}) = \exp (-(\theta - \hat{\theta})) + (\theta - \hat{\theta}) - 1$$
The intuition behind this loss is that it agrees with the usual quadratic loss to second order near zero (since $e^{-d} + d - 1 \approx d^2/2$ for small $d = \theta - \hat{\theta}$), but it is asymmetric: overestimation ($d < 0$) is penalised exponentially, while underestimation ($d > 0$) is penalised only linearly. This is a popular loss function in econometrics.
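As a quick numerical sketch (the helper `linex` below is just the definition above, written in plain NumPy), one can compare the loss on errors of equal size but opposite sign against the quadratic approximation $d^2/2$:

```python
import numpy as np

def linex(theta, theta_hat):
    """LINEX loss: exp(-(theta - theta_hat)) + (theta - theta_hat) - 1."""
    d = theta - theta_hat
    return np.exp(-d) + d - 1

# Compare errors of equal magnitude but opposite sign against d^2/2,
# the second-order Taylor expansion of the LINEX loss around d = 0.
for d in [0.1, 1.0, 3.0]:
    print(f"|d| = {d:3.1f}:  underestimate {linex(d, 0.0):8.4f}   "
          f"overestimate {linex(-d, 0.0):8.4f}   d^2/2 {d**2 / 2:8.4f}")
```

For small errors all three columns nearly coincide; for large errors the overestimation penalty dominates exponentially.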

Given two random variables $X$ and $Y$, we may compute the loss of $Y$ relative to a measurable function of $X$, $f(X)$, simply by computing $L(Y, f(X))$. A central problem in statistical decision theory is computing the minimiser of the risk of this loss; namely, we wish to solve the following optimisation problem:
$$\arg\min_{f \in L^2} \mathbb{E}\left[ L(Y, f(X)) \right] = \arg\min_{f \in L^2} \mathbb{E} \left( \exp (-(Y- f(X))) + (Y - f(X)) - 1 \right)$$

By considering some simpler cases (e.g. the case where $(X,Y)$ have a density), one may conjecture that the minimiser is $\hat{f}(X) = -\log \mathbb{E}(e^{-Y}\mid X)$. Indeed, this paper derives the result in the setting of Bayes estimation.
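Before any proof, the conjecture can be sanity-checked by Monte Carlo. The model below is an assumption chosen purely for illustration: if $Y \mid X \sim N(X, \sigma^2)$, then $\mathbb{E}(e^{-Y}\mid X) = e^{-X + \sigma^2/2}$, so the conjectured minimiser is $\hat{f}(X) = X - \sigma^2/2$, and the empirical risk of the shifted predictors $f_c(X) = X + c$ should be smallest at $c = -\sigma^2/2$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 10**6, 1.0

# Toy model (an assumption for illustration): Y | X ~ N(X, sigma^2), so
# -log E[e^{-Y} | X] = X - sigma^2/2 is the conjectured minimiser.
X = rng.normal(size=n)
Y = X + sigma * rng.normal(size=n)

def empirical_risk(pred):
    """Empirical LINEX risk of the predictions `pred` for Y."""
    d = Y - pred
    return np.mean(np.exp(-d) + d - 1)

# The empirical risk of f_c(X) = X + c should bottom out at c = -sigma^2/2.
for c in [-1.0, -0.5, -0.25, 0.0, 0.5]:
    print(f"c = {c:+5.2f}:  risk = {empirical_risk(X + c):.4f}")
```

With $\sigma = 1$ the minimum indeed appears at $c = -0.5$.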

How may one arrive at this result in this setting?


A similar problem:
A related problem is computing the minimiser of the risk of squared-error loss:
$$\arg\min_f \mathbb{E}[ (Y-f(X))^2 ] = \mathbb{E} (Y | X)$$

In this setting, one adds and subtracts $\mathbb{E}(Y|X)$, expands, and then uses properties of conditional expectation to conclude that $\mathbb{E}(Y|X)$ is the minimiser:

$$\begin{align*}
\mathbb{E}[(Y - f(X))^2] &= \mathbb{E}[(Y - \mathbb{E}(Y|X) + \mathbb{E}(Y|X) - f(X))^2] \\
&= \mathbb{E}[(Y - \mathbb{E}(Y|X))^2] + \mathbb{E}[(\mathbb{E}(Y|X) - f(X))^2] + 2 \mathbb{E}[(Y - \mathbb{E}(Y|X))(\mathbb{E}(Y|X) - f(X))] \\
&= \mathbb{E}[(Y - \mathbb{E}(Y|X))^2] + \mathbb{E}[(\mathbb{E}(Y|X) - f(X))^2]
\end{align*}$$

where, in the last equality, we used the tower property (conditioning on $X$) to conclude that the cross term is zero: $\mathbb{E}(Y|X) - f(X)$ is a function of $X$, so $$\mathbb{E}[(Y - \mathbb{E}(Y|X))(\mathbb{E}(Y|X) - f(X))] = \mathbb{E}\big[(\mathbb{E}(Y|X) - f(X))\, \mathbb{E}[Y - \mathbb{E}(Y|X) \mid X]\big] = 0.$$ It is now evident that $f(X) = \mathbb{E}(Y|X)$ is a minimiser.
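The orthogonality that makes this work is easy to check numerically. In the sketch below, the model $Y = X + \varepsilon$ with independent noise is an assumption made for illustration, so that $\mathbb{E}(Y|X) = X$ exactly, and $f(X) = X/2$ is an arbitrary competitor:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10**6

# Toy model (assumed for illustration): Y = X + independent noise,
# so E(Y | X) = X exactly.
X = rng.normal(size=n)
Y = X + rng.normal(size=n)
f = 0.5 * X  # an arbitrary competing function of X

# The cross term E[(Y - E(Y|X))(E(Y|X) - f(X))] should be ~ 0 ...
cross = np.mean((Y - X) * (X - f))
# ... so the squared-error risk splits into the two orthogonal pieces.
lhs = np.mean((Y - f) ** 2)
rhs = np.mean((Y - X) ** 2) + np.mean((X - f) ** 2)
print(f"cross term = {cross:+.5f}")
print(f"E[(Y - f)^2] = {lhs:.4f}   sum of pieces = {rhs:.4f}")
```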

Perhaps this idea could be extended to the loss function given above?

Best Answer

First note that for any $a>0$ and $z\in\mathbb R$, $$ a e^z - z \ge \log a + 1. $$ Indeed, by the elementary inequality $e^x \ge x + 1$, $$ a e^z - z = e^{z + \log a} - z \ge (z + \log a + 1) - z = \log a + 1, $$ with equality if and only if $z + \log a = 0$, i.e. $z = -\log a$.
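A quick grid check of this pointwise bound (the grid range and resolution below are arbitrary choices) confirms that the gap is non-negative and is smallest at $z = -\log a$:

```python
import numpy as np

# Check a*e^z - z >= log(a) + 1 on a grid, with equality near z = -log(a).
z = np.linspace(-5.0, 5.0, 100001)
for a in [0.1, 1.0, 5.0]:
    gap = a * np.exp(z) - z - (np.log(a) + 1)
    print(f"a = {a:3.1f}:  min gap = {gap.min():.2e} at z = {z[gap.argmin()]:+.3f}"
          f"  (-log a = {-np.log(a):+.3f})")
```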

Now write the risk as $$ \mathbb{E}[L(Y, f(X))] = \mathbb{E}\left[e^{f(X)-Y} - f(X)\right] + \mathbb{E}[Y] - 1. $$ Using the tower property of conditional expectation and the above inequality with $z = f(X)$, $a = \mathbb{E}[e^{-Y}\mid X]$, $$ \mathbb{E}\left[e^{f(X)-Y} - f(X) \right] = \mathbb{E}\left[e^{f(X)}\cdot \mathbb{E}[e^{-Y}\mid X] - f(X) \right] \ge \mathbb{E}\big[\log \mathbb{E} [e^{-Y} \mid X] + 1\big], $$ whence $$ \mathbb{E}[L(Y,f(X))] \ge \mathbb{E}\big[Y + \log \mathbb{E} [e^{-Y} \mid X]\big]. $$ Moreover, by the equality case of the inequality, the bound is attained exactly when $f(X) = -\log \mathbb{E}[e^{-Y}\mid X]$: for this choice $e^{f(X)}\,\mathbb{E}[e^{-Y}\mid X] = 1$, so the risk equals $\mathbb{E}\big[Y + \log \mathbb{E}[e^{-Y}\mid X]\big]$. Hence $\hat{f}(X) = -\log \mathbb{E}[e^{-Y}\mid X]$ is the minimiser, as required.
