a)
$\mathcal{R}(\theta, d) = \sum_{x=0}^{\infty} \mathcal{L}(\theta, d(x))\,f(x \mid \theta)$ - This is the formula for the risk function when the data are discrete: the expected loss under the sampling distribution $f(x \mid \theta)$.
Given the squared-error loss $\mathcal{L}(\theta, a) = (a-\theta)^2$ and the rule $d(x) = x$, we can substitute into the formula:
$ = \sum_{x=0}^{\infty} (x-\theta)^2 f(x \mid \theta)$
$= \sum_{x=0}^{\infty} (x^2+\theta^2-2x\theta) f(x \mid \theta)$
$= \sum_{x=0}^{\infty} x^2 f(x \mid \theta) + \theta^2\sum_{x=0}^{\infty} f(x \mid \theta) - 2\theta\sum_{x=0}^{\infty} x f(x \mid \theta) $
Here $f(x \mid \theta)$ is the Poisson pmf with mean $\theta$, so the sums above are just its moments. Therefore,
$= E[X^2] + \theta^2(1) - 2\theta E[X] $
$= (\theta + \theta^2) + \theta^2 - 2\theta^2$ -- (since $E[X] = \operatorname{Var}(X) = \theta$ for the Poisson, $E[X^2] = \operatorname{Var}(X) + E[X]^2 = \theta + \theta^2$)
$= \theta$
$\mathcal{R}(\theta, d) = \theta$
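As a quick sanity check (my own addition, not part of the original derivation), here is a short Monte Carlo sketch in Python: the risk of $d(X) = X$ under squared-error loss is just the variance of a Poisson draw, which should match $\theta$.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 3.7  # arbitrary Poisson mean for the check
x = rng.poisson(theta, size=1_000_000)

# Risk of d(X) = X under squared-error loss: E[(X - theta)^2]
risk_mc = np.mean((x - theta) ** 2)
print(risk_mc, theta)  # both close to 3.7
```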
b)
$r(\pi, d) = E_{\pi}[\mathcal{R}(\theta, d)]$ - This is the formula for the Bayes (mean) risk: the risk function averaged over the prior $\pi(\theta)$.
We calculated $\mathcal{R}(\theta, d) = \theta$, in section (a). Using that
$= \int_{0}^{\infty} \theta \pi(\theta) \, d\theta $
Substituting the Gamma$(\alpha, \lambda)$ prior density $\pi(\theta) = \lambda^{\alpha}\theta^{\alpha-1}\mathrm{e}^{-\lambda\theta}/\Gamma(\alpha)$:
$= \int_{0}^{\infty} \frac{\lambda^{\alpha}\theta^{(\alpha +1)-1}\mathrm{e}^{-\lambda\theta}}{\Gamma(\alpha)} \, d\theta $
$= \frac{\alpha}{\lambda}\int_{0}^{\infty} \frac{\lambda^{\alpha + 1}\theta^{(\alpha +1)-1}\mathrm{e}^{-\lambda\theta}}{\Gamma(\alpha + 1)} \, d\theta $ -- (using $\Gamma(\alpha + 1) = \alpha\Gamma(\alpha)$)
$= \frac{\alpha}{\lambda} (1)$ -- (the integrand is the Gamma$(\alpha + 1, \lambda)$ pdf, which integrates to $1$)
$= \frac{\alpha}{\lambda}$
$r(\pi, d) = \frac{\alpha}{\lambda}$
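The same kind of Monte Carlo sketch (again my own check) confirms this: the Bayes risk here is just the prior mean of $\theta$. Note that NumPy parameterises the Gamma distribution by shape and scale, so a rate of $\lambda$ corresponds to a scale of $1/\lambda$.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, lam = 2.5, 4.0  # arbitrary Gamma(alpha, rate=lam) prior

# NumPy uses a scale parameter, so scale = 1 / rate.
theta = rng.gamma(shape=alpha, scale=1.0 / lam, size=1_000_000)

# Bayes risk r(pi, d) = E_pi[R(theta, d)] = E_pi[theta]
print(theta.mean(), alpha / lam)  # both close to 0.625
```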
In response to helpful comments by StubbornAtom and Dasherman, the following is a verbose solution transcribed from my write-up.
Computing the maximum likelihood estimator $\tilde{\theta}$ when the parameter space $\Theta = \{-1, 1\}$ is restricted.
The likelihood function is
$$L(\theta) = \prod^n_{i=1} f_{X_i}(x_i; \theta) = \frac{1}{(\sqrt{2 \pi})^n} \exp \left(-\frac{1}{2} \sum^n_{i=1} (x_i - \theta)^2 \right)$$
Therefore the log-likelihood function is
$$l(\theta) = -n \log \sqrt{2 \pi} - \frac{1}{2} \sum^n_{i=1} (x_i - \theta)^2$$
For the two possible values of the 'true' parameter, $\theta_0 = -1$ and $\theta_1 = 1$, maximising the log-likelihood $l(\theta)$ amounts to ascertaining the conditions under which
$$l(\theta_0) > l(\theta_1)$$
Using the 'trick' that we can decompose the summation in $l(\theta)$ around the sample mean (the cross term $2\sum^n_{i=1} (x_i - \overline{x}_n)(\overline{x}_n - \theta)$ vanishes because $\sum^n_{i=1} (x_i - \overline{x}_n) = 0$):
$$\sum^n_{i=1} (x_i - \theta)^2 = \sum^n_{i=1} [ (x_i - \overline{x}_n)^2 + (\overline{x}_n - \theta)^2 ] = n(\overline{x}_n - \theta)^2 + \sum^n_{i=1} (x_i - \overline{x}_n)^2$$
Our inequality $l(\theta_0) > l(\theta_1)$ can now be expressed as:
$$\begin{align}
-n \log \sqrt{2 \pi} - \frac{n}{2} (\overline{x}_n + 1)^2 - \frac{1}{2} \sum^n_{i=1} (x_i - \overline{x}_n)^2 > &\space -n \log \sqrt{2 \pi} - \frac{n}{2} (\overline{x}_n - 1)^2 \\
& \space - \frac{1}{2} \sum^n_{i=1} (x_i - \overline{x}_n)^2
\end{align}$$
Simplifying this we have that:
$$n(\overline{x}_n + 1)^2 < n(\overline{x}_n - 1)^2$$
Expanding both sides and cancelling ($\overline{x}_n^2 + 2\overline{x}_n + 1 < \overline{x}_n^2 - 2\overline{x}_n + 1$, i.e. $4\overline{x}_n < 0$) gives
$$\overline{x}_n < 0$$
We can make sense of this as saying that the likelihood is maximised by setting $\theta = \theta_0 = -1$ when the sample mean $\overline{X}_n < 0$. And similarly, when the sample mean $\overline{X}_n > 0$, then we can maximise the likelihood by setting $\theta = \theta_1 = 1$. Hence the maximum likelihood estimator is
$$
\tilde{\theta}(X_1, \dots, X_n) =
\begin{cases}
1 \quad &\text{if} \quad \overline{X}_n > 0 \\
-1 \quad &\text{if} \quad \overline{X}_n < 0 \\
\end{cases}
$$
Intuitively, this means that if we use the maximum likelihood principle as a criterion for selecting an estimator $\tilde{\theta}$, then when the central tendency of the data, as estimated by the sample mean $\overline{X}_n$, lies in the interval $(0, \infty)$, it is more plausible that the 'true' parameter that generated the data under the distribution $N(\theta, 1)$ is $\theta = 1$. A similar argument gives $\theta = -1$ when $\overline{X}_n$ lies in the interval $(-\infty, 0)$.
That the above maximum likelihood estimator is undefined when $\overline{X}_n = 0$ means that the maximum likelihood principle cannot then say whether it was $\theta = 1$ or $\theta = -1$ that generated the data; since $\overline{X}_n$ is a continuous random variable, this happens with probability zero.
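A minimal Python sketch of this estimator (the function name `mle_restricted` is my own), checking the closed form against a direct comparison of the two log-likelihoods:

```python
import numpy as np

def mle_restricted(x):
    """MLE of theta over {-1, 1} for N(theta, 1) data: the sign of the sample mean.

    A tie (sample mean exactly 0) occurs with probability zero; this sketch
    arbitrarily returns -1 there, where the estimator is formally undefined.
    """
    return 1.0 if np.mean(x) > 0 else -1.0

rng = np.random.default_rng(2)
x = rng.normal(loc=-1.0, scale=1.0, size=50)  # data generated with theta = -1

# The constant -n*log(sqrt(2*pi)) cancels when comparing log-likelihoods.
loglik = {t: -0.5 * np.sum((x - t) ** 2) for t in (-1.0, 1.0)}
assert mle_restricted(x) == max(loglik, key=loglik.get)
print(mle_restricted(x))  # -1.0 with high probability at this sample size
```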
Computing the risk function $R(\theta, \tilde{\theta})$ of the maximum likelihood estimator $\tilde{\theta}$.
Using the notation that $x^n$ stands for the observed data $X_1 = x_1, \dots, X_n = x_n$, the risk function $R(\theta, \hat{\theta})$ of an arbitrary estimator $\hat{\theta}$ is the expectation of the loss $L(\theta, \hat{\theta})$ with respect to the distribution $p(x^n ; \theta)$ that generated the data:
$$R(\theta, \hat{\theta}) = \mathbb{E}_{p(x^n; \theta)}[L(\theta, \hat{\theta})] = \int L(\theta, \hat{\theta}(x^n)) \cdot p(x^n ; \theta) \space d x^n$$
This is a function(al) of the parameter $\theta$ and of the estimator $\hat{\theta}$.
We now rewrite the piecewise zero-one loss function in terms of indicators:
$$
L(\theta, \hat{\theta}) =
\begin{cases}
1 \quad &\text{if} \quad \theta \neq \hat{\theta} \\
0 \quad &\text{if} \quad \theta = \hat{\theta} \\
\end{cases}
= 1 - \mathbb{I}(\hat{\theta} = \theta)
$$
Meaning that the risk function is:
$$R(\theta, \hat{\theta}) = \int [1 - \mathbb{I}(\hat{\theta}(x^n) = \theta)] \cdot p(x^n ; \theta) \space dx^n = 1 - \int \mathbb{I}(\hat{\theta}(x^n) = \theta) \cdot p(x^n ; \theta) \space dx^n$$
If we suppress the dependence of the maximum likelihood estimator $\tilde{\theta}(X_1, \dots, X_n)$ on the data, and instead view it as some random variable $Y = \tilde{\theta}(X_1, \dots, X_n)$, then it is a discrete random variable similar to a Bernoulli random variable, except that $Y$ has realisations $1$ and $-1$. The probability mass function of $Y$ is now
$$
f_{Y}(y) =
\begin{cases}
1 - p \quad &\text{if} \quad y = 1 \\
p \quad &\text{if} \quad y = -1 \\
\end{cases}
$$
where the parameter $p = P(\overline{X}_n < 0)$ and $1 - p = P(\overline{X}_n > 0)$, which we now evaluate.
In order to compute the parameter $p$, for normally distributed $X_1, \dots, X_n \sim N(\mu, \sigma^2)$, we have that the sample mean $\overline{X}_n$ will also be normally distributed, $\overline{X}_n \sim N(\mu, \sigma^2 / n)$. Because $X_1, \dots, X_n \sim N(\theta, 1)$, we can apply standard calculations to get
$$p = P(\overline{X}_n < 0) = P \left(Z < \frac{-\theta}{\sqrt{1 / n}}\right) = P(Z < -\theta \sqrt{n}) = \Phi(-\theta \sqrt{n})$$
where $\Phi(\cdot)$ is the standard normal cumulative distribution function. The probability mass function of $Y$ is therefore
$$
f_Y(y) =
\begin{cases}
1 - \Phi(-\theta \sqrt{n}) \quad &\text{if} \quad y = 1 \\
\Phi(-\theta \sqrt{n}) \quad &\text{if} \quad y = -1 \\
\end{cases}
$$
Using the fact that for some random variable $X$ and set $A$, $\mathbb{E}[\mathbb{I}(X \in A)] = P(X \in A)$, the risk function evaluated using the maximum likelihood estimator $\tilde{\theta}$ in terms of the above probability mass function is:
$$R(\theta, \tilde{\theta}) = 1 - \mathbb{E}_{p(\tilde{\theta} ; \space p)}[\mathbb{I}(\tilde{\theta} = \theta)] = 1 - P(\tilde{\theta} = \theta) = 1 - f_{Y}(\theta)$$
Therefore the risk function for the maximum likelihood estimator is:
$$
R(\theta, \tilde{\theta}) =
\begin{cases}
\Phi(- \sqrt{n}) \quad &\text{if} \quad \theta = 1 \\
1 - \Phi(\sqrt{n}) \quad &\text{if} \quad \theta = -1 \\
\end{cases}
$$
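Since $1 - \Phi(\sqrt{n}) = \Phi(-\sqrt{n})$, both cases give the same value, which decays rapidly in $n$. The following Monte Carlo sketch (my own check, not part of the original write-up) verifies the risk for $\theta = 1$ with $n = 9$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
theta, n, reps = 1.0, 9, 200_000

# Simulate the MLE: the sign of the sample mean of n draws from N(theta, 1).
xbar = rng.normal(theta, 1.0, size=(reps, n)).mean(axis=1)
theta_hat = np.where(xbar > 0, 1.0, -1.0)

# Zero-one risk: the probability of selecting the wrong parameter value.
risk_mc = np.mean(theta_hat != theta)
print(risk_mc, norm.cdf(-np.sqrt(n)))  # both ~ Phi(-3) ~ 0.00135
```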
Best Answer
First note that for any $a>0$, $z\in\mathbb R$, $$ a e^z -z \ge\log a +1. $$ Indeed, using the elementary bound $e^u \ge u+1$ with $u = z + \log a$, $$ a e^z -z = e^{z + \log a} - z \ge z+ \log a +1 - z = \log a +1. $$
Now, using the tower property of conditional expectation and the above inequality with $z = f(X)$, $a = \mathbb{E}[e^{-Y}\mid X]$,
$$\begin{align}
\mathbb{E}\left[e^{f(X)-Y} - f(X) \right] &= \mathbb{E}\left[e^{f(X)}\cdot \mathbb{E}[e^{-Y}\mid X] - f(X) \right] \\
&\ge \mathbb{E} \big[\log \mathbb{E} [e^{-Y} \mid X] + 1\big],
\end{align}$$
whence
$$ L(Y,f(X)) \ge \mathbb{E}\big[Y + \log \mathbb{E} [e^{-Y} \mid X]\big] = L\big(Y,\log \mathbb{E} [e^{-Y} \mid X]\big), $$
as required.
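A tiny numerical sketch (my own, just to sanity-check the elementary inequality) evaluating $a e^z - z - (\log a + 1)$ over a grid; it should be non-negative everywhere, with equality at $z = -\log a$:

```python
import numpy as np

a = np.logspace(-3, 3, 201)        # a > 0
z = np.linspace(-10.0, 10.0, 401)  # z real
A, Z = np.meshgrid(a, z)

# Gap in the inequality a*e^z - z >= log(a) + 1; should never be negative.
gap = A * np.exp(Z) - Z - (np.log(A) + 1.0)
print(gap.min())  # >= 0 (up to floating point), minimum near z = -log(a)
```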