Risk function of an estimator under zero-one loss / evaluating an integral with an indicator.

Tags: integration, parameter-estimation, statistical-inference, statistics

I am working through some questions concerning the risk function of maximum likelihood estimators under zero-one loss, and am struggling with the evaluation of what seems like a simple integral.

Problem.

Let $X_1, …, X_n \sim N(\theta, 1)$. Suppose that $\theta \in \{-1, 1\}$.

Find the risk function of the maximum likelihood estimator $\tilde{\theta}$ under zero-one loss:

$$L(\theta, \widehat{\theta}) =
\begin{cases}
1 &\text{if} \space \theta \neq \widehat{\theta} \\
0 &\text{if} \space \theta = \widehat{\theta} \\
\end{cases}$$

My attempt.

The maximum likelihood estimator $\tilde{\theta}$ here is just the sample mean $\overline{X}_n$. I used the fact that for a given estimator $\hat{\theta}$, the risk function $R(\cdot, \hat{\theta})$ is the expectation of the loss function with respect to the joint distribution of the data $X_1, …, X_n$. And I am aware that I should get a function of the form $R: \Theta \rightarrow \mathbb{R}^+$, where in this case, $\Theta$ is the restricted parameter space $\{-1, 1\}$. And so I have:

$$\begin{align}
R(\theta, \tilde{\theta}) &= \mathbb{E}_{\theta}[L(\theta, \tilde{\theta})] \\
&= \int \dots \int L(\theta, \tilde{\theta}(x_1, \dots, x_n)) \, f_{X_1, \dots, X_n}(x_1, \dots, x_n; \theta) \, dx_1 \dots dx_n \\
&= \int \dots \int \mathbb{I}(\theta \neq \bar{x}_n) \, (2\pi)^{-n/2}\exp \left(-\frac{1}{2}\sum^n_{i=1} (x_i - \theta)^2 \right) dx_1 \dots dx_n
\end{align}$$

However, I am having difficulty evaluating this integral. I would appreciate some assistance on this.
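For what it's worth, the expectation can at least be approximated by simulation, since the integrand is just an indicator averaged over the sampling distribution of the data. Here is a rough Monte Carlo sketch in Python (the function `mc_risk` and its arguments are names I have made up for illustration); I would still like to see how to evaluate the integral analytically.

```python
import numpy as np

def mc_risk(theta, estimator, n, n_rep=100_000, rng=None):
    """Monte Carlo approximation of the zero-one risk
    R(theta, estimator) = P_theta(estimator(X_1, ..., X_n) != theta)."""
    rng = np.random.default_rng() if rng is None else rng
    # Draw n_rep independent samples of size n from N(theta, 1).
    samples = rng.normal(loc=theta, scale=1.0, size=(n_rep, n))
    # Apply the estimator to each simulated dataset and average the loss.
    estimates = np.apply_along_axis(estimator, 1, samples)
    return float(np.mean(estimates != theta))
```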

Related questions on stackexchange.

The most similar question I could find on any of the Stack Exchange sites is one involving discrete rather than continuous random variables, in which case it is more obvious how to compute the risk function.

Best Answer

In response to helpful comments by StubbornAtom and Dasherman, the following is a verbose solution transcribed from my write-up.

Computing the maximum likelihood estimator $\tilde{\theta}$ when the parameter space $\Theta = \{-1, 1\}$ is restricted.

The likelihood function is

$$L(\theta) = \prod^n_{i=1} f_{X_i}(x_i; \theta) = \frac{1}{(\sqrt{2 \pi})^n} \exp \left(-\frac{1}{2} \sum^n_{i=1} (x_i - \theta)^2 \right)$$

Therefore the log-likelihood function is

$$l(\theta) = -n \log \sqrt{2 \pi} - \frac{1}{2} \sum^n_{i=1} (x_i - \theta)^2$$
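For concreteness, this log-likelihood is easy to evaluate numerically. Below is a minimal Python sketch (the helper name `log_likelihood` is just for illustration); evaluating it at $\theta = -1$ and $\theta = 1$ is exactly the comparison carried out next.

```python
import numpy as np

def log_likelihood(theta, x):
    """Log-likelihood l(theta) for x_1, ..., x_n iid N(theta, 1)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    return -n * np.log(np.sqrt(2 * np.pi)) - 0.5 * np.sum((x - theta) ** 2)
```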

For the two possible values of the 'true' parameter, $\theta_0 = -1$ and $\theta_1 = 1$, maximisation of the log-likelihood $l(\theta)$ amounts (without loss of generality) to ascertaining the conditions under which

$$l(\theta_0) > l(\theta_1)$$

Using the 'trick' that we can express the summation in $l(\theta)$ in the following way:

$$\sum^n_{i=1} (x_i - \theta)^2 = \sum^n_{i=1} [ (x_i - \overline{x}_n)^2 + (\overline{x}_n - \theta)^2 ] = n(\overline{x}_n - \theta)^2 + \sum^n_{i=1} (x_i - \overline{x}_n)^2$$
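This decomposition holds because the cross term $2\sum^n_{i=1} (x_i - \overline{x}_n)(\overline{x}_n - \theta) = 2(\overline{x}_n - \theta)\sum^n_{i=1}(x_i - \overline{x}_n) = 0$. A quick numerical check (arbitrary data and value of $\theta$, purely to illustrate the identity):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10)       # arbitrary data, just to illustrate the identity
theta = 0.7                   # arbitrary value of theta
xbar = x.mean()
lhs = np.sum((x - theta) ** 2)
rhs = x.size * (xbar - theta) ** 2 + np.sum((x - xbar) ** 2)
print(np.isclose(lhs, rhs))   # True: the cross term sums to zero
```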

Our inequality $l(\theta_0) > l(\theta_1)$ can now be expressed as:

$$\begin{align} -n \log \sqrt{2 \pi} - \frac{n}{2} (\overline{x}_n + 1)^2 - \frac{1}{2} \sum^n_{i=1} (x_i - \overline{x}_n)^2 > &\space -n \log \sqrt{2 \pi} - \frac{n}{2} (\overline{x}_n - 1)^2 \\ & \space - \frac{1}{2} \sum^n_{i=1} (x_i - \overline{x}_n)^2 \end{align}$$

Cancelling the common terms and multiplying through by $-2$ (which reverses the inequality), this simplifies to:

$$n(\overline{x}_n + 1)^2 < n(\overline{x}_n - 1)^2$$

And solving this gives

$$\overline{x}_n < 0$$

We can make sense of this as saying that the likelihood is maximised by setting $\theta = \theta_0 = -1$ when the sample mean $\overline{X}_n < 0$. And similarly, when the sample mean $\overline{X}_n > 0$, then we can maximise the likelihood by setting $\theta = \theta_1 = 1$. Hence the maximum likelihood estimator is

$$ \tilde{\theta}(X_1, \dots, X_n) = \begin{cases} 1 \quad &\text{if} \quad \overline{X}_n > 0 \\ -1 \quad &\text{if} \quad \overline{X}_n < 0 \\ \end{cases} $$

Intuitively, this means that if we use the maximum likelihood principle as a criterion for selecting an estimator $\tilde{\theta}$, then whenever the central tendency of the data, as estimated by the sample mean $\overline{X}_n$, lies in the interval $(0, \infty)$, it is more plausible that the data were generated from $N(\theta, 1)$ with the 'true' parameter $\theta = 1$. A similar argument gives $\theta = -1$ when $\overline{X}_n$ lies in the interval $(-\infty, 0)$.

The maximum likelihood estimator is undefined when $\overline{X}_n = 0$: both candidate values give the same likelihood, so the maximum likelihood principle cannot say whether it was $\theta = 1$ or $\theta = -1$ that generated the data. Since $\overline{X}_n$ is continuously distributed, this event has probability zero and does not affect the risk calculation below.
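A compact way to express the estimator just derived (a minimal sketch; the function name and the `None` convention for the tie at $\overline{X}_n = 0$ are my own choices):

```python
import numpy as np

def restricted_mle(x):
    """MLE of theta over the restricted parameter space {-1, 1}.

    Comparing l(-1) and l(1) reduces to checking the sign of the sample mean;
    the tie at xbar == 0 is left undefined (returned as None) as in the text."""
    xbar = np.mean(x)
    if xbar == 0:
        return None
    return 1.0 if xbar > 0 else -1.0
```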

Computing the risk function $R(\theta, \tilde{\theta})$ of the maximum likelihood estimator $\tilde{\theta}$.

Using the notation that $x^n$ stands for the observed data $X_1 = x_1, \dots, X_n = x_n$, the risk function $R(\theta, \hat{\theta})$ of an arbitrary estimator $\hat{\theta}$ is the expectation of the loss $L(\theta, \hat{\theta})$ with respect to the distribution $p(x^n ; \theta)$ that generated the data:

$$R(\theta, \hat{\theta}) = \mathbb{E}_{p(x^n; \theta)}[L(\theta, \hat{\theta})] = \int L(\theta, \hat{\theta}(x^n)) \cdot p(x^n ; \theta) \space d x^n$$

This is a function of the parameter $\theta$ and a functional of the estimator $\hat{\theta}$.

We now rewrite the piecewise zero-one loss function in terms of indicators:

$$ L(\theta, \hat{\theta}) = \begin{cases} 1 \quad &\text{if} \quad \theta \neq \hat{\theta} \\ 0 \quad &\text{if} \quad \theta = \hat{\theta} \\ \end{cases} = 1 - \mathbb{I}(\hat{\theta} = \theta) $$

Meaning that the risk function is:

$$R(\theta, \hat{\theta}) = \int [1 - \mathbb{I}(\hat{\theta}(x^n) = \theta)] \cdot p(x^n ; \theta) \space dx^n = 1 - \int \mathbb{I}(\hat{\theta}(x^n) = \theta) \cdot p(x^n ; \theta) \space dx^n$$

If we suppress the dependence of the maximum likelihood estimator $\tilde{\theta}(X_1, \dots, X_n)$ on the data, and instead view it as some random variable $Y = \tilde{\theta}(X_1, \dots, X_n)$, then it is a discrete random variable similar to a Bernoulli random variable, except that $Y$ has realisations $1$ and $-1$. The probability mass function of $Y$ is now

$$ f_{Y}(y) = \begin{cases} 1 - p \quad &\text{if} \quad y = 1 \\ p \quad &\text{if} \quad y = -1 \\ \end{cases} $$

where the parameter $p = P(\overline{X}_n < 0)$, and hence $1 - p = P(\overline{X}_n > 0)$, still needs to be evaluated.

In order to compute the parameter $p$, recall that for normally distributed $X_1, \dots, X_n \sim N(\mu, \sigma^2)$ the sample mean is also normally distributed, $\overline{X}_n \sim N(\mu, \sigma^2 / n)$. Because $X_1, \dots, X_n \sim N(\theta, 1)$ here, we have $\overline{X}_n \sim N(\theta, 1/n)$, and standardising gives

$$p = P(\overline{X}_n < 0) = P \left(Z < \frac{-\theta}{\sqrt{1 / n}}\right) = P(Z < -\theta \sqrt{n}) = \Phi(-\theta \sqrt{n})$$

where $\Phi(\cdot)$ is the standard normal cumulative distribution function. The probability mass function of $Y$ is therefore

$$ f_Y(y) = \begin{cases} 1 - \Phi(-\theta \sqrt{n}) \quad &\text{if} \quad y = 1 \\ \Phi(-\theta \sqrt{n}) \quad &\text{if} \quad y = -1 \\ \end{cases} $$
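Numerically, $p$ is straightforward to evaluate with the standard normal CDF; a small sketch using `scipy.stats.norm.cdf` (the sample size $n = 10$ is an arbitrary illustration):

```python
import numpy as np
from scipy.stats import norm

n = 10  # arbitrary sample size, just for illustration
for theta in (-1, 1):
    p = norm.cdf(-theta * np.sqrt(n))   # P(Xbar_n < 0) when Xbar_n ~ N(theta, 1/n)
    print(f"theta = {theta:+d}: P(Xbar_n < 0) = {p:.6f}")
```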

Using the fact that for some random variable $X$ and set $A$, $\mathbb{E}[\mathbb{I}(X \in A)] = P(X \in A)$, the risk function evaluated using the maximum likelihood estimator $\tilde{\theta}$ in terms of the above probability mass function is:

$$R(\theta, \tilde{\theta}) = 1 - \mathbb{E}_{p(\tilde{\theta} ; \space p)}[\mathbb{I}(\tilde{\theta} = \theta)] = 1 - P(\tilde{\theta} = \theta) = 1 - f_{Y}(\theta)$$

Therefore the risk function for the maximum likelihood estimator is:

$$ R(\theta, \tilde{\theta}) = \begin{cases} \Phi(- \sqrt{n}) \quad &\text{if} \quad \theta = 1 \\ 1 - \Phi(\sqrt{n}) \quad &\text{if} \quad \theta = -1 \\ \end{cases} $$
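Note that by the symmetry of the standard normal distribution, $1 - \Phi(\sqrt{n}) = \Phi(-\sqrt{n})$, so the risk takes the same value $\Phi(-\sqrt{n})$ at both parameter values. As a sanity check, here is a short Monte Carlo sketch comparing the simulated risk of the maximum likelihood estimator with this closed form (the sample size and replication count are arbitrary choices):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, n_rep = 10, 200_000           # arbitrary sample size and replication count

for theta in (-1, 1):
    xbar = rng.normal(loc=theta, scale=1.0, size=(n_rep, n)).mean(axis=1)
    mle = np.where(xbar > 0, 1, -1)        # the restricted MLE derived above
    risk_mc = np.mean(mle != theta)        # Monte Carlo estimate of R(theta, mle)
    print(theta, risk_mc, norm.cdf(-np.sqrt(n)))   # the two should be close
```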
