Convergence in probability and asymptotic distribution of MLE for Uniform $(-\theta, \theta)$

Tags: maximum-likelihood, parameter-estimation, probability-limit-theorems, statistical-inference, statistics

Problem.

I am having difficulties with the asymptotic behaviour of the maximum likelihood estimator (MLE) of the parameter $\theta$ for the $\text{Uniform}(-\theta, \theta)$ distribution, given an IID sample $X_1, X_2, \ldots, X_n$. Specifically, I want to show explicitly that

$$\widehat{\theta}_n \overset{p}{\rightarrow} \theta$$

and find the limiting distribution of

$$n(\theta - \widehat{\theta}_n),$$

where $\widehat{\theta}_n$ denotes the maximum likelihood estimator.

I am aware that, under certain regularity conditions, the MLE is a consistent estimator of a parameter of a distribution; this question concerns showing consistency directly for this particular form of the Uniform distribution (whose support depends on $\theta$, so those conditions do not apply). I would have liked this question to be more concise, but it is long because I am making reasoning errors which I am unable to discern.

My attempt.

Convergence in probability

I first computed the likelihood function $L(\theta)$ and found that the MLE of $\theta$ can be expressed as $\widehat{\theta}_n = \max \{ -X_{(1)}, X_{(n)} \}$, where $X_{(1)}$ and $X_{(n)}$ are the first and $n$th order statistics. To show consistency, I began by considering the following quantity, with a view to showing that it converges to $0$ as $n \rightarrow \infty$:

$$P(|\widehat{\theta}_n - \theta | \geq \epsilon) = \underbrace{P(\widehat{\theta}_n \geq \theta + \epsilon)}_{=0} + P(\widehat{\theta}_n \leq \theta - \epsilon) = P(\widehat{\theta}_n \leq \theta - \epsilon)$$
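(Before pressing on, I ran a quick numerical sanity check of the MLE expression itself; the following is a minimal sketch, assuming NumPy, with illustrative parameter values.)

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0                               # illustrative "true" parameter
x = rng.uniform(-theta, theta, size=1_000)

mle_order_stats = max(-x.min(), x.max())  # max{-X_(1), X_(n)}
mle_abs = np.abs(x).max()                 # equivalently, max_i |X_i|
print(mle_order_stats, mle_abs)           # identical, and close to theta
```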

I then proceeded by evaluating $P(\widehat{\theta}_n \leq \theta - \epsilon)$, and this is where things start to become uncertain for me:

$$\begin{align}
P(\widehat{\theta}_n \leq \theta - \epsilon) &= P(\max \{ -X_{(1)}, X_{(n)} \} \leq \theta - \epsilon) \\
&= P \left( \left\{ -X_{(1)} \leq \theta - \epsilon \right\} \cap \left\{ X_{(n)} \leq \theta - \epsilon \right\} \right) \\
&= P(-X_{(1)} \leq \theta - \epsilon) \, P(X_{(n)} \leq \theta - \epsilon)
\end{align}$$

  1. In going from the 1st to the 2nd equality, I reasoned that in order for the maximum of $-X_{(1)}$ and $X_{(n)}$ to be at most $\theta - \epsilon$, both $-X_{(1)}$ and $X_{(n)}$ must be at most $\theta - \epsilon$. However, as it is a maximum of order statistics, the nesting is confusing me, and I'm starting to doubt whether this is a valid justification.

  2. In going from the 2nd to the 3rd equality, I am fairly certain that this is appropriate, as the $X_i$ are independent, and hence their order statistics are independent.

I computed the CDF of the Uniform $(-\theta, \theta)$ to get:

$$F_{X_i}(x_i) =
\begin{cases}
0 \quad &x_i < -\theta \\
\frac{x_i + \theta}{2\theta} \quad &-\theta \leq x_i \leq \theta \\
1 \quad &x_i > \theta
\end{cases}$$

Evaluating the probability on the nth order statistic:

$$P(X_{(n)} \leq \theta - \epsilon) = P \left(\bigcap^n_{i=1} \{ X_i \leq \theta - \epsilon \} \right) = \prod^n_{i=1} P (X_i \leq \theta - \epsilon) = \left(1 - \frac{\epsilon}{2 \theta} \right)^n$$

Evaluating the probability on the 1st order statistic:

$$P(-X_{(1)} \leq \theta - \epsilon) = P(X_{(1)} \geq - \theta + \epsilon) = P \left( \bigcap^n_{i=1} \{ X_i \geq - \theta + \epsilon \} \right) = \prod^n_{i=1} \left[ 1 - F_{X_i}(-\theta + \epsilon) \right] = \left( 1 - \frac{\epsilon}{2\theta} \right)^n $$

Setting aside my reservation about how the above two calculations are combined, the two individual probability calculations accord with my intuition, in that they are equal by symmetry. Combining them yields

$$P(|\widehat{\theta}_n - \theta | \geq \epsilon) = \left( 1 - \frac{\epsilon}{2\theta} \right)^{2n} = \left(\frac{2 \theta - \epsilon}{2\theta} \right)^{2n}$$

which converges to $0$ as $n \rightarrow \infty$ for all $\epsilon > 0$.
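To probe the suspected reasoning errors numerically, one can compare this closed form against a direct Monte Carlo estimate of $P(\widehat{\theta}_n \leq \theta - \epsilon)$; the following is a minimal sketch (assuming NumPy; the values of $\theta$, $\epsilon$, and $n$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
theta, eps, n = 2.0, 0.5, 5      # illustrative values
n_trials = 200_000

x = rng.uniform(-theta, theta, size=(n_trials, n))
mle = np.abs(x).max(axis=1)      # equivalent to max{-X_(1), X_(n)}

monte_carlo = np.mean(mle <= theta - eps)
closed_form = ((2 * theta - eps) / (2 * theta)) ** (2 * n)
print(monte_carlo, closed_form)  # ~0.237 vs ~0.263
```

At these values the two numbers disagree well beyond simulation noise, which is consistent with my suspicion that an error lurks in the argument above.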

Convergence in distribution

From a previous example in my notes, I know that for IID $X_1, \ldots, X_n \sim \text{Uniform}(0,1)$, the suitably scaled $n$th order statistic has an $\text{Exponential}(1)$ limiting distribution; that is:

$$n(1 - X_{(n)}) \overset{d}{\rightarrow} \text{Exponential}(1)$$

On this basis, I guessed that the solution would have a similar flavour.

Invoking the arguments I made previously, I found that:

$$\begin{align}
P(n(\theta - \widehat{\theta}_n) \leq t) &= 1 - P \left(\widehat{\theta}_n \leq \theta - \frac{t}{n} \right) \\
&= 1 - P\left( \left\{-X_{(1)} \leq \theta - \frac{t}{n} \right\} \cap \left\{ X_{(n)} \leq \theta -\frac{t}{n} \right\} \right) \\
&= 1 - \left( 1 - \frac{t}{2n \theta} \right)^{2n}
\end{align}$$

Now I am left with something that looks very similar to an exponential CDF; as a guess, I suspect it might be an $\text{Exponential}(1 / \theta)$, and that the factor of $2$ is erroneous (due to an issue in the previous arguments which I am unable to discern).
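For what it is worth, a quick Monte Carlo check is consistent with the $\text{Exponential}(1/\theta)$ guess; the following is a minimal sketch (assuming NumPy; the parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, n_trials = 2.0, 500, 20_000   # illustrative values

x = rng.uniform(-theta, theta, size=(n_trials, n))
stat = n * (theta - np.abs(x).max(axis=1))

for t in (0.5, 1.0, 2.0, 4.0):
    # empirical CDF of n(theta - mle) vs the conjectured Exponential(1/theta) CDF
    print(t, np.mean(stat <= t), 1 - np.exp(-t / theta))
```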

Another attempt.

After reading through some previous posts on here, I decided to try reformulating the MLE as the equivalent expression $\widehat{\theta}_n = \max_{1 \leq i \leq n} \lvert X_i \rvert$. I found that this skirts around the issue of not being sure about point 1. Focusing on the convergence in distribution argument (the convergence in probability argument being a stepping stone towards it), I found that:

$$\begin{align}
P(n(\theta - \widehat{\theta}_n) \leq t) = 1 - P \left(\widehat{\theta}_n \leq \theta - \frac{t}{n} \right) &= 1 - P \left( \max_{1 \leq i \leq n} \lvert X_i \rvert \leq \theta - \frac{t}{n} \right) \\
&= 1 - P \left( \bigcap^n_{i=1} \left\{ \lvert X_i \rvert \leq \theta - \frac{t}{n} \right\} \right) \\
&= 1 - \prod^n_{i=1} \left( P\left(X_i \leq \theta - \frac{t}{n}\right) + P\left(-X_i \leq \theta - \frac{t}{n} \right) \right) \\
&= 1 - \prod^n_{i=1} \left( P\left(X_i \leq \theta - \frac{t}{n}\right) + P\left(X_i \geq -\theta + \frac{t}{n}\right) \right) \\
&= 1 - \prod^n_{i=1} \left( F_{X_i}\left( \theta - \frac{t}{n} \right) + \left[1 - F_{X_i}\left(-\theta + \frac{t}{n} \right) \right] \right) \\
&= 1 - \prod^n_{i=1} \left( \frac{2 \theta - t/n}{2 \theta} + 1 - \frac{t/n}{2 \theta} \right) \\
&= 1 - \left( 2 - \frac{t}{n \theta} \right)^n
\end{align}$$

Since I am not getting the same result, I know that errors of reasoning are being made; however, I am having trouble discerning where they lie.

I would appreciate it if members of this community could assist me.

Best Answer

In case anyone else struggles with these questions, the following is a verbose solution that I transcribed from my write-up, following helpful suggestions from StubbornAtom and Math Helper in the comments.

Showing $\hat{\theta}_n \overset{p}{\rightarrow} \theta$.

In order to explicitly show that the maximum likelihood estimator $\hat{\theta}_n = {\max}_i \{ \lvert X_i \rvert \}$ converges in probability to the 'true' parameter $\theta$ of the $\text{Uniform}(-\theta, \theta)$ distribution, i.e. that $\hat{\theta}_n$ is a consistent estimator, we opt to show directly that, as $n \rightarrow \infty$,

$$P(\lvert \hat{\theta}_n - \theta \rvert > \epsilon) \longrightarrow 0$$

for all $\epsilon > 0$.

Notice that we can simplify the left hand side of the above to get

\begin{align*} P(\lvert \hat{\theta}_n - \theta \rvert > \epsilon) &= P \left(\{ \hat{\theta}_n - \theta > \epsilon\} \cup \{ -(\hat{\theta}_n - \theta) > \epsilon\}\right) \\ &= P(\hat{\theta}_n > \theta + \epsilon) + P(\hat{\theta}_n < \theta - \epsilon) \\ &= P(\hat{\theta}_n < \theta - \epsilon) \end{align*}

because

$$X_i \sim \text{Uniform}(-\theta, \theta) \implies \underset{i}{\max} \{ \lvert X_i \rvert \} \leq \theta \implies P(\hat{\theta}_n > \theta + \epsilon) = 0$$

To simplify this further, notice that

\begin{align*} P(\lvert \hat{\theta}_n - \theta \rvert > \epsilon) &= P(\hat{\theta}_n < \theta - \epsilon) \\ &= P(\underset{i}{\max} \{ \lvert X_i \rvert \} < \theta - \epsilon) \\ &= P \left( \bigcap^n_{i=1} \{ \lvert X_i \rvert < \theta - \epsilon \}\right) \\ &= \prod^n_{i=1} P(\lvert X_i \rvert < \theta - \epsilon) \\ \end{align*}

Using the symmetry of the $\text{Uniform}(-\theta, \theta)$ density about $0$, the probability mass contained within the interval $[-(\theta - \epsilon), \theta - \epsilon]$ is the area of a rectangle of height $\frac{1}{2\theta}$ and width $2(\theta - \epsilon)$, meaning that

\begin{align*} P(\lvert \hat{\theta}_n - \theta \rvert > \epsilon) &= \prod^n_{i=1} P(\lvert X_i \rvert < \theta - \epsilon) \\ &= \left(\frac{\theta - \epsilon}{\theta} \right)^n \longrightarrow 0 \\ \end{align*}

as $n \rightarrow \infty$ for all $0 < \epsilon < \theta$, which is a consequence of the fact that $$\frac{\theta - \epsilon}{\theta} < 1$$

In the case that $\epsilon \geq \theta$, then $ P(\lvert X_i \rvert < \theta - \epsilon) = 0$ because the absolute value function is, by definition, non-negative. Hence as $n \rightarrow \infty$, we have that $P(\lvert \hat{\theta}_n - \theta \rvert > \epsilon) \longrightarrow 0$ for all $\epsilon > 0$, and hence $\hat{\theta}_n \overset{p}{\rightarrow} \theta$.
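This rate of decay is easy to corroborate numerically; the following is a minimal Monte Carlo sketch (assuming NumPy; the values of $\theta$ and $\epsilon$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
theta, eps = 2.0, 0.5            # illustrative values
n_trials = 200_000

for n in (5, 10, 20):
    x = rng.uniform(-theta, theta, size=(n_trials, n))
    mc = np.mean(np.abs(x).max(axis=1) < theta - eps)
    print(n, mc, ((theta - eps) / theta) ** n)  # the two columns should agree
```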

Limiting distribution of $n(\theta - \hat{\theta}_n)$.

To find the limiting distribution of $n(\theta - \hat{\theta}_n)$, we begin by considering

\begin{align*} P(n(\theta - \hat{\theta}_n) \leq t) &= P \left( \hat{\theta}_n \geq \theta - \frac{t}{n} \right) \\ &= P \left( \underset{i}{\max} \{ \lvert X_i \rvert \} \geq \theta - \frac{t}{n} \right) \\ &= 1 - P \left( \underset{i}{\max} \{ \lvert X_i \rvert \} \leq \theta - \frac{t}{n} \right) \\ &= 1 - P \left( \bigcap^n_{i=1} \left\{ \lvert X_i \rvert \leq \theta - \frac{t}{n} \right\} \right) \\ &= 1 - \prod^n_{i=1} P \left( \lvert X_i \rvert \leq \theta - \frac{t}{n} \right) \\ \end{align*}

In order to evaluate the term inside the product, we compute the CDF of the transformed variable $Y = r(X) = \lvert X \rvert$, where $X \sim \text{Uniform}(-\theta, \theta)$. To find the CDF, we need the set $A_y = \{x : \lvert x \rvert \leq y \}$ for each $y$, which yields

\begin{align*} F_Y(y) &= P(Y \leq y ) \\ &= P( X \in A_y ) \\ &= \int_{A_y} f_X(x)\, dx \end{align*}

Noting that the transformed $y \in [0, \theta]$, the set $A_y = [-y, y]$ is an interval of length $2y$, meaning that

\begin{align*} \int_{A_y} f_X(x)\, dx = \int^y_{-y} \frac{1}{2\theta}\, dx = \frac{2y}{2\theta} = \frac{y}{\theta} \end{align*}

The CDF of the transformed $Y = \lvert X \rvert$ is therefore

$$ F_Y(y) = \begin{cases} 0 \quad & y < 0\\ \frac{y}{\theta} \quad & 0 \leq y \leq \theta \\ 1 \quad &y > \theta \\ \end{cases} $$

which is the CDF for a $Y \sim \text{Uniform}(0, \theta)$ distribution. Returning to our previous expression, we have that

\begin{align*} P(n(\theta - \hat{\theta}_n) \leq t) &= 1 - \prod^n_{i=1} P \left( \lvert X_i \rvert \leq \theta - \frac{t}{n} \right) \\ &= 1 - \prod^n_{i=1} F_Y \left(\theta - \frac{t}{n} \right) \\ &= 1 - \left( \frac{\theta - t/n}{\theta} \right)^n = 1 - \left( 1 - \frac{t}{n \theta} \right)^n \\ \end{align*}
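As an aside, the identification of $\lvert X \rvert$ as $\text{Uniform}(0, \theta)$ above is easy to spot-check numerically; a minimal sketch (assuming NumPy; $\theta$ illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
theta = 2.0                      # illustrative value
y = np.abs(rng.uniform(-theta, theta, size=100_000))

for q in (0.25, 0.5, 0.75):
    # if |X| ~ Uniform(0, theta), the empirical CDF at q*theta should be close to q
    print(q, np.mean(y <= q * theta))
```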

Recall that the exponential function has the following limit representation, which can be obtained from its power series together with the binomial theorem:

$$\exp(t) = \lim_{n \rightarrow \infty} \left(1 + \frac{t}{n} \right)^n$$

Consequently, we have that

$$\exp \left( -\frac{1}{\theta}t \right) = \lim_{n \rightarrow \infty} \left(1 - \frac{t}{n \theta} \right)^n$$
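The convergence here is quick, as a short numerical check illustrates (a sketch, assuming NumPy; the values of $\theta$ and $t$ are illustrative):

```python
import numpy as np

theta, t = 2.0, 1.5                   # illustrative values
for n in (10, 100, 1_000, 10_000):
    print(n, (1 - t / (n * theta)) ** n)
print("limit:", np.exp(-t / theta))   # ~0.472
```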

Applying this limit to our expression for $P(n(\theta - \hat{\theta}_n) \leq t)$, we have that

$$\lim_{n \rightarrow \infty} P(n(\theta - \hat{\theta}_n) \leq t) = 1 - \lim_{n \rightarrow \infty} \left(1 - \frac{t}{n \theta} \right)^n = 1 - \exp \left(-\frac{1}{\theta}t \right)$$

Hence we have that

$$n(\theta - \hat{\theta}_n) \overset{d}{\longrightarrow} \text{Exponential}\left( \frac{1}{\theta} \right)$$

i.e. the limiting distribution of $n(\theta - \hat{\theta}_n)$ is an exponential distribution with rate parameter $1 / \theta$ (equivalently, with mean $\theta$).
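As a final sanity check, the quantiles of a simulated $n(\theta - \hat{\theta}_n)$ can be compared against those of $\text{Exponential}(1/\theta)$; a minimal Monte Carlo sketch (assuming NumPy; parameter values illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
theta, n, n_trials = 2.0, 500, 20_000   # illustrative values

x = rng.uniform(-theta, theta, size=(n_trials, n))
stat = n * (theta - np.abs(x).max(axis=1))

for p in (0.25, 0.5, 0.9):
    # Exponential(1/theta) quantile function: -theta * log(1 - p)
    print(p, np.quantile(stat, p), -theta * np.log(1 - p))
```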
