Solved – Find the mle of $\theta$

Tags: cauchy-distribution, mathematical-statistics, maximum-likelihood, normal-distribution, self-study

This is Exercise 6.1.13 from Robert Hogg's Introduction to Mathematical Statistics, 6th Edition. The question is:

Let $X_{1},X_{2},…,X_{n} $ be a random sample from a distribution on $\mathbb{R}$
with one of two pdfs:
$$f(x;\theta)=\begin{cases}\dfrac{\exp\{-x^2/2\}}{\sqrt{2\pi }}&\text{ if }\theta=1\\\dfrac{1}{\pi(1+x^2)}&\text{ if }\theta=2\end{cases}
$$
Find the MLE of $\theta$.

My simple solution is $\hat{\theta}=\operatorname{argmax}_{\theta\in\{1,2\}}L(\theta;X)$.

The maximum value for $\theta=1$ is $\frac{1}{\sqrt{2\pi}}$, attained at $x=0$ for the standard normal distribution.

The maximum value for $\theta=2$ is $\frac{1}{\pi}$, attained at $x=0$ for the Cauchy distribution.

Therefore, the MLE of $\theta$ is $\frac{1}{\pi}$.

I am not sure whether my solution is correct. Could you please check it?
Thank you very much.

Sorry, the MLE should be $\frac{1}{\sqrt{2\pi}}$, since it is larger than $\frac{1}{\pi}$ by my simple solution.


After pondering this problem for two days, I think the MLE is $0$, since by the definition of the MLE,

$\hat{\theta}=\operatorname{argmax}_{\theta}L(\theta;X)$, which means that $L(\theta;X)$ achieves its maximum value at $\hat\theta$. Since the likelihood function can achieve its maximum value at $0$, by this definition the MLE is $0$.

Further, the MLE can be a statistic such as $\overline{X}$, which is just a function of the data; so why can we not also treat the constant $0$ as a function of the data, namely $0\cdot X$?

Best Answer

It appears that the only issue with the answer the OP gave in the question is that he has overlooked the fact that we have a sample of size $n$ in our hands. What the MLE maximizes is the likelihood of the sample. Since this is a random (i.e. i.i.d.) sample, the joint density of the sample is the product of the $n$ individual densities, and the likelihood of the sample is this joint density expression viewed as a function of the unknown parameter, for the given sample:

$$L(\theta|x_1,\ldots,x_n)=\begin{cases} \left(\frac{1}{\sqrt{2\pi }}\right)^n\cdot\exp\big\{-\sum_{i=1}^n(x_i^2/2) \big\} &\text{ if }\theta=1\\ \\ \left(\frac{1}{\pi}\right)^n\prod_{i=1}^n(1+x_i^2)^{-1}&\text{ if }\theta=2\end{cases}$$

where the $x_i$'s are the actual series of numbers available (realizations of the RV's), and they are to be treated as fixed numbers, much like $\pi$ for example.

But wait, does the above look like a function of $\theta$ given the sample? It seems more like "conditional on the value of the argument, the function is..."

Let's see: for us mortals, a function is defined by two things: its domain and its functional form. If we want $\theta$ to be its argument, then its domain is $\{1,2\}$, and its functional form changes as the argument moves over that domain. That's perfectly fine; we have a case of a "piecewise" function. Such functions may have maxima, minima, etc., just like any other function.

Since the domain (the parameter space) is constrained a priori to only two values, what you have to do is evaluate the two branches for the given $x_i$'s and the one with the larger value will be the function's maximum. Since each branch is uniquely associated with a single value of $\theta$, this $\theta$ will be the $\text {argmax}$ of the function. And since this function is a likelihood, then you can argue that you just performed maximum likelihood estimation related to the unknown parameter $\theta$, even though $\theta$ itself does not appear inside the functional forms. Note that in this approach, the constants are indispensable, since they too affect the value of the likelihood. Calculations become simpler if we consider the log-likelihood (without omitting the constants) which is a monotonic transformation,

$$\ln L(\theta|x_1,\ldots,x_n)=\begin{cases} -n\ln \left(\sqrt{2\pi }\right)-(1/2)\sum_{i=1}^nx_i^2 &\text{ if }\theta=1\\ \\ -n\ln \pi - \sum_{i=1}^n \ln(1+x_i^2)&\text{ if }\theta=2\end{cases}$$
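Just to make the decision rule concrete, here is a minimal Python sketch (my own illustration, not part of the original answer): it evaluates both branches of the log-likelihood above for a given sample and returns the maximizing $\theta$. The function name `mle_theta` and the use of NumPy are my own choices.

```python
import numpy as np

def mle_theta(x):
    """Return the MLE of theta in {1, 2} by comparing the two log-likelihoods."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # theta = 1: standard normal log-likelihood (constants kept, as in the text)
    ll_normal = -n * np.log(np.sqrt(2 * np.pi)) - 0.5 * np.sum(x ** 2)
    # theta = 2: standard Cauchy log-likelihood
    ll_cauchy = -n * np.log(np.pi) - np.sum(np.log1p(x ** 2))
    return 1 if ll_normal > ll_cauchy else 2
```

For example, `mle_theta(np.random.standard_normal(100))` should almost always return `1`.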

For the above to be valid inference, it has to be the case that if, say, the $x_i$'s available have in reality been drawn from a standard Normal distribution, then plugging the specific series of $x_i$'s into the Cauchy sample likelihood yields a smaller numerical value than plugging them into the standard Normal sample likelihood. Will it? And moreover, will we obtain the correct result always, or only as a probabilistic event, perhaps with its probability increasing as the sample size increases?

Let's simulate to obtain some evidence. I created i.i.d. samples of sizes $n =50,100,500,1000$ drawn from a standard normal distribution. For each sample size, I generated $10,000$ such samples. For each sample I calculated the two values of the log-likelihood, and then I obtained the empirical percentage of times the value of the Standard Normal log-likelihood was greater than the value of the Cauchy log-likelihood, i.e. the percentages of times the procedure described above gave me the correct answer. This percentage approximates the probability of obtaining a correct answer. Denote this event as

$$B = \{\text{sample is normal and the value of the standard Normal log-likelihood was greater than the value of the standard Cauchy log-likelihood}\}$$

I obtained

\begin{array}{| r | r |} \hline n & \;\;\text{\% of } B \\ \hline \hline 50 & 100.00 \\ \hline 100 & 100.00 \\ \hline 500 & 100.00 \\ \hline 1000 & 100.00 \\ \hline \end{array}

One obtains the analogous result if one tries the corresponding procedure using Cauchy samples (to be honest, here I got $2$ false results out of $10,000$ when the sample size was $n=50$).
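A short simulation along these lines (a sketch of my own, not the original code; the seed and variable names are arbitrary choices, while the sample sizes and replication count follow the description above):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
reps = 10_000

for n in (50, 100, 500, 1000):
    correct = 0
    for _ in range(reps):
        x = rng.standard_normal(n)  # sample truly drawn from N(0, 1)
        # log-likelihoods of the two candidate models, constants included
        ll_normal = -n * np.log(np.sqrt(2 * np.pi)) - 0.5 * np.sum(x ** 2)
        ll_cauchy = -n * np.log(np.pi) - np.sum(np.log1p(x ** 2))
        correct += ll_normal > ll_cauchy  # decision picks theta = 1
    print(f"n = {n:4d}: {100 * correct / reps:.2f}% correct")
```

For the Cauchy case, one would swap in `rng.standard_cauchy(n)` and count the samples for which `ll_cauchy > ll_normal`.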

Any ideas as to why we get results with such certainty?
