It appears that the only issue with the answer the OP gave in the question is that it overlooks the fact that we have a sample of size $n$ in our hands, and what the MLE maximizes is the likelihood of the sample. Since this is a random (i.e. i.i.d.) sample, the joint density of the sample is the product of $n$ densities, and the likelihood of the sample is this joint density viewed as a function of the unknown parameter, for the given sample:
$$L(\theta|x_1,\ldots,x_n)=\begin{cases} \left(\frac{1}{\sqrt{2\pi }}\right)^n\cdot\exp\big\{-\sum_{i=1}^n(x_i^2/2) \big\} &\text{ if }\theta=1\\
\\
\left(\frac{1}{\pi}\right)^n\prod_{i=1}^n(1+x_i^2)^{-1}&\text{ if }\theta=2\end{cases}$$
where the $x_i$'s are the actual series of numbers available (realizations of the RV's), and they are to be treated as fixed numbers, much like $\pi$ for example.
But wait, does the above look like a function of $\theta$ given the sample? It seems more like "conditional on the value of the argument, the function is..."
Let's see: for us mortals, a function is defined by two things: its domain and its functional form. If we want $\theta$ to be the argument, then the domain is $\{1,2\}$ and the functional form changes with the value of the argument. That's perfectly fine: we have a case of a piecewise function. Such functions may have maxima and minima, etc., like any other function.
Since the domain (the parameter space) is constrained a priori to only two values, all you have to do is evaluate the two branches for the given $x_i$'s; the one with the larger value is the function's maximum. Since each branch is uniquely associated with a single value of $\theta$, that $\theta$ is the $\operatorname{argmax}$ of the function. And since this function is a likelihood, you can argue that you have just performed maximum likelihood estimation of the unknown parameter $\theta$, even though $\theta$ itself does not appear inside the functional forms. Note that in this approach the constants are indispensable, since they too affect the value of the likelihood. Calculations become simpler if we consider the log-likelihood (without omitting the constants), which is a monotonic transformation:
$$\ln L(\theta|x_1,\ldots,x_n)=\begin{cases} -n\ln \left(\sqrt{2\pi }\right)-(1/2)\sum_{i=1}^nx_i^2 &\text{ if }\theta=1\\
\\
-n\ln \pi - \sum_{i=1}^n \ln(1+x_i^2)&\text{ if }\theta=2\end{cases}$$
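For concreteness, here is a minimal R sketch of this evaluation (the function name is mine); it agrees numerically with `sum(dnorm(x, log = TRUE))` and `sum(dcauchy(x, log = TRUE))`:

    # Evaluate the two branches of the log-likelihood for a sample x
    # and return the theta in {1, 2} attaining the larger value.
    loglik_branches <- function(x) {
      n  <- length(x)
      l1 <- -n * log(sqrt(2 * pi)) - sum(x^2) / 2  # theta = 1: standard Normal
      l2 <- -n * log(pi) - sum(log(1 + x^2))       # theta = 2: standard Cauchy
      c(theta_hat = which.max(c(l1, l2)), normal = l1, cauchy = l2)
    }

    loglik_branches(rnorm(100))  # with Normal data, theta_hat = 1 (with high probability)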
For the above to be valid inference, it has to be the case that if, say, the available $x_i$'s have in reality been drawn from a standard Normal distribution, then plugging this specific series of $x_i$'s into the Cauchy sample likelihood yields a smaller numerical value than plugging them into the standard Normal sample likelihood. Will it? Moreover, will we obtain the correct result always, or only as a probabilistic event whose probability perhaps increases with the sample size?
Let's simulate to obtain some evidence. I created i.i.d. samples of sizes $n = 50, 100, 500, 1000$ drawn from a standard Normal distribution. For each sample size, I generated $10,000$ such samples. For each sample I calculated the two values of the log-likelihood, and then I obtained the empirical percentage of times the value of the standard Normal log-likelihood was greater than the value of the Cauchy log-likelihood, i.e. the percentage of times the procedure described above gave me the correct answer. This percentage approximates the probability of obtaining a correct answer. Denote this event as
$$B = \{\text{the sample is Normal and the value of the standard Normal log-likelihood}\\ \text{was greater than the value of the standard Cauchy log-likelihood}\}$$
I obtained
\begin{array}{| r | r |}
\hline
n & \text{\% of } B \\
\hline
\hline
50 & 100.00 \\
\hline
100 & 100.00 \\
\hline
500 & 100.00 \\
\hline
1000 & 100.00 \\
\hline
\end{array}
One obtains the analogous result if one tries the corresponding procedure using Cauchy samples (to be honest, here I got $2$ false results out of $10,000$ when the sample size was $n=50$).
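A minimal R sketch of the experiment (one possible implementation, not necessarily the original code; the seed is arbitrary):

    # Monte Carlo: fraction of standard Normal samples for which the Normal
    # log-likelihood exceeds the Cauchy log-likelihood (event B).
    set.seed(123)  # arbitrary seed
    reps <- 10000
    for (n in c(50, 100, 500, 1000)) {
      wins <- replicate(reps, {
        x <- rnorm(n)
        sum(dnorm(x, log = TRUE)) > sum(dcauchy(x, log = TRUE))
      })
      cat("n =", n, ":", 100 * mean(wins), "%\n")
    }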
Any ideas as to why we get results with such certainty?
Best Answer
My guess is that the characteristic equation you are starting from is different from mine. Let me proceed in a couple of steps so we can see whether we agree.
Consider the equation $$ \lambda^2-\phi_1\lambda-\phi_2=0 $$
If $z$ is a root of the "standard" characteristic equation $1-\phi_1 z-\phi_2 z^2=0$, then setting $\lambda=z^{-1}$, the display above is obtained by rewriting the standard one as follows: \begin{eqnarray*} 1-\phi_1 z-\phi_2 z^2&=&0\\ \Rightarrow z^{-2}-\phi_1 z^{-1}-\phi_2 &=&0\\ \Rightarrow \lambda^2-\phi_1\lambda -\phi_2 &=&0 \end{eqnarray*} Hence, an alternative condition for stability of an $AR(2)$ is that all roots $\lambda$ of the first display lie inside the unit circle, since $|z|>1 \Leftrightarrow |\lambda|=|z^{-1}|<1$.
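As a quick numerical sanity check in R (the example parameter values are arbitrary), the roots of the two polynomials are reciprocals of one another:

    # Roots z of 1 - phi1*z - phi2*z^2 versus roots lambda of
    # lambda^2 - phi1*lambda - phi2.
    phi1 <- 0.5; phi2 <- 0.3
    z      <- polyroot(c(1, -phi1, -phi2))  # coefficients in increasing degree
    lambda <- polyroot(c(-phi2, -phi1, 1))
    sort(Mod(1 / z))                        # same moduli as the next line,
    sort(Mod(lambda))                       # since |lambda| = 1 / |z|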
We use this representation to derive the stationarity triangle of an $AR(2)$ process, that is, that an $AR(2)$ is stable if the following three conditions are met: $$\phi_2<1+\phi_1,\qquad \phi_2<1-\phi_1,\qquad \phi_2>-1.$$
Recall that you can write the roots of the first display (if real) as $$ \lambda_{1,2}=\frac{\phi_1\pm\sqrt{\phi_1^2+4\phi_2}}{2} $$ to find the first two conditions.
Then, the $AR(2)$ is stationary iff $|\lambda|<1$, hence (if the $\lambda_i$ are real): \begin{eqnarray*} -1<\frac{\phi_1\pm\sqrt{\phi_1^2+4\phi_2}}{2}&<&1\\ \Rightarrow -2<\phi_1\pm\sqrt{\phi_1^2+4\phi_2}&<&2 \end{eqnarray*} Requiring the larger of the two $\lambda_i$ to be less than $1$ gives $\phi_1+\sqrt{\phi_1^2+4\phi_2}<2$, or: \begin{eqnarray*} \phi_1+\sqrt{\phi_1^2+4\phi_2}&<&2\\ \Rightarrow \sqrt{\phi_1^2+4\phi_2}&<&2 - \phi_1\\ \Rightarrow \phi_1^2+4\phi_2&<&(2 - \phi_1)^2\\ \Rightarrow \phi_1^2+4\phi_2&<&4 - 4\phi_1+\phi_1^2\\ \Rightarrow \phi_2&<&1 - \phi_1 \end{eqnarray*} Analogously, requiring the smaller root to exceed $-1$, we find that $\phi_2<1 + \phi_1$.
If the $\lambda_i$ are complex, then $\phi_1^2<-4\phi_2$ and so $$\lambda_{1,2} = \phi_1/2\pm i\sqrt{-(\phi_1^2+4\phi_2)}/2.$$ The squared modulus of a complex number is the square of its real part plus the square of its imaginary part. Hence, $$ |\lambda_{1,2}|^2 = (\phi_1/2)^2 + \left(\sqrt{-(\phi_1^2+4\phi_2)}/2\right)^2 = \phi_1^2/4-(\phi_1^2+4\phi_2)/4 = -\phi_2. $$ This is stable if $|\lambda_{1,2}|<1$, hence if $-\phi_2<1$, i.e. $\phi_2>-1$, as was to be shown. (The restriction $\phi_2<1$ resulting from $\phi_2^2<1$ is redundant in view of $\phi_2<1+\phi_1$ and $\phi_2<1-\phi_1$.)
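The three conditions can be checked against the root criterion directly; a small R sketch (the helper name is mine) verifies that they agree with $|\lambda|<1$ on randomly drawn $(\phi_1,\phi_2)$ pairs:

    # Do the triangle conditions match "all roots lambda inside the unit circle"?
    agree <- function(phi1, phi2) {
      triangle <- (phi2 < 1 - phi1) && (phi2 < 1 + phi1) && (phi2 > -1)
      roots    <- all(Mod(polyroot(c(-phi2, -phi1, 1))) < 1)
      triangle == roots
    }
    set.seed(1)  # arbitrary seed
    all(replicate(10000, agree(runif(1, -3, 3), runif(1, -3, 3))))  # TRUE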
Plotting the stationarity triangle, also indicating the line that separates complex from real roots, we get

[Figure: the AR(2) stationarity triangle in the $(\phi_1,\phi_2)$ plane, with the dashed parabola $\phi_2=-\phi_1^2/4$ separating real from complex roots.]

Produced in R using code along the following lines (a sketch; the exact plotting details are mine):
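    # Stationarity triangle of an AR(2) in the (phi1, phi2) plane; the dashed
    # parabola phi2 = -phi1^2 / 4 separates real roots (above) from complex (below).
    phi1 <- seq(-2, 2, length.out = 400)
    plot(NA, xlim = c(-2, 2), ylim = c(-1, 1),
         xlab = expression(phi[1]), ylab = expression(phi[2]),
         main = "AR(2) stationarity region")
    polygon(c(-2, 0, 2), c(-1, 1, -1))  # triangle: phi2 < 1 +/- phi1, phi2 > -1
    lines(phi1, -phi1^2 / 4, lty = 2)   # boundary between real and complex roots
    text(0, 0.4, "real roots"); text(0, -0.6, "complex roots")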