This post elaborates on the answers in the comments to the question.
Let $X = (X_1, X_2, \ldots, X_n)$. Fix any $\mathbf{e}_1\in\mathbb{R}^n$ of unit length. Such a vector may always be completed to an orthonormal basis $(\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n)$ (by means of the Gram-Schmidt process, for instance). This change of basis (from the usual one) is orthogonal: it does not change lengths. Thus the distribution of
$$\frac{(\mathbf{e}_1\cdot X)^2}{||X||^2}=\frac{(\mathbf{e}_1\cdot X)^2}{X_1^2 + X_2^2 + \cdots + X_n^2} $$
does not depend on $\mathbf{e}_1$. Taking $\mathbf{e}_1 = (1,0,0,\ldots, 0)$ shows this has the same distribution as
$$\frac{X_1^2}{X_1^2 + X_2^2 + \cdots + X_n^2}.\tag{1} $$
Since the $X_i$ are iid Normal, they may be written as $\sigma$ times iid standard Normal variables $Y_1, \ldots, Y_n$ and their squares are $\sigma^2$ times $\Gamma(1/2)$ distributions. Since the sum of $n-1$ independent $\Gamma(1/2)$ distributions is $\Gamma((n-1)/2)$, we have determined that the distribution of $(1)$ is that of
$$\frac{\sigma^2 U}{\sigma^2 U + \sigma^2 V} = \frac{U}{U+V}$$
where $U = X_1^2/\sigma^2 \sim \Gamma(1/2)$ and $V = (X_2^2 + \cdots + X_n^2)/\sigma^2 \sim \Gamma((n-1)/2)$ are independent. It is well known that this ratio has a Beta$(1/2, (n-1)/2)$ distribution. (Also see the closely related thread at Distribution of $XY$ if $X \sim$ Beta$(1,K-1)$ and $Y \sim$ chi-squared with $2K$ degrees.)
Since $$X_1 + \cdots + X_n = (1,1,\ldots,1)\cdot (X_1, X_2, \cdots, X_n) = \sqrt{n}\,\mathbf{e}_1\cdot X$$
for the unit vector $\mathbf{e}_1=(1,1,\ldots,1)/\sqrt{n}$, we conclude that $Z$ is $(\sqrt{n})^2 = n$ times a Beta$(1/2, (n-1)/2)$ variate. For $n\ge 2$ it therefore has density function
$$f_Z(z) = \frac{n^{1-n/2}}{B\left(\frac{1}{2}, \frac{n-1}{2}\right)} \sqrt{\frac{(n-z)^{n-3}}{z}}$$
on the interval $(0,n)$ (and otherwise is zero).
As a check, I simulated $100,000$ independent realizations of $Z$ for $\sigma=1$ and $n=2,3,10$, plotted their histograms, and superimposed the graph of the corresponding Beta density (in red). The agreements are excellent.
Here is the R
code. It carries out the simulation by means of the formula sum(x)^2 / sum(x^2)
for $Z$, where x
is a vector of length n
generated by rnorm
. The rest is just looping (for
, apply
) and plotting (hist
, curve
).
for (n in c(2, 3, 10)) {
z <- apply(matrix(rnorm(n*1e5), nrow=n), 2, function(x) sum(x)^2 / sum(x^2))
hist(z, freq=FALSE, breaks=seq(0, n, length.out=50), main=paste("n =", n), xlab="Z")
curve(dbeta(x/n, 1/2, (n-1)/2)/n, add=TRUE, col="Red", lwd=2)
}
In this problem your unknown parameter $\theta$ only has two possible values, so you have a discrete optimisation where you just have to compare the likelihood at those two parameter values. (If you are taking derivatives of something in a discrete optimisation then you are going down the wrong track.) For an observed data vector $\mathbf{x}$ you have:
$$L_\mathbf{x}(\theta) = \begin{cases}
(2 \pi)^{-n/2} \exp(-\tfrac{1}{2} \sum x_i^2) & & \text{for } \theta = 1, \\[6pt]
\pi^{-n} / \prod (1+x_i^2) & & \text{for } \theta = 2. \\
\end{cases}$$
Since there are only two possible parameter values, you can find the maximising parameter value by looking at the sign of the difference in likelihood at these values. You have:
$$\begin{equation} \begin{aligned}
\Delta(\mathbf{x}) \equiv \text{sgn}(L_\mathbf{x}(1)-L_\mathbf{x}(2))
&= \text{sgn}\Bigg( (2 \pi)^{-n/2} \exp(-\tfrac{1}{2} \sum x_i^2) - \frac{1}{\pi^{n} \prod (1+x_i^2)} \Bigg) \\[6pt]
&= \text{sgn}\Bigg( \exp(-\tfrac{1}{2} \sum x_i^2)\prod (1+x_i^2) - \Bigg( \frac{2}{\pi} \Bigg)^{n/2} \Bigg). \\[6pt]
\end{aligned} \end{equation}$$
The maximum-likelihood-estimator (MLE) is:
$$\hat{\theta} = \begin{cases}
2 & & \text{if } \Delta(\mathbf{x}) = -1, \\[6pt]
\{ 1,2 \} & & \text{if } \Delta(\mathbf{x}) = 0, \\[6pt]
1 & & \text{if } \Delta(\mathbf{x}) = 1. \\[6pt]
\end{cases}$$
(In the case where $\Delta(\mathbf{x}) = 0$ the MLE is non-unique since the likelihood is the same at both parameter values.)
Best Answer
Here is a partial answer. Suppose $D$ is a diagonal matrix whose diagonal entries $d_1,\ldots,d_n$ are positive. Let $$ \mathbf y' = (y_1,\ldots, y_n). $$ Then \begin{align} & \exp\left( \frac{-1}2 \mathbf y'D^{-1} \mathbf y \right) \\[8pt] = {} & \exp\left( \frac{-1} 2 \sum_{i=1}^n \frac{y_i^2}{d_i} \right) \\[8pt] = {} & \prod_{i=1}^n \exp\left( \frac{-1} 2 \frac{y_i^2}{d_i} \right). \end{align} This stops short of treating the case in which $$ D = \operatorname{var}(Y) = \operatorname E(YY') \in \mathbb R^{n\times n} $$ is singular.