Statistics – Maximum Likelihood Estimator of Categorical Distribution


The task is:

A population of students has been divided into the following three
groups:

  1. Students with the mean of grades below 3.5
  2. Students with the mean of grades between 3.5 and 4.5
  3. Students with the mean of grades above 4.5

Each student in the population is described by a vector of
random variables $x= (x_1\ x_2\ x_3)^T$, taking one of three
possible states: $(1\ 0\ 0)^T$ if the student belongs to the first
group, $(0\ 1\ 0)^T$ if the student belongs to the second group,
and $(0\ 0\ 1)^T$ if the student belongs to the third group. The
distribution of $x$ is a categorical distribution (also known as the
generalized Bernoulli distribution or Multinoulli distribution) with
parameters $\theta= (\theta_1\ \theta_2\ \theta_3)^T$. $N$ examples
were drawn from the population of students. Calculate the
maximum likelihood estimator of $\theta$.

I tried to do it the same way as in the Bernoulli case, but I'm stuck. The idea was to find $\theta^*$ by maximizing the probability mass function with respect to $\theta$. My attempt was

$$
\begin{aligned}
M(x\mid\theta) &= \prod_{d=1}^{3} \theta_d^{x_d}=\theta_1^{x_1} \theta_2^{x_2} \theta_3^{x_3}\\
\theta^* &= \operatorname*{argmax}_\theta M(x\mid\theta) = \operatorname*{argmax}_\theta \ln(M(x\mid\theta))\\
\ln(M(x\mid\theta)) &= \ln(\theta_1^{x_1} \theta_2^{x_2} \theta_3^{x_3}) = x_1\ln\theta_1 + x_2\ln\theta_2 + x_3\ln\theta_3 = x^T (\ln\theta_1\ \ln\theta_2\ \ln\theta_3)^T
\end{aligned}
$$

The next step would be to calculate the derivative with respect to $\theta$ and find its zero, but we don't have $\theta$ in the function.

I'm not sure where my mistake is. Or perhaps there is no mistake and it is possible to convert $(\ln\theta_1\ \ln\theta_2\ \ln\theta_3)^T$ to some form with $\theta$?

Best Answer

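Since $(\theta_1,\theta_2,\theta_3)$ must satisfy the constraint

$$\theta_1+\theta_2+\theta_3 = 1,\tag 0$$

one way to do this is by Lagrange multipliers. Here $x_1, x_2, x_3$ should be read as the group counts over all $N$ sampled students (the sum of the $N$ one-hot vectors), because the log-likelihood of an independent sample is the sum of the individual log-likelihoods. You have

$$ \operatorname{grad} (\theta_1+\theta_2+\theta_3) = (1,1,1) \tag 1 $$

and

$$ \operatorname{grad} (x_1\log\theta_1 + x_2\log\theta_2 + x_3\log\theta_3) = \left( \frac{x_1}{\theta_1}, \frac{x_2}{\theta_2}, \frac{x_3}{\theta_3} \right). \tag 2 $$

So you want a value of $(\theta_1,\theta_2,\theta_3)$ for which $(2)$ is a scalar multiple of $(1),$ that is, $x_d/\theta_d = \lambda$ for every $d$ and some constant $\lambda.$ That happens only if the ratio $\theta_1:\theta_2:\theta_3$ is equal to the ratio $x_1:x_2:x_3.$ But the constraint $(0)$ must also hold, which forces $\lambda = x_1+x_2+x_3.$ Consequently you get

$$ \theta_1 = \frac{x_1}{x_1+x_2+x_3} $$

and similarly for the other two values of the subscript. When the $x_d$ are counts, the denominator is just $N.$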
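
To make the result concrete, here is a minimal Python sketch that reads each $x_d$ as the number of sampled students in group $d$ and checks the closed form numerically. The data (10 students split 2/5/3) and the helper names `categorical_mle` and `log_likelihood` are made up for illustration; they are not part of the original task.

```python
import math
import random


def categorical_mle(samples):
    """Closed-form estimate theta_d = count_d / N from a list of one-hot rows."""
    n_groups = len(samples[0])
    # Summing the one-hot vectors column by column gives the group counts.
    counts = [sum(row[d] for row in samples) for d in range(n_groups)]
    total = sum(counts)
    return [c / total for c in counts]


def log_likelihood(theta, samples):
    """Sum over samples of sum_d x_d * log(theta_d).

    Terms with x_d == 0 contribute nothing, so only the active entry is used.
    """
    return sum(
        math.log(t)
        for row in samples
        for x, t in zip(row, theta)
        if x == 1
    )


# Hypothetical sample: 2 students in group 1, 5 in group 2, 3 in group 3.
samples = [[1, 0, 0]] * 2 + [[0, 1, 0]] * 5 + [[0, 0, 1]] * 3

theta_hat = categorical_mle(samples)
print(theta_hat)  # [0.2, 0.5, 0.3]

# Sanity check: no randomly drawn point on the probability simplex
# should achieve a higher log-likelihood than the closed-form estimate.
random.seed(0)
best = log_likelihood(theta_hat, samples)
for _ in range(1000):
    raw = [random.random() + 1e-9 for _ in range(3)]
    norm = sum(raw)
    candidate = [r / norm for r in raw]
    assert log_likelihood(candidate, samples) <= best + 1e-12
```

The random search is only a sanity check, not a proof; the Lagrange argument above is what shows the estimate is the maximizer.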
