The task is:
A population of students has been divided into the following three
groups:
- Students with the mean of grades below 3.5
- Students with the mean of grades between 3.5 and 4.5
- Students with the mean of grades above 4.5
Each student in the population is described by a vector of
random variables $x= (x_1\ x_2\ x_3)^T$, taking one of three
possible states: $(1\ 0\ 0)^T$ if the student belongs to the first
group, $(0\ 1\ 0)^T$ if the student belongs to the second group,
and $(0\ 0\ 1)^T$ if the student belongs to the third group. The
distribution of $x$ is a categorical distribution (also known as the
generalized Bernoulli distribution or multinoulli distribution) with
parameters $\theta= (\theta_1\ \theta_2\ \theta_3)^T$. $N$ examples
were drawn from the population of students. Calculate the
maximum likelihood estimator of $\theta$.
I tried to proceed as in the Bernoulli case, but I'm stuck. The idea was to find $\theta^*$ by maximizing the probability mass function. My attempt was:
$$
M(x\mid\theta)=\prod_{d=1}^{3} \theta_d^{x_d}=\theta_1^{x_1} \theta_2^{x_2} \theta_3^{x_3}\\
\theta^* = \operatorname*{argmax}_\theta M(x\mid\theta) = \operatorname*{argmax}_\theta \ln(M(x\mid\theta))\\
\ln(M(x\mid\theta))= \ln(\theta_1^{x_1} \theta_2^{x_2} \theta_3^{x_3}) = x_1\ln\theta_1 + x_2\ln\theta_2 + x_3\ln\theta_3 = x^T (\ln\theta_1\ \ln\theta_2\ \ln\theta_3)^T
$$
The next step would be to take the derivative with respect to $\theta$ and find its zero, but $\theta$ only appears inside the logarithms.
I'm not sure where my mistake is. Or perhaps there is no mistake, and it is possible to convert $(\ln\theta_1\ \ln\theta_2\ \ln\theta_3)^T$ into some form involving $\theta$?
Best Answer
Since $(\theta_1,\theta_2,\theta_3)$ must satisfy the constraint
$$\theta_1+\theta_2+\theta_3 = 1,\tag 0$$
one way to do this is by Lagrange multipliers. Because the $N$ examples are independent, the log-likelihood of the whole sample is the sum of the individual log-likelihoods, so it has the same form as for a single draw with $x_d$ read as the number of sampled students in group $d$ (hence $x_1+x_2+x_3=N$).

You have
$$ \operatorname{grad} (\theta_1+\theta_2+\theta_3) = (1,1,1) \tag 1 $$
and, since the derivative of $\log\theta_d$ with respect to $\theta_d$ is $1/\theta_d$,
$$ \operatorname{grad} (x_1\log\theta_1 + x_2\log\theta_2 + x_3\log\theta_3) = \left( \frac{x_1}{\theta_1}, \frac{x_2}{\theta_2}, \frac{x_3}{\theta_3} \right). \tag 2 $$

So you want a value of $(\theta_1,\theta_2,\theta_3)$ for which $(2)$ is a scalar multiple of $(1).$ That happens only if the ratio $\theta_1:\theta_2:\theta_3$ is equal to the ratio $x_1:x_2:x_3.$ But the constraint $(0)$ must also hold. Consequently you get
$$ \theta_1 = \frac{x_1}{x_1+x_2+x_3} $$
and similarly for the other two values of the subscript.
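As a quick numerical sanity check of the closed form, here is a minimal Python sketch. It assumes that $x_d$ in the formula is the number of sampled students in group $d$ (the sum of the $d$-th components over the $N$ one-hot vectors). The function names `categorical_mle` and `log_likelihood` and the toy sample are made up for illustration and are not part of the original problem.

```python
import math

def categorical_mle(samples):
    """Return the closed-form estimate theta_d = count_d / N from one-hot vectors."""
    dim = len(samples[0])
    # counts[d] = number of samples in group d (sum of the d-th components)
    counts = [sum(s[d] for s in samples) for d in range(dim)]
    n = sum(counts)  # equals N, because every one-hot vector sums to 1
    return [c / n for c in counts]

def log_likelihood(theta, samples):
    """Log-likelihood of independent one-hot samples under parameters theta."""
    return sum(
        x_d * math.log(t_d)
        for s in samples
        for x_d, t_d in zip(s, theta)
        if x_d  # zero components contribute nothing, so skip them
    )

# Hypothetical sample of N = 10 students: 2, 5 and 3 in the three groups.
samples = [(1, 0, 0)] * 2 + [(0, 1, 0)] * 5 + [(0, 0, 1)] * 3

theta_hat = categorical_mle(samples)
print(theta_hat)  # [0.2, 0.5, 0.3]

# Other points on the simplex should give a lower log-likelihood.
for other in ([1 / 3, 1 / 3, 1 / 3], [0.25, 0.45, 0.3], [0.1, 0.6, 0.3]):
    assert log_likelihood(theta_hat, samples) > log_likelihood(other, samples)
```

The comparison only tries a few alternative parameter vectors, so it illustrates the result and does not prove it; the Lagrange multiplier argument above is what establishes the maximum.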