Given the sample, the likelihood function is given by $$L(\mu,\sigma)=\frac{1}{\sigma^n}\exp\left[-\frac{1}{\sigma}\sum_{i=1}^n(x_i-\mu)\right]\mathbf1_{\mu\leqslant x_{(1)},\sigma>0}$$
This function is not differentiable at $\mu=x_{(1)}$, so the MLE of $\mu$ has to be found by a different argument. For every fixed $\sigma>0$, $L(\mu,\sigma)$ is increasing in $\mu$ on $\mu\leqslant x_{(1)}$ and vanishes beyond it, so the maximum is attained at the boundary, implying $\hat\mu_{\text{MLE}}=X_{(1)}$.
The MLE of $\sigma$ can be guessed from the first partial derivative in the usual way:
We have $\displaystyle\frac{\partial \ln L(\mu,\sigma)}{\partial\sigma}=0\implies\sigma=\frac{1}{n}\sum_{i=1}^n(x_i-\mu)$.
So MLE of $\sigma$ could possibly be $\displaystyle\hat\sigma_{\text{MLE}}=\frac{1}{n}\sum_{i=1}^n(X_i-\hat\mu)=\frac{1}{n}\sum_{i=1}^n\left(X_i-X_{(1)}\right)$
The second partial derivative test fails here because $L(\mu,\sigma)$ is not differentiable at the candidate maximiser, so it cannot be applied jointly in $(\mu,\sigma)$.
So to confirm that $(\hat\mu,\hat\sigma)$ is the MLE of $(\mu,\sigma)$, one has to verify that $L(\hat\mu,\hat\sigma)\geqslant L(\mu,\sigma)$, or somehow conclude that $\ln L(\hat\mu,\hat\sigma)\geqslant \ln L(\mu,\sigma)$ holds $\forall\,(\mu,\sigma)$.
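As a quick numerical sanity check (a sketch with simulated data, not part of the derivation; the sample size and seed are arbitrary), one can confirm that no nearby parameter pair beats the candidate $(\hat\mu,\hat\sigma)$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true, sigma_true, n = 2.0, 1.5, 500
x = mu_true + rng.exponential(sigma_true, size=n)

# Closed-form candidates derived above
mu_hat = x.min()                    # mu-hat = X_(1)
sigma_hat = np.mean(x - mu_hat)     # sigma-hat = mean(X_i - X_(1))

def loglik(mu, sigma):
    """Log-likelihood of the shifted exponential; -inf off the support."""
    if sigma <= 0 or mu > x.min():
        return -np.inf
    return -n * np.log(sigma) - np.sum(x - mu) / sigma

# Spot-check: no perturbed (mu, sigma) pair attains a higher log-likelihood
best = loglik(mu_hat, sigma_hat)
for dm in (-0.05, 0.0, 0.05):
    for ds in (-0.05, 0.0, 0.05):
        assert loglik(mu_hat + dm, sigma_hat + ds) <= best + 1e-12
print(mu_hat, sigma_hat)
```

This only spot-checks a grid of perturbations; the full proof still requires the inequality $L(\hat\mu,\hat\sigma)\geqslant L(\mu,\sigma)$ for all $(\mu,\sigma)$.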
Your attempt at the problem is correct so far, but you have only derived the conditional MLE when the variance parameters are known. To derive the unconditional MLE for the mean parameter, you will need to derive the corresponding equations for the MLEs of the variance parameters and then solve the resulting set of simultaneous equations. This should give you a unique MLE estimating each of the three parameters in the model. Let me show you how to do this.
Derivation of the full MLE: For greater clarity, I will denote the variance parameters as $\sigma_x^2$ and $\sigma_y^2$ rather than denoting them with number subscripts. From your specified model, the log-likelihood for your observed data (ignoring an additive constant) can be written as:
$$\begin{equation} \begin{aligned}
\ell(\mu,\sigma_x,\sigma_y)
&= - m \ln \sigma_x - n \ln \sigma_y -\frac{1}{2} \Bigg[ \sum_{i=1}^m \frac{(x_i - \mu)^2}{\sigma_x^2} + \sum_{i=1}^n \frac{(y_i - \mu)^2}{\sigma_y^2} \Bigg] \\[6pt]
&= - m \ln \sigma_x - n \ln \sigma_y -\frac{1}{2} \Bigg[ \frac{1}{\sigma_x^2} \sum_{i=1}^m (x_i^2 - 2 \mu x_i + \mu^2) + \frac{1}{\sigma_y^2} \sum_{i=1}^n (y_i^2 - 2 \mu y_i + \mu^2) \Bigg] \\[6pt]
&= - m \ln \sigma_x - n \ln \sigma_y -\frac{1}{2} \Bigg( \frac{1}{\sigma_x^2} \sum_{i=1}^m x_i^2 + \frac{1}{\sigma_y^2} \sum_{i=1}^n y_i^2 \Bigg) \\[6pt]
&\qquad + \Bigg( \frac{m\bar{x}}{\sigma_x^2} + \frac{n\bar{y}}{\sigma_y^2} \Bigg) \mu -\frac{1}{2} \Bigg( \frac{m}{\sigma_x^2} + \frac{n}{\sigma_y^2} \Bigg) \mu^2. \\[6pt]
\end{aligned} \end{equation}$$
where $\bar{x} = \frac{1}{m}\sum_{i=1}^m x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^n y_i$
are the two sample means. Hence, your score function consists of the following partial derivatives:
$$\begin{equation} \begin{aligned}
\frac{\partial \ell}{\partial \mu}(\mu,\sigma_x,\sigma_y)
&= \Bigg( \frac{m\bar{x}}{\sigma_x^2} + \frac{n\bar{y}}{\sigma_y^2} \Bigg) - \Bigg( \frac{m}{\sigma_x^2} + \frac{n}{\sigma_y^2} \Bigg) \mu, \\[10pt]
\frac{\partial \ell}{\partial \sigma_x}(\mu,\sigma_x,\sigma_y)
&= - \frac{1}{\sigma_x^3} \Bigg( m \sigma_x^2 - \sum_{i=1}^m (x_i - \mu)^2 \Bigg) , \\[10pt]
\frac{\partial \ell}{\partial \sigma_y}(\mu,\sigma_x,\sigma_y)
&= - \frac{1}{\sigma_y^3} \Bigg( n \sigma_y^2 - \sum_{i=1}^n (y_i - \mu)^2 \Bigg) . \\[10pt]
\end{aligned} \end{equation}$$
Setting the partial derivatives to zero yields the following simultaneous equations for the MLE:
$$\hat{\mu} = \frac{m \bar{x} \hat{\sigma}_y^2 + n \bar{y} \hat{\sigma}_x^2}{m \hat{\sigma}_y^2 + n \hat{\sigma}_x^2} \quad \quad \quad \hat{\sigma}_x^2 = \frac{1}{m} \sum_{i=1}^m (x_i - \hat{\mu})^2 \quad \quad \quad \hat{\sigma}_y^2 = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{\mu})^2.$$
These equations give us the conditional MLEs for each of the parameters, when the other parameters are known. To find the unconditional MLEs for each of our parameters we need to solve these simultaneous equations. This is a large algebraic exercise, which I will leave to you.
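If you prefer a numerical route to the algebra, the three conditional-MLE equations can be iterated to a joint fixed point. Here is a minimal sketch with simulated data (the true parameter values and seed are arbitrary assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
mu_true = 5.0
x = rng.normal(mu_true, 2.0, size=40)   # m = 40 observations, sigma_x = 2
y = rng.normal(mu_true, 0.5, size=60)   # n = 60 observations, sigma_y = 0.5
m, n = len(x), len(y)

# Iterate the three conditional-MLE equations to a joint fixed point
mu = (x.mean() + y.mean()) / 2          # crude starting value
for _ in range(200):
    sx2 = np.mean((x - mu) ** 2)        # conditional MLE of sigma_x^2 given mu
    sy2 = np.mean((y - mu) ** 2)        # conditional MLE of sigma_y^2 given mu
    mu_new = (m * x.mean() * sy2 + n * y.mean() * sx2) / (m * sy2 + n * sx2)
    if abs(mu_new - mu) < 1e-12:
        mu = mu_new
        break
    mu = mu_new

sx2 = np.mean((x - mu) ** 2)
sy2 = np.mean((y - mu) ** 2)
print(mu, sx2, sy2)
```

At convergence all three equations hold simultaneously, so the result agrees with the joint MLE (up to the usual caveat that a fixed point is a critical point, not automatically the global maximum).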
Solving via profile log-likelihood: Rather than solving these simultaneous equations directly, we can go back and substitute the form of the MLEs for the variance parameters back into the original log-likelihood function to obtain the profile log-likelihood:
$$\begin{equation} \begin{aligned}
\ell_*(\mu) \equiv \ell(\mu,\hat{\sigma}_x,\hat{\sigma}_y)
&= - m \ln \hat{\sigma}_x - n \ln \hat{\sigma}_y -\frac{1}{2} \Bigg[ \sum_{i=1}^m \frac{(x_i - \mu)^2}{\hat{\sigma}_x^2} + \sum_{i=1}^n \frac{(y_i - \mu)^2}{\hat{\sigma}_y^2} \Bigg] \\[6pt]
&= - \frac{m}{2} \cdot \ln \Big( \sum_{i=1}^m (x_i - \mu)^2 \Big) - \frac{n}{2} \cdot \ln \Big( \sum_{i=1}^n (y_i - \mu)^2 \Big) + \text{const}. \\[6pt]
\end{aligned} \end{equation}$$
The corresponding score function is:
$$\frac{d\ell_*}{d\mu}(\mu)
= \frac{m^2(\bar{x} - \mu)}{\sum_{i=1}^m (x_i - \mu)^2} + \frac{n^2 (\bar{y} - \mu)}{\sum_{i=1}^n (y_i - \mu)^2}.$$
Setting this function to zero yields the following cubic equation for the critical points:
$$0 = m^2 (\bar{x}-\hat{\mu}) \sum_{i=1}^n (y_i - \hat{\mu})^2 + n^2 (\bar{y}-\hat{\mu}) \sum_{i=1}^m (x_i - \hat{\mu})^2.$$
It should be possible to find a unique maximising critical point that gives the MLE. (Substitute into the above conditional MLE equations as a check on your working.) Again, this is a large algebraic exercise that I will leave to you.
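The cubic can also be solved numerically. A sketch (again with simulated data; the parameter values are arbitrary assumptions) that expands the score equation into polynomial coefficients and picks the real root maximising the profile log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(5.0, 2.0, size=40)
y = rng.normal(5.0, 0.5, size=60)
m, n = len(x), len(y)
xbar, ybar = x.mean(), y.mean()

# Sums of squares about mu as quadratic polynomials in mu (descending coeffs):
# sum (x_i - mu)^2 = m*mu^2 - 2*m*xbar*mu + sum x_i^2
Sx = np.array([m, -2 * m * xbar, np.sum(x ** 2)])
Sy = np.array([n, -2 * n * ybar, np.sum(y ** 2)])

# Cubic score equation: m^2 (xbar - mu) Sy(mu) + n^2 (ybar - mu) Sx(mu) = 0
cubic = m**2 * np.polymul([-1.0, xbar], Sy) + n**2 * np.polymul([-1.0, ybar], Sx)
roots = np.roots(cubic)
real_roots = roots[np.abs(roots.imag) < 1e-9].real

# Pick the real critical point with the highest profile log-likelihood
def ell_star(mu):
    return -0.5 * m * np.log(np.sum((x - mu) ** 2)) \
           - 0.5 * n * np.log(np.sum((y - mu) ** 2))

mu_hat = max(real_roots, key=ell_star)
print(mu_hat)
```

Since the profile score is positive below both sample means and negative above both, the maximising root always lies between $\bar{x}$ and $\bar{y}$, which is a useful check on the computation.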
You are right; even though some otherwise credible sources claim differently, the correct formula is
$$ \hat p = \frac{1}{mn} \sum_{i=1}^m x_i = \overbrace{\frac{1}{m} \sum_{i=1}^m}^\text{we have m trials} \overbrace{\frac{x_i}{n}}^{\substack{\text{proportion of successes} \\ \text{in single Bernoulli trial}}} $$
since if you calculated the ordinary arithmetic mean of the $x_i$, you would get the average number of successes in $n$ trials, i.e. $\bar{x} = n\hat p$ estimates $E(X) = np$, not $p$.
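A quick simulation makes the distinction concrete (a sketch with arbitrary values of $p$, $n$, and $m$):

```python
import numpy as np

rng = np.random.default_rng(2)
p_true, n, m = 0.3, 20, 1000          # n trials per observation, m observations
x = rng.binomial(n, p_true, size=m)

p_hat = x.sum() / (m * n)             # correct: divide by m*n
naive = x.mean()                      # the plain mean estimates n*p, not p

print(p_hat, naive)
```

The plain mean comes out near $np = 6$, while dividing by $mn$ recovers an estimate near $p = 0.3$.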