Solved – exponential density & Bernoulli distribution

bernoulli-distribution maximum-likelihood self-study

I asked a question about maximum likelihood earlier. Now I have two questions related to it. The first one is:

Let $x$ have an exponential density $p(x|\theta) = \theta e^{-\theta x} \text{ if }x \ge 0;\quad 0 \text{ otherwise.}$

Now suppose that $n$ samples $x_1, \ldots, x_n$ are drawn independently according to $p(x|\theta)$. Show that the maximum-likelihood estimate for $\theta$ is given by

$$\hat{\theta} = \frac{1}{\frac{1}{n} \,\, \sum^n_{k = 1} x_k}$$

The second question is similar:

Let $x$ be a $d$-dimensional binary (0 or 1) vector with a multivariate Bernoulli distribution

$$p(x|\theta) = \prod^d_{i=1} \theta_i^{x_i}(1-\theta _i)^{1-x_i}$$

where $\theta = (\theta_1, \ldots, \theta_d)^t$ is an unknown parameter vector, $\theta_i$ being the probability that $x_i = 1$.

Show that the maximum-likelihood estimate for $\theta$ is

$$ \hat{\theta} = \frac{1}{n} \sum^n_{k = 1} x_k $$

How would I show that the maximum-likelihood estimate is the one given, for either of them? I've been reading tutorials and I still don't quite understand it.

Best Answer

Considered as a function of its parameters $\theta$ for fixed data $x$, the likelihood evaluates the probability that you would observe those data at a given value of $\theta$, under the assumption that they are drawn from this particular model.

The likelihood function $L(\theta|x)$ represents the joint probability of observing all of these data under the model $f(x|\theta)$. When we find the value of $\theta$ at which this joint probability attains its global maximum, we reason that this is the parameter value that most likely gave rise to the data. For some simple models, we can derive the maximum-likelihood estimate in closed form using calculus.
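To make this idea concrete, here is a minimal numerical sketch (not part of the original answer; it assumes NumPy is available and uses made-up data values) that evaluates the exponential likelihood from the question over a grid of candidate $\theta$ values and picks the one under which the observed data are most probable:

```python
import numpy as np

# Hypothetical observed data, for illustration only.
x = np.array([0.5, 1.2, 0.3, 2.0, 0.8])

# Candidate parameter values to compare.
thetas = np.linspace(0.1, 5.0, 500)

# Joint probability (likelihood) of the data under p(x|theta) = theta * exp(-theta * x),
# evaluated at each candidate theta.
likelihoods = np.array([np.prod(t * np.exp(-t * x)) for t in thetas])

# The grid point where the likelihood is largest.
theta_hat_grid = thetas[np.argmax(likelihoods)]
print(theta_hat_grid)
```

For this tiny data set the grid maximizer lands very close to $1/\left(\frac{1}{n}\sum_k x_k\right)$, which is the closed-form answer derived below.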

We start by writing the likelihood function as the joint density of the observed data: $L(\theta|x)=\prod_{i=1}^n f(x_i|\theta)$. For the exponential case, $L(\theta|x)=\prod_{i=1}^n \theta\exp(-\theta x_i)$. However, the product operator makes the algebra messy. Most people work with the log-likelihood instead, because it turns the product into a sum and, since the logarithm is monotone, it attains its maximum at the same value of $\theta$: $\ln L(\theta|x)=\sum_{i=1}^n (\ln\theta-\theta x_i)$. Now comes the fun part: taking the derivative of the log-likelihood with respect to $\theta$. This is helpful because when the derivative is zero, we know we have found either a maximum or a minimum: $\frac{d \ln L}{d \theta}=\sum_{i=1}^n \left(\frac{1}{\theta}- x_i\right)$. Distributing the sum over the two terms, we find $\frac{d \ln L}{d \theta}=\frac{n}{\theta}- \sum_{i=1}^n x_i$. Setting the derivative equal to zero and solving for $\theta$: $0=\frac{n}{\hat\theta}-\sum_{i=1}^n x_i$, which rearranges to $\frac{1}{\hat \theta}=\frac{1}{n}\sum_{i=1}^n x_i$, exactly the result stated in the question.
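As a sanity check of this closed-form result (a sketch I am adding here, assuming NumPy and SciPy are available and using simulated data), a general-purpose numerical optimizer applied to the negative log-likelihood should land on essentially the same value as $1/\left(\frac{1}{n}\sum_k x_k\right)$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
true_theta = 2.0
x = rng.exponential(scale=1.0 / true_theta, size=1000)  # exponential with rate theta

# Closed-form MLE from the derivation: theta_hat = 1 / (sample mean).
theta_closed_form = 1.0 / x.mean()

# Negative log-likelihood: -(n * ln(theta) - theta * sum(x_i)).
def neg_log_likelihood(theta):
    return -(len(x) * np.log(theta) - theta * x.sum())

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100.0), method="bounded")
print(theta_closed_form, result.x)  # the two estimates should agree closely
```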

To confirm that this is a maximum rather than a minimum, we can check the second derivative, which here is $\frac{d^2 \ln L}{d \theta^2}=-\frac{n}{\theta^2}<0$ for every $\theta>0$, or look at a plot of various values of $(\theta, \ln L)$ in the neighborhood of $\hat \theta$. For this problem there is only one solution for $\hat \theta$, so we don't have to worry about proving that this is the global maximum.
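The plot-based check mentioned above might look like the following sketch (again an illustration I am adding, assuming NumPy and Matplotlib, with made-up data); the log-likelihood curve should peak at $\hat\theta = 1/\left(\frac{1}{n}\sum_k x_k\right)$:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data, for illustration only.
x = np.array([0.5, 1.2, 0.3, 2.0, 0.8])
theta_hat = 1.0 / x.mean()

# Log-likelihood ln L(theta) = n * ln(theta) - theta * sum(x_i), over a neighborhood of theta_hat.
thetas = np.linspace(0.2 * theta_hat, 3.0 * theta_hat, 200)
log_lik = len(x) * np.log(thetas) - thetas * x.sum()

plt.plot(thetas, log_lik)
plt.axvline(theta_hat, linestyle="--")  # the candidate maximum from the closed form
plt.xlabel("theta")
plt.ylabel("log-likelihood")
plt.show()
```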

I hope this helps. The same procedures should help you with the Bernoulli problem, but I am less familiar with that process so I would not want to speak out of turn. This is only a cursory treatment of the reasoning process of MLE. I would highly recommend Gary King's book Unifying Political Methodology, which contains a very thorough, very accessible explanation of the MLE procedure.
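The answer above does not work through the Bernoulli case, but as a purely numerical sanity check of the formula stated in the question (an illustrative sketch I am adding, assuming NumPy and using simulated data), the coordinate-wise sample mean should not be beaten by nearby parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 5000
true_theta = np.array([0.2, 0.5, 0.8])

# n independent d-dimensional binary vectors from the multivariate Bernoulli distribution.
X = (rng.random((n, d)) < true_theta).astype(int)

# Claimed MLE from the question: the coordinate-wise sample mean.
theta_hat = X.mean(axis=0)

# Log-likelihood summed over all samples: sum_i sum_k [x_ki*ln(theta_i) + (1-x_ki)*ln(1-theta_i)].
def log_likelihood(theta):
    return np.sum(X * np.log(theta) + (1 - X) * np.log(1 - theta))

# Small perturbations of theta_hat should not improve the log-likelihood.
print(theta_hat)
print(log_likelihood(theta_hat) >= log_likelihood(theta_hat + 0.01))
print(log_likelihood(theta_hat) >= log_likelihood(theta_hat - 0.01))
```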

@Jonas, is there anything that could be made more clear in my answer? I notice you marked it as "correct" and then reversed that.