Solved – MLE estimator – dividing the log-likelihood by n gives a different result

maximum likelihood

The log-likelihood is as follows:

$n\ln\beta + (-\beta-1)\sum \ln(x_i)$

Dividing the log-likelihood by $n$ gives

$\ln\beta + \frac{-\beta-1}{n}\sum \ln(x_i)$

Using these two log-likelihoods, I got the same MLE estimator, $\hat\beta = \frac{n}{\sum \ln (x_i)}$. However, I was confused by the different asymptotic properties: for the first log-likelihood, the inverse of the information is $\beta^2/n$; for the second, it is simply $\beta^2$.
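To spell out the algebra, the two score equations are

$$\frac{\partial}{\partial\beta}\left[n\ln\beta + (-\beta-1)\sum \ln(x_i)\right] = \frac{n}{\beta} - \sum \ln(x_i) = 0$$

and

$$\frac{\partial}{\partial\beta}\left[\ln\beta + \frac{-\beta-1}{n}\sum \ln(x_i)\right] = \frac{1}{\beta} - \frac{1}{n}\sum \ln(x_i) = 0,$$

both of which yield $\hat\beta = n/\sum \ln(x_i)$, while the second derivatives are $-n/\beta^2$ and $-1/\beta^2$ respectively.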

I understand that, purely as a matter of algebra, the results will differ. But I want to understand the statistical property: why do the MLEs from these two log-likelihoods have different asymptotic variances? How is the difference related to the fact that the log-likelihood was divided by $n$?

Best Answer

Under the standard regularity conditions, the Fisher information can be expressed as the negative of the expected value of the second derivative of the log-likelihood, and NOT of some transformation of the log-likelihood.

Your log-likelihood is $L=n\ln\beta + (-\beta-1)\sum \ln(x_i)$, and that does not change. For whatever reason, you used the monotonic transformation $\tilde L=\frac 1n L$ in calculating the MLE. Since it is a monotonic transformation, the MLE is naturally the same. Now calculate the Fisher information. Irrespective of whether you used $L$ or $\tilde L$ in the maximization procedure, it is

$$I(\beta) = -E\left[\frac {\partial^2}{\partial \beta^2} L\right]=E\left[\frac {n}{\beta^2}\right]=\frac {n}{\beta^2}$$

You cannot substitute $\tilde L$ for $L$ in the calculation of the Fisher information as if $\tilde L$ were equal to $L$; it is not (the fact that $\tilde L$, being a monotonic transformation of $L$, leads to the same MLE does not make it equal to $L$).

Another way to look at it is to remember that the likelihood, viewed as a function of $\beta$, is (and should be) also the joint density of the sample, viewed as a function of the $x_i$'s. In our case (assuming an i.i.d. sample) it is

$$f(X;\beta) = \prod_{i=1}^n\beta x_i^{-\beta-1}, \qquad f(x_i;\beta) = \frac {\beta} {x_i^{\beta+1}}$$ which is a Pareto distribution with minimum value $1$.
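As a numerical sanity check (a minimal Monte Carlo sketch assuming NumPy; the parameter value, sample size, and replication count below are illustrative, not from the post), simulating i.i.d. draws from this Pareto distribution shows that the sampling variance of $\hat\beta$ tracks $\beta^2/n$, the inverse of the Fisher information above, rather than $\beta^2$:

```python
import numpy as np

rng = np.random.default_rng(0)

beta, n, reps = 3.0, 200, 20_000   # illustrative values, not from the post

# Inverse-CDF sampling from f(x; beta) = beta * x**(-beta - 1), x > 1:
# the CDF is F(x) = 1 - x**(-beta), so X = U**(-1/beta) with U ~ Uniform(0, 1).
u = rng.uniform(size=(reps, n))
x = u ** (-1.0 / beta)

# The MLE is identical whether L or L/n is maximized.
beta_hat = n / np.log(x).sum(axis=1)

print("empirical Var(beta_hat):", beta_hat.var())  # lands near beta^2/n
print("beta^2 / n:", beta**2 / n)                  # inverse Fisher information
print("beta^2    :", beta**2)                      # what treating L/n as a log-likelihood suggests
```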

Now, could the transformed log-likelihood lead to a joint density? It would give

$$\left(\prod_{i=1}^n\beta x_i^{-\beta-1}\right)^{1/n}$$

This is the geometric mean of the $n$ marginal densities. Can it represent the joint density of a collection of i.i.d. random variables?
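For this example the check is direct: integrating the $n$-th root over $(1,\infty)^n$ gives

$$\int_1^\infty \cdots \int_1^\infty \prod_{i=1}^n \beta^{1/n} x_i^{-(\beta+1)/n}\, dx_1\cdots dx_n = \left(\beta^{1/n}\,\frac{n}{\beta+1-n}\right)^n = \beta\left(\frac{n}{\beta+1-n}\right)^n,$$

which is finite only when $\beta > n-1$ and is in general not equal to $1$ (for $n=1$, where no transformation took place, it is exactly $1$). So the transformed object is not a joint density, and its curvature does not measure Fisher information.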
