I understand the maximum likelihood estimators for $\mu$ and $\sigma$ of the lognormal distribution when the actual data values are available: the MLE for $\mu$ is $\hat\mu = \frac{1}{n}\sum_i \ln x_i$, where $n$ is the number of points, and the MLE for $\sigma^2$ is $\hat\sigma^2 = \frac{1}{n}\sum_i (\ln x_i - \hat\mu)^2$. However, I need to understand how these formulas are modified when the data are already grouped or binned and the actual values are not available. Suppose the data fall in bins with cutpoints $b_1, b_2, b_3, \ldots$, where $b_1$ to $b_2$ is the first bin, $b_2$ to $b_3$ the second, and so on. What are the modified estimators of $\mu$ and $\sigma^2$? Thank you.
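For concreteness, the ungrouped-data formulas just described can be sketched in Python (NumPy, the seed, and the simulated data are my own illustration, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # raw (ungrouped) data

logs = np.log(x)
mu_hat = logs.mean()                         # MLE of mu: (1/n) * sum(log x_i)
sigma2_hat = ((logs - mu_hat) ** 2).mean()   # MLE of sigma^2 (divide by n, not n - 1)
```

Note that `sigma2_hat` coincides with `np.var(logs)` (NumPy's default `ddof=0`), which is the biased/MLE variance rather than the sample variance.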
Solved – Lognormal distribution using binned or grouped data
lognormal distribution, maximum likelihood
Related Solutions
Apparently the concept of unbiasedness was discussed a long time ago. I feel it is a topic worthy of discussion, as mean-unbiasedness is a standard requirement for a good estimator, yet in small samples it does not mean as much as it does in large-sample estimation.
I post these two references as an answer to my second question in the post.
Brown, George W. "On Small-Sample Estimation." The Annals of Mathematical Statistics, vol. 18, no. 4 (Dec., 1947), pp. 582–585. JSTOR 2236236.
Lehmann, E. L. "A General Concept of Unbiasedness." The Annals of Mathematical Statistics, vol. 22, no. 4 (Dec., 1951), pp. 587–592. JSTOR 2236928.
Under the standard regularity conditions, the Fisher information can be expressed as the negative of the expected value of the second derivative of the log-likelihood, and NOT of some transformation of the log likelihood.
Your log-likelihood is $L=n\ln\beta - (\beta+1)\sum_i \ln(x_i)$, and that does not change. For whatever reason you used the monotonic transformation $\tilde L=\frac 1n L$ when calculating the MLE. Since it is a monotonic transformation, the MLE is naturally the same. Now calculate the Fisher information. Irrespective of whether you used $L$ or $\tilde L$ in the maximization procedure, it is
$$I(\beta) = -E\left[\frac {\partial^2}{\partial \beta^2} L\right]=E\left[\frac n{\beta^2}\right] = \frac n{\beta^2}$$
You cannot substitute $\tilde L$ for $L$ in the calculation of the Fisher information as if $\tilde L$ were equal to $L$: it is not. The fact that $\tilde L$, being a monotonic transformation of $L$, leads to the same MLE does not make it equal to $L$.
Another way to look at it is to remember that the likelihood, viewed as a function of $\beta$, is also (and must be) the joint density of the $x_i$'s. In our case (assuming an i.i.d. sample) it is
$$f(X;\beta) = \prod_{i=1}^n\beta x_i^{-\beta-1}, \qquad f(x_i;\beta) = \frac {\beta} {x_i^{\beta+1}}$$ which is a Pareto distribution with minimum value $1$.
Now, could the transformed log-likelihood lead to a joint density? It would give
$$\left(\prod_{i=1}^n\beta x_i^{-\beta-1}\right)^{1/n}$$
This is the geometric mean of the $n$ marginal densities. Can it represent the joint density of a collection of $n$ i.i.d. random variables?
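One way to answer this rhetorical question is numerically: for $n=2$ and a trial value such as $\beta=3$, integrate the geometric-mean expression over $x_1, x_2 \in (1,\infty)$ and see whether the total mass is $1$. A quick check (my own sketch, assuming SciPy is available):

```python
from scipy.integrate import quad

beta = 3.0  # arbitrary trial value; beta > 1 makes the integral converge

# For n = 2 the candidate "density" factorizes into two identical terms
# sqrt(beta) * x**(-(beta + 1) / 2), so its double integral over (1, inf)^2
# is the square of the one-dimensional integral.
one_dim, _ = quad(lambda x: beta**0.5 * x**(-(beta + 1) / 2), 1, float("inf"))
total_mass = one_dim**2

# Analytically the 1-D integral is sqrt(beta) * 2 / (beta - 1) = sqrt(3) here,
# so total_mass = 3, not 1: the expression is not a joint density.
```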
Best Answer
Let $\Phi$ be the cumulative standard normal distribution function. The probability that a value $Y$ drawn from a lognormal distribution with log mean $\mu$ and log SD $\sigma$ lies in the interval $(b_i, b_{i+1}]$ therefore is
$$\Pr(b_i \lt Y \le b_{i+1}) = \Phi \left( \frac{\log(b_{i+1}) - \mu}{\sigma} \right) - \Phi \left( \frac{\log(b_{i}) - \mu}{\sigma} \right).$$
Call this value $f_i(\mu, \sigma)$.
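This bin probability translates directly into code; a sketch using SciPy's standard normal CDF (the function name `bin_prob` and the edge-case handling are my own):

```python
from math import log

from scipy.stats import norm

def bin_prob(lo, hi, mu, sigma):
    """P(lo < Y <= hi) for Y ~ Lognormal(mu, sigma):
    Phi((log hi - mu)/sigma) - Phi((log lo - mu)/sigma)."""
    # Guard the unbounded cases: log(0) and log(inf) are handled as CDF 0 and 1.
    upper = norm.cdf((log(hi) - mu) / sigma) if hi != float("inf") else 1.0
    lower = norm.cdf((log(lo) - mu) / sigma) if lo > 0 else 0.0
    return upper - lower
```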
When the data consist of independent draws $Y_1,Y_2, \ldots, Y_N$, with $Y_i$ falling in bin $j(i)$ and the bin cutpoints are established independently of the $Y_i$, the probabilities multiply, whence the log likelihood is the sum of the logs of these values:
$$\log(\Lambda(\mu, \sigma)) = \sum_{i=1}^{N} \log(f_{j(i)}(\mu, \sigma)).$$
It suffices to count the number of $Y_i$ falling within each bin $j$; let this count be $k(j)$. By collecting the $k(j)$ terms associated with bin $j$ for each bin, the sum condenses to
$$\log(\Lambda(\mu, \sigma)) = \sum_{j} k(j) \log(f_{j}(\mu, \sigma)).$$
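In code, the condensed sum is just a dot product of the bin counts with the log bin probabilities. A sketch under the assumption of strictly positive, finite bin edges (function and variable names are mine):

```python
import numpy as np
from scipy.stats import norm

def binned_loglik(mu, sigma, edges, counts):
    """log Lambda(mu, sigma) = sum_j k(j) * log f_j(mu, sigma),
    where bin j runs between consecutive entries of `edges`."""
    z = (np.log(edges) - mu) / sigma     # standardized log cutpoints
    probs = np.diff(norm.cdf(z))         # f_j = Phi(z_{j+1}) - Phi(z_j)
    return float(np.dot(counts, np.log(probs)))
```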
The MLEs are the values $\hat{\mu}$ and $\hat{\sigma}$ that together maximize $\log(\Lambda(\mu, \sigma))$. There is no closed formula for them in general: numerical solutions are needed.
Example
Consider data values known only to lie within the intervals $[0,2]$, $(2,4]$, and so on. I randomly generated 100 of them from a Lognormal(0,1) distribution in Mathematica and tallied the counts in each bin.
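The original Mathematica code and tallies are not reproduced above; an equivalent sketch in Python (my own seed and names, so the exact counts will differ from the answer's):

```python
import numpy as np

rng = np.random.default_rng(17)
y = rng.lognormal(mean=0.0, sigma=1.0, size=100)   # 100 Lognormal(0,1) draws

# Bin edges 0, 2, 4, ... extending past the largest draw. np.histogram uses
# left-closed bins, which is immaterial for continuous draws.
edges = np.arange(0, np.ceil(y.max() / 2) * 2 + 2, 2)
counts, _ = np.histogram(y, bins=edges)            # tallies per bin
```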
Finding the MLE for data like this requires two procedures: first, one to compute the contribution of all 100 intervals to the log likelihood; second, one to maximize that log likelihood numerically.
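Again the Mathematica procedures are not shown above; a SciPy equivalent that maximizes the binned log likelihood could look like the following (the binned counts, names, and the log-sigma parameterization are my own assumptions):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_loglik(params, edges, counts):
    """Negative binned log likelihood, parameterized by (mu, log sigma)
    so that sigma stays positive during the search."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    with np.errstate(divide="ignore"):        # log(0) -> -inf handles a left edge of 0
        z = (np.log(edges) - mu) / sigma
    probs = np.diff(norm.cdf(z))              # f_j for each bin
    return -np.dot(counts, np.log(probs))

# Illustrative counts for the cutpoints 0, 2, 4, 6, infinity
# (my placeholder numbers, not the answer's actual tallies)
edges = np.array([0.0, 2.0, 4.0, 6.0, np.inf])
counts = np.array([70, 20, 6, 4])

result = minimize(neg_loglik, x0=[0.0, 0.0], args=(edges, counts),
                  method="Nelder-Mead")
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
```

Any general-purpose optimizer works here; Nelder-Mead is chosen only because the objective is cheap and two-dimensional, so derivative-free search is adequate.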
The solution reported by Mathematica consists of the maximized log likelihood together with the MLEs of $\mu$ and $\sigma$. They are comfortably close to their true values of $0$ and $1$.
Other software systems will vary in their syntax, but typically they will work in the same way: one procedure to compute the probabilities and another to maximize the log likelihood determined by those probabilities.