Solved – How to estimate parameters of a log-normal distribution

Tags: lognormal distribution, maximum likelihood

I am using income data from the Current Population Survey for a small undergrad economics paper.

In economics, there is evidence that the incomes of the lower 97%–99% of the population are distributed log-normally, while the incomes of the highest earners follow a Pareto distribution.

I have used kernel density estimation to plot the lower 99%, and the graph does appear to be log-normal. Now I would like to estimate $\mu$ and $\sigma$. How do I go about this?

I have been reading about maximum likelihood estimation, but I'm not sure how to calculate it when I have 200,000 rows of data. Do I have to write my own algorithm to sum over all of my $x$'s, or is there a built-in function I could use?

I would ideally like to do this in R or Stata.

Best Answer

I am not sure whether this question belongs on stats.stackexchange. Anyhow, you don't need to write any function of your own! Here is how to generate a random sample from a lognormal distribution and then estimate its parameters in R.

> # Generate a random sample of size 200,000 from a lognormal distribution
> # whose logarithm has mean 0.5 and s.d. 2 (meanlog and sdlog are on the log scale)
> x <- rlnorm(200000, meanlog = 0.5, sdlog = 2)
> # Load package MASS
> library(MASS)
> # M.L. estimates of the parameters
> fitdistr(x, densfun = "log-normal")
     meanlog        sdlog   
  0.491560746   1.999413446 
 (0.004470824) (0.003161350)
>
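Because the logarithm of a lognormal variable is normal, the maximum likelihood estimates also have a closed form, so no iterative algorithm is needed: $\hat\mu = \frac{1}{n}\sum_i \ln x_i$ and $\hat\sigma^2 = \frac{1}{n}\sum_i (\ln x_i - \hat\mu)^2$ (divisor $n$, not $n-1$). The sketch below computes them directly with vectorised R, so the sum over all 200,000 values is a single `mean()` call. The helper name `lnorm_mle` and the simulated data are illustrative assumptions; for the log-normal case `fitdistr` should report the same estimates.

```r
# Closed-form ML estimates for a lognormal sample (sketch).
# lnorm_mle is a hypothetical helper, not part of any package.
lnorm_mle <- function(x) {
  stopifnot(all(x > 0))                      # lognormal needs strictly positive data
  n <- length(x)
  lx <- log(x)                               # log-values are normal if x is lognormal
  mu_hat <- mean(lx)                         # MLE of meanlog
  sigma_hat <- sqrt(mean((lx - mu_hat)^2))   # MLE of sdlog (divisor n)
  # Asymptotic standard errors from the Fisher information
  se <- c(sigma_hat / sqrt(n), sigma_hat / sqrt(2 * n))
  cbind(Estimate = c(meanlog = mu_hat, sdlog = sigma_hat),
        `Std. Error` = se)
}

set.seed(1)                                  # simulated data, for illustration only
x <- rlnorm(200000, meanlog = 0.5, sdlog = 2)
est <- lnorm_mle(x)
print(est)

# With real data (income is a hypothetical vector), dropping the top 1% first.
# Note: this simply discards the top 1% and does not correct for truncation.
# x99 <- income[income <= quantile(income, 0.99)]
# lnorm_mle(x99)
```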

Related Solutions

Solved – Get log-normal distribution parameters by min, max, mean

Given a random sample $X_1,X_2,\dots,X_n$ from a density $f(x)$ with cdf $F(x)$, the joint density of the sample minimum and maximum is $$ f_{X_{(1)},X_{(n)}}(x_1,x_n)=\frac{n!}{(n-2)!}f(x_1)f(x_n)[F(x_n)-F(x_1)]^{n-2}, \quad x_1<x_n. $$

Conditional on $X_{(1)}=x_1$ and $X_{(n)}=x_n$, the remaining $n-2$ observations are independent draws from the lognormal distribution truncated to $(x_1,x_n)$. By the central limit theorem, unless $n$ is small or $\sigma$ is large, the conditional distribution of the sample mean $\bar X$ should therefore be well approximated by a normal distribution whose mean and variance are derived from the mean and variance of this truncated lognormal distribution.

The likelihood based on the observations $x_1,x_n,\bar x$ is then $$ L(\mu,\sigma) = f_{X_{(1)},X_{(n)}}(x_1,x_n)f_{\bar X|X_{(1)}=x_1,X_{(n)}=x_n}(\bar x), $$ which you can maximise numerically with respect to $\mu$ and $\sigma$.
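A sketch of how that mean and variance can be made explicit, under the assumption stated above that the middle observations are i.i.d. draws from the truncated lognormal. Write $z_1=(\ln x_1-\mu)/\sigma$, $z_n=(\ln x_n-\mu)/\sigma$ and let $\Phi$ be the standard normal cdf. Completing the square gives $E[X^k\,1\{x_1<X<x_n\}]=e^{k\mu+k^2\sigma^2/2}[\Phi(z_n-k\sigma)-\Phi(z_1-k\sigma)]$, so the first two moments of the truncated distribution are

$$ m_1=e^{\mu+\sigma^2/2}\,\frac{\Phi(z_n-\sigma)-\Phi(z_1-\sigma)}{\Phi(z_n)-\Phi(z_1)}, \qquad m_2=e^{2\mu+2\sigma^2}\,\frac{\Phi(z_n-2\sigma)-\Phi(z_1-2\sigma)}{\Phi(z_n)-\Phi(z_1)}, \qquad v=m_2-m_1^2. $$

Writing $\bar X=\bigl(x_1+x_n+\sum_{i=1}^{n-2}Y_i\bigr)/n$, where $Y_1,\dots,Y_{n-2}$ (notation introduced here) are the middle observations, the normal approximation has

$$ E[\bar X\mid x_1,x_n]=\frac{x_1+x_n+(n-2)\,m_1}{n}, \qquad \operatorname{Var}[\bar X\mid x_1,x_n]=\frac{(n-2)\,v}{n^2}. $$

In the R code, $m_1$, $m_2$ and $v$ correspond to `mu1.trunc`, `mu2.trunc` and `var.trunc`.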

R implementation:

lnormpar <- function(x1, xn, xbar, n, start=c(0,1)) {
  # negative log likelihood
  nll <- function(theta) {
    mu <- theta[1]
    sigma <- theta[2]
    z1 <- (log(x1)-mu)/sigma
    z2 <- (log(xn)-mu)/sigma
    # mean and variance of (x_1,x_n)-truncated lognormal
    mu1.trunc <- exp(mu + sigma^2/2)*
      (pnorm(z2 - sigma) - pnorm(z1 - sigma))/
      (pnorm(z2) - pnorm(z1))
    mu2.trunc <- exp(2*mu + 2*sigma^2)*
      (pnorm(z2 - 2*sigma) - pnorm(z1 - 2*sigma))/
      (pnorm(z2) - pnorm(z1))
    var.trunc <- mu2.trunc - mu1.trunc^2
    # log joint density of x1, xn, xbar (the constant n!/(n-2)! is dropped)
    ll <- 
      sum(dlnorm(c(x1,xn), mu, sigma, log=TRUE)) +
      (n-2)*log(plnorm(xn, mu, sigma) - plnorm(x1, mu,sigma)) +
      # Var(xbar | x1, xn) = (n-2)*var.trunc/n^2
      dnorm(xbar, (x1 + xn + (n-2)*mu1.trunc)/n, sqrt((n-2)*var.trunc)/n, log=TRUE)
    -ll
  }
  # maximise the log likelihood
  opt <- optim(start, nll, hessian=TRUE)
  # extract parameter estimates
  res <- cbind(opt$par, sqrt(diag(solve(opt$hessian))))
  rownames(res) <- c("mu","sigma")
  colnames(res) <- c("Estimate","Std. Error")
  res
}

The result assuming a sample size of $n=10$:

> lnormpar(x1=100,xn=10000,xbar=1000,n=10)
      Estimate Std. Error
mu    6.489252  0.5747346
sigma 1.409383  0.3306496
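One way to sanity-check the truncated-lognormal moments used inside `nll` is a Monte Carlo comparison: simulate many lognormal draws, keep those inside $(x_1,x_n)$, and compare their sample mean and variance with the closed-form values. A minimal sketch; the helper `trunc_lnorm_moments` and the parameter values (chosen near the estimates above) are illustrative assumptions.

```r
# Monte Carlo check of the truncated-lognormal moment formulas (sketch).
# trunc_lnorm_moments is a hypothetical helper name.
trunc_lnorm_moments <- function(a, b, mu, sigma) {
  z1 <- (log(a) - mu) / sigma
  z2 <- (log(b) - mu) / sigma
  p <- pnorm(z2) - pnorm(z1)                 # P(a < X < b)
  m1 <- exp(mu + sigma^2 / 2) *
    (pnorm(z2 - sigma) - pnorm(z1 - sigma)) / p
  m2 <- exp(2 * mu + 2 * sigma^2) *
    (pnorm(z2 - 2 * sigma) - pnorm(z1 - 2 * sigma)) / p
  c(mean = m1, var = m2 - m1^2)              # same as mu1.trunc and var.trunc
}

set.seed(42)
mu <- 6.5; sigma <- 1.4                      # illustrative parameter values
a <- 100; b <- 10000                         # truncation limits (x1 and xn)
y <- rlnorm(1e6, mu, sigma)
y <- y[y > a & y < b]                        # keep only draws inside (a, b)

exact <- trunc_lnorm_moments(a, b, mu, sigma)
empirical <- c(mean = mean(y), var = var(y))
print(rbind(exact, empirical))
```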

Linear Model – Linear Regression to Predict Both Mean and SD of Dependent Variable: Comprehensive Guide

As you are interested in modeling percentiles, you should have a look at quantile regression methods. Instead of modeling conditional means (as in linear regression), quantile regression allows you to model (conditional) quantiles.

As mentioned in the comments, a good introduction to quantile regression is the vignette for the quantreg R package. One of the examples in the vignette illustrates your use case.
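A minimal sketch of fitting several conditional quantiles at once with `rq()` from quantreg, assuming the package is installed and using the `engel` data (food expenditure vs. household income) that ships with it. If the gap between the 10% and 90% lines widens with the predictor, the spread of the response is changing along with its centre, which is what a mean-only linear model cannot show.

```r
# Quantile regression sketch with the quantreg package (assumed installed).
library(quantreg)
data(engel)                                  # foodexp vs income, shipped with quantreg

# Fit the 10%, 50% and 90% conditional quantiles of food expenditure
taus <- c(0.1, 0.5, 0.9)
fit <- rq(foodexp ~ income, tau = taus, data = engel)
print(coef(fit))                             # one column of coefficients per tau

# Predicted quantiles at two illustrative income levels (rows) for each tau (columns)
X_new <- cbind(1, c(500, 1500))
print(X_new %*% coef(fit))
```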
