Sampling under the assumption of log-normally distributed data, given the sample mean and standard deviation

Tags: distributions, lognormal distribution, mean, sampling, standard deviation

I have the sample mean and the sample standard deviation of income, calculated from individual tax data for all citizens in a country (let's call this data X). I do not have access to the tax data itself. I would, however, like to take random draws from a lognormal distribution whose parameters $\mu$ and $\sigma$ are estimated from this income data.

However, I am a bit confused about how to do this. If I assume the income data are lognormally distributed, then the sample mean and standard deviation calculated from the data are NOT the $\mu$ and $\sigma$ I am looking for. Is this correct?

If so, how can I calculate $\mu$ and $\sigma$? I thought about taking the formulas for the mean and variance of the lognormal and plugging in the sample moments for $E(X)$ and $V(X)$:

$E(X) = e^{\mu+\sigma^2/2}$

$V(X) = (e^{\sigma^2}-1)\cdot e^{2\mu+\sigma^2}$
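These two formulas can be checked numerically. The sketch below assumes Python with NumPy; `lognormal_mean_var` is a hypothetical helper name, and the parameter values are arbitrary illustrations, not estimates from the tax data. It evaluates $E(X)$ and $V(X)$ for given $\mu, \sigma$ and compares them with a large simulated sample of $X = e^{Y}$, $Y \sim N(\mu, \sigma^2)$.

```python
import numpy as np

def lognormal_mean_var(mu, sigma):
    """Population mean and variance of X when log(X) ~ Normal(mu, sigma^2)."""
    s2 = sigma ** 2
    mean = np.exp(mu + s2 / 2)
    var = (np.exp(s2) - 1) * np.exp(2 * mu + s2)
    return mean, var

# Illustrative parameters only (not taken from the question).
mu, sigma = 10.0, 0.5

# Simulate X = exp(Y) with Y ~ Normal(mu, sigma^2).
rng = np.random.default_rng(0)
x = np.exp(rng.normal(mu, sigma, size=1_000_000))

mean, var = lognormal_mean_var(mu, sigma)
print(mean, x.mean())  # formula vs. simulation: should agree closely
print(var, x.var())
```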

However, I'm not able to solve these equations and am not sure whether I'm doing the right thing.

Any help is appreciated.

Best Answer

The $\mu$ and $\sigma^2$ parameters are the population mean and variance of the logs of the lognormal random variable with those parameters.
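This parameterization is also what NumPy uses: `Generator.lognormal(mean, sigma, size)` takes the mean and standard deviation of the underlying normal, i.e. of $\log X$, not of $X$. A small sketch, assuming NumPy and arbitrary illustrative values for $\mu$ and $\sigma$, draws from it and checks that the logs recover $\mu$ and $\sigma$ while the raw draws do not.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 10.0, 0.5  # illustrative values only

# NumPy's `mean` and `sigma` arguments are the mean and standard
# deviation of log(X), not of X itself.
x = rng.lognormal(mean=mu, sigma=sigma, size=1_000_000)
logs = np.log(x)

print(logs.mean(), logs.std())  # should be close to mu and sigma
print(x.mean(), x.std())        # on the original scale: far from mu and sigma
```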

Your equations are correct: they express the population mean and variance of the lognormal in terms of the mean and variance of the log-variable.

Equating those expressions to the sample mean and variance would be a reasonable thing to do; indeed, it's essentially method-of-moments$^\dagger$.

Those equations are rather straightforward to solve.

Dividing the variance by the square of the mean gives $V(X)/E(X)^2 = e^{\sigma^2}-1$, an equation in $\sigma^2$ alone, so $\hat\sigma^2 = \log\left(1 + s^2/\bar{x}^2\right)$, where $\bar{x}$ and $s$ are the sample mean and standard deviation.

Once you have that estimate of $\sigma^2$, substitute it back into the first equation to get $\hat\mu = \log\bar{x} - \hat\sigma^2/2$.
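A minimal sketch of the whole procedure, assuming Python with NumPy: solve for $\hat\sigma^2$ from the ratio $V(X)/E(X)^2$, back out $\hat\mu$, then draw from the fitted lognormal. The function name `lognormal_params_from_moments` and the summary figures are hypothetical; the question does not give the actual mean and standard deviation.

```python
import numpy as np

def lognormal_params_from_moments(mean, sd):
    """Method-of-moments (mu, sigma) for a lognormal, given the mean and
    standard deviation of X on the original (income) scale."""
    if mean <= 0 or sd <= 0:
        raise ValueError("mean and sd must be positive")
    # V(X) / E(X)^2 = exp(sigma^2) - 1  =>  sigma^2 = log(1 + (sd/mean)^2)
    sigma2 = np.log1p((sd / mean) ** 2)
    # E(X) = exp(mu + sigma^2 / 2)  =>  mu = log(mean) - sigma^2 / 2
    mu = np.log(mean) - sigma2 / 2
    return mu, np.sqrt(sigma2)

# Hypothetical summary statistics (the real figures are not given).
sample_mean, sample_sd = 30_000.0, 20_000.0
mu_hat, sigma_hat = lognormal_params_from_moments(sample_mean, sample_sd)

# Draw synthetic incomes from the fitted lognormal; NumPy's `mean` and
# `sigma` are the parameters of log(X), i.e. mu_hat and sigma_hat.
rng = np.random.default_rng(42)
draws = rng.lognormal(mean=mu_hat, sigma=sigma_hat, size=100_000)

print(mu_hat, sigma_hat)
print(draws.mean(), draws.std())  # should be near sample_mean and sample_sd
```

Plugging $\hat\mu$ and $\hat\sigma$ back into the two formulas from the question reproduces the supplied mean and standard deviation exactly (up to floating-point error), which is a quick sanity check on the algebra.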

If you want explicit formulas, see here or here$^\dagger$

$\dagger$ Keep in mind that the variance is a central moment, but it is readily obtained from the second raw moment and the mean. Equating sample and population raw moments is therefore equivalent to equating sample and population central moments (above the first), as long as you use the $n$-divisor versions of the central sample moments.
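The footnote's equivalence can be illustrated numerically. This sketch, assuming NumPy and simulated illustrative data, fits $\sigma^2$ and $\mu$ once from the first two raw moments (using $E(X^2) = e^{2\mu + 2\sigma^2}$, so $E(X^2)/E(X)^2 = e^{\sigma^2}$) and once from the mean and the $n$-divisor variance (`ddof=0`), and gets the same estimates.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.lognormal(mean=10.0, sigma=0.5, size=1_000)  # illustrative data

m1 = x.mean()          # first raw sample moment
m2 = np.mean(x ** 2)   # second raw sample moment
var_n = x.var(ddof=0)  # central second moment with the n divisor

# Route 1: raw moments, E(X^2) / E(X)^2 = exp(sigma^2).
s2_raw = np.log(m2 / m1 ** 2)
mu_raw = np.log(m1) - s2_raw / 2

# Route 2: mean and n-divisor variance, V(X) / E(X)^2 = exp(sigma^2) - 1.
s2_central = np.log1p(var_n / m1 ** 2)
mu_central = np.log(m1) - s2_central / 2

print(s2_raw, s2_central)  # identical up to floating-point error
print(mu_raw, mu_central)
```

Using `ddof=1` (the $n-1$ divisor) in route 2 would give slightly different estimates; with data on all citizens of a country, $n$ is so large that the difference is negligible.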