Summarizing a lognormal distribution with geometric mean and standard deviation

descriptive statistics, lognormal distribution

I have some data that I strongly suspect are lognormally distributed, and I'd like to summarize the distribution using the mean and standard deviation. I've read that with lognormal distributions the geometric mean and geometric standard deviation should be used, but using them produces slightly strange results.

Specifically, when using the sample mean and standard deviation I calculate that 86% of my data lie within +/- 1 standard deviation about the mean.

However, when I use the geometric mean and standard deviation I calculate that only ~5% of my data lie within +/- 1 standard deviation about the mean.

I have over 30,000 pieces of data, and the distribution looks strongly lognormal, but I don't really understand these results. Is it valid to use the geometric mean and standard deviation here?

Best Answer

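Chebyshev and similar +/- one sigma intervals refer to the arithmetic mean and standard deviation, not to the geometric ones. If you think your $X$ is lognormal, then construct an "arithmetic" +/- one $\sigma$ interval for $\log(X)$, then exponentiate its endpoints to get the interval for $X$. In terms of the geometric mean $GM$ and geometric standard deviation $GSD$, this is the multiplicative interval from $GM / GSD$ to $GM \cdot GSD$, not $GM \pm GSD$.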

In other words, you have no chance of constructing an interval with the expected coverage the way you did, because for a lognormal variable such an interval should not be symmetric around the mean. After you exponentiate, you'll see that the interval is asymmetric with respect to the arithmetic mean of $X$.
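As a rough illustration, here is a minimal Python sketch of the procedure on synthetic data. The lognormal parameters (`mean=4.0`, `sigma=0.5`), the seed, and the helper names `geometric_interval` and `coverage` are assumptions made up for this example, not anything taken from the question. The additive interval `gm - gsd` to `gm + gsd` is included only as a guess at how a very low coverage figure could arise.

```python
import numpy as np


def geometric_interval(x):
    """Return (gm, gsd, lower, upper) for strictly positive data x.

    The interval is the +/- one sigma interval of log(x), exponentiated,
    which equals [gm / gsd, gm * gsd].
    """
    logs = np.log(np.asarray(x, dtype=float))
    mu = logs.mean()            # arithmetic mean of log(x)
    sigma = logs.std(ddof=1)    # sample standard deviation of log(x)
    gm = np.exp(mu)             # geometric mean
    gsd = np.exp(sigma)         # geometric standard deviation
    return gm, gsd, gm / gsd, gm * gsd


def coverage(x, lower, upper):
    """Fraction of observations falling inside [lower, upper]."""
    x = np.asarray(x, dtype=float)
    return float(np.mean((x >= lower) & (x <= upper)))


# Synthetic lognormal sample (hypothetical parameters, for illustration only).
rng = np.random.default_rng(0)
x = rng.lognormal(mean=4.0, sigma=0.5, size=30_000)

gm, gsd, lower, upper = geometric_interval(x)

# Multiplicative interval: exponentiated +/- one sigma interval of log(x).
mult_cov = coverage(x, lower, upper)

# Additive use of the geometric statistics, which mixes scales.
add_cov = coverage(x, gm - gsd, gm + gsd)

# Usual arithmetic mean +/- one standard deviation on the raw data.
m, s = x.mean(), x.std(ddof=1)
arith_cov = coverage(x, m - s, m + s)

print(f"GM = {gm:.2f}, GSD = {gsd:.3f}")
print(f"[GM/GSD, GM*GSD] = [{lower:.2f}, {upper:.2f}], coverage = {mult_cov:.3f}")
print(f"GM +/- GSD coverage = {add_cov:.3f}")
print(f"mean +/- sd coverage = {arith_cov:.3f}")
```

If the data really are lognormal, the multiplicative interval should cover roughly 68% of the observations, as a one sigma interval does for a normal variable. The additive interval is much narrower here because the geometric standard deviation is a dimensionless ratio rather than a quantity in the units of $X$.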