Generating a log-normally distributed pseudorandomly generated data set

lognormal distribution, random-generation

The feedback I received from my initial post seemed to indicate that my question was ill-posed. Hence, I would like to clarify what I am doing and how I hope to achieve it.

I'm running some simulations on models with a material parameter $g(x)$ whose values, according to the literature I've read, are log-normally distributed over the spatial domain. The material parameter $g(x)$ corresponds to a physical property that is strictly positive and has been experimentally determined never to fall outside the threshold values $g_{\min}$ and $g_{\max}$, where $0<g_{\min}<g_{\max}$. When I run my simulations, I would like to generate a distribution of values $g$ over the spatial domain such that

  1. All values of $g$ are strictly in the interval $[g_{\min},g_{\max}]$
  2. The distribution of values of $g$ is as close to a log-normal distribution as possible for a given mean $\mu$ and standard deviation $\sigma$.

I have at my disposal a pseudorandom number generator that creates a vector of normally distributed values with a given mean $\mu$ and standard deviation $\sigma$. I know that if the data set $X$ is normally distributed, then an exponential transformation of the data (in other words, $\exp(X)$) should produce a log-normal distribution.

I hypothesize that I can create this log-normal distribution by the following process:

  1. Make a normal distribution of values $X$ with mean $\mu$ and standard deviation $\sigma$.
  2. Exponentiate the values to generate a new set $Y=\exp(X)$.
  3. Rescale and shift the values of $Y$ so that the minimum value of the new set is exactly $g_{\min}$ and the maximum is $g_{\max}$. I would do so by creating a new data set $Z=a + (b-a)Y$.

Would this give me the results I desire? That is, in the end, would the set $Z$ be log-normally distributed with all values lying in the interval $[g_{\min},g_{\max}]$?

Best Answer

No. The data $Z$ clearly won't be distributed as log-Normal with parameters $\mu$ & $\sigma$; nor will they be distributed (conditional on $a$ & $b$) as any log-Normal, as you've introduced a location shift—which is not equivalent to changing a parameter value. Note also that you'll get a different distribution for each simulation of $Z$ & it will look very different at different sample sizes.
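
To see the sample-size dependence concretely, here is a minimal sketch using only Python's standard library. It reads step 3 of the question as a min-max rescale, which is an assumption on my part, since the question's formula $Z=a+(b-a)Y$ does not say how $a$ & $b$ are chosen. The function name `rescaled_lognormal` and the values $\mu=0$, $\sigma=1$, $g_{\min}=1$, $g_{\max}=10$ are made up for illustration.

```python
import math
import random
import statistics

def rescaled_lognormal(n, mu, sigma, g_min, g_max, rng):
    """Steps 1-3 of the question, with step 3 read as a min-max rescale.

    Assumption: the sample minimum is mapped to g_min and the sample
    maximum to g_max. Needs n >= 2 so that the sample range is nonzero.
    """
    # Steps 1 and 2: draw X ~ Normal(mu, sigma) and exponentiate.
    y = [math.exp(rng.gauss(mu, sigma)) for _ in range(n)]
    y_lo, y_hi = min(y), max(y)
    # Step 3: the shift and scale both depend on the sample extremes.
    scale = (g_max - g_min) / (y_hi - y_lo)
    return [g_min + scale * (v - y_lo) for v in y]

rng = random.Random(0)  # fixed seed only to make the run repeatable
medians = {}
for n in (100, 100_000):
    z = rescaled_lognormal(n, 0.0, 1.0, 1.0, 10.0, rng)
    medians[n] = statistics.median(z)
    print(n, round(medians[n], 3))
```

Because the largest value of $Y$ keeps growing as the sample gets bigger, the scale factor shrinks & the bulk of $Z$ is squeezed towards $g_{\min}$; the median of $Z$ at $n=100$ should come out noticeably larger than at $n=100{,}000$, even though $\mu$ & $\sigma$ are unchanged.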

The discrepancy between the literature & experimental results should be the first thing to investigate. It could be that a truncated log-Normal is what you want, as @whuber suggested, & which you can easily get by discarding values of $Y$ outside $[g_\mathrm{min}, g_\mathrm{max}]$. Or perhaps another distribution entirely; @Nick's suggestion of a beta distribution is worth following up.
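
Discarding values outside the interval amounts to rejection sampling. A minimal sketch, again with the standard library only: the function name `truncated_lognormal`, the `max_tries` guard & the example parameter values are hypothetical, not something from the question.

```python
import random

def truncated_lognormal(n, mu, sigma, g_min, g_max, rng=None, max_tries=10_000_000):
    """Draw n values from a log-Normal truncated to [g_min, g_max].

    mu and sigma are the mean and standard deviation of log(Y),
    i.e. of the underlying Normal, not of Y itself.
    """
    rng = rng or random.Random()
    kept = []
    tries = 0
    while len(kept) < n:
        tries += 1
        if tries > max_tries:
            # Guard against intervals that carry almost no probability.
            raise RuntimeError("acceptance rate too low for these parameters")
        y = rng.lognormvariate(mu, sigma)
        if g_min <= y <= g_max:  # keep only values inside the interval
            kept.append(y)
    return kept

sample = truncated_lognormal(1000, 0.0, 1.0, 0.5, 5.0, rng=random.Random(42))
print(min(sample), max(sample))
```

Every retained value lies in $[g_\mathrm{min}, g_\mathrm{max}]$ by construction, & the distribution does not change with the sample size. Rejection gets slow if the interval covers little of the log-Normal's probability mass, in which case inverting the distribution function over the interval would be the usual alternative.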
