Why can't I generate random numbers from a truncated lognormal distribution?

Tags: lognormal-distribution, random-variable, truncated-normal-distribution, truncated-distributions

My deduction is:

When the distribution is truncated, a normalization factor should be introduced:
\begin{equation}
g(x) = \frac{C}{x\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{\ln{x}-\mu}{\sigma}\right)^2},
\end{equation}

where $C$ is the normalization factor. Integrating $g(x)$ over the interval $[x_1, x_2]$ ($x_1 \geq 0$), we have
\begin{equation}
\frac{C}{\sigma\sqrt{2\pi}}\int_{x_1}^{x_2}\frac{1}{x}e^{-\frac{1}{2}\left(\frac{\ln{x}-\mu}{\sigma}\right)^2}\mathrm{d}x = 1.
\label{CDF_truncated}
\end{equation}

To determine $C$, let $t = \frac{\ln{x}-\mu}{\sigma}$, so $x = e^{\sigma t + \mu}$ and $\mathrm{d}x = \sigma e^{\sigma t + \mu}\mathrm{d}t$. The interval becomes $\left[t_1, t_2\right]$ which is given by
\begin{equation}
t_1 = \frac{\ln{x_1}-\mu}{\sigma} \text{ and } t_2 = \frac{\ln{x_2}-\mu}{\sigma}. \notag
\end{equation}

Hence, the integral can be rewritten as
\begin{equation}
\frac{C}{\sigma\sqrt{2\pi}}\int_{t_1}^{t_2} \sigma\cdot e^\mu \cdot e^{-\sigma t - \mu}\cdot e^{-\frac{t^2}{2} + \sigma t}\mathrm{d}t = 1, \notag
\end{equation}

\begin{equation}
\Rightarrow \frac{C}{\sqrt{2\pi}}\int_{t_1}^{t_2} e^{-\frac{t^2}{2}}\mathrm{d}t = 1. \notag
\end{equation}

Next, let $m = \frac{t}{\sqrt{2}}$, so $t = \sqrt{2}m$ and $\mathrm{d}t=\sqrt{2}\,\mathrm{d}m$. The integral can be further simplified into
\begin{equation}
\frac{C}{\sqrt{2\pi}}\int_{m_1}^{m_2} \sqrt{2}e^{-m^2}\mathrm{d}m = 1, \notag
\end{equation}

\begin{equation}
\Rightarrow \frac{C}{\sqrt{\pi}}\int_{m_1}^{m_2} e^{-m^2}\mathrm{d}m = 1. \notag
\end{equation}
So $C$ is
\begin{equation}
C = \frac{-2}{[{erf}(m_1)-{erf}(m_2)]},
\end{equation}

where ${erf}(\cdot)$ is the error function. If $x_1 = 0$ and $x_2 = +\infty$, then $t_1 = -\infty$ and $t_2 = +\infty$, so $m_1 = -\infty$ and $m_2 = +\infty$; ${erf}(m_1)$ and ${erf}(m_2)$ are then $-1$ and $1$, respectively, giving $C = 1$, which recovers the regular non-truncated PDF.
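As a quick numerical check of this formula (a sketch in Python for self-containedness; the parameter values $\mu$, $\sigma$ and the bounds below are illustrative, not taken from the question):

```python
from math import erf, log, sqrt

# Illustrative parameters of the underlying normal and truncation bounds
# (values chosen for this sketch).
mu, sigma = 1.68, 0.47
x1, x2 = 1.5, 15.0

m1 = (log(x1) - mu) / (sigma * sqrt(2))
m2 = (log(x2) - mu) / (sigma * sqrt(2))

# Normalization factor C = -2 / (erf(m1) - erf(m2)) from the derivation.
C = -2.0 / (erf(m1) - erf(m2))
print(C > 1.0)  # True: truncation removes probability mass, so C exceeds 1

# Limit check: as x1 -> 0 and x2 -> +inf, erf(m1) -> -1 and erf(m2) -> +1,
# so C -> -2 / (-1 - 1) = 1, the ordinary non-truncated lognormal PDF.
C_full = -2.0 / (erf(-40.0) - erf(40.0))
print(C_full)  # 1.0
```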

Now, let us generate random lognormal variables from the standard uniform distribution via inverse transform sampling:
\begin{equation}
p = C\int_{x_1}^{x}\frac{1}{x\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{\ln{x}-\mu}{\sigma}\right)^2}\mathrm{d}x.
\end{equation}

where $p$ is a random number in $[0, 1]$. Applying the same substitutions as before,
\begin{equation}
p = \frac{C}{ \sqrt{\pi}}\int_{m_1}^{m}e^{-m^2}\mathrm{d}m, \notag
\end{equation}

\begin{equation}
\Rightarrow m = {erfinv}\left({erf}(m_1) - \frac{2 p}{C}\right), \notag
\end{equation}

where $m = \frac{\ln{x}-\mu}{\sqrt{2}\,\sigma}$ and ${erfinv}(\cdot)$ is the inverse of the error function.
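MATLAB provides `erfinv` directly. In environments without it, the identity ${erf}(x) = 2\Phi(x\sqrt{2}) - 1$ recovers it from the standard normal quantile function; a minimal Python sketch using the stdlib `statistics.NormalDist`:

```python
from math import erf, sqrt
from statistics import NormalDist

def erfinv(y):
    # erf(x) = 2*Phi(x*sqrt(2)) - 1, so inverting gives
    # erfinv(y) = Phi^{-1}((y + 1) / 2) / sqrt(2), valid for y in (-1, 1).
    return NormalDist().inv_cdf((y + 1.0) / 2.0) / sqrt(2.0)

print(erf(erfinv(0.5)))  # round-trips to 0.5 up to floating-point error
```

Note that `inv_cdf` raises for arguments outside $(0, 1)$, which matches the domain $(-1, 1)$ of the inverse error function.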

I wrote the following MATLAB script to generate random lognormal variables with a given expectation and variance, but I get NaN.

clc;
close all;
clear all;

p = rand;

mean = 6; % given expectation
variance = 0.5; % given variance
x1 = 1.5;
x2 = 15;

mu_1 = log(mean * mean / ((variance + mean * mean)^0.5)); % mu of the underlying normal
sigma_1 = (log(1 + (variance) / (mean * mean)))^0.5; % sigma of the underlying normal

t1 = (log(x1) - mu_1) / sigma_1;
t2 = (log(x2) - mu_1) / sigma_1;

m1 = t1 / (2^0.5);
m2 = t2 / (2^0.5);

C = -2 / (erf(m1) - erf(m2));

m = erfinv(erf(m1) - 2 * p / C);


m =

NaN

What's wrong with my deduction, or with my code?

Update:

Following the comments of @jbowman, I wrote the following MATLAB code:

clc;
close all;
clear all;

mean = 6; % given expectation
variance = 9; % given variance
x1 = 1.5;
x2 = 15;

mu_1 = log(mean * mean / ((variance + mean * mean)^0.5)); % mu of the underlying normal
sigma_1 = (log(1 + (variance) / (mean * mean)))^0.5; % sigma of the underlying normal

% then, convert normal to std. normal
% Any point (x) from a normal distribution can be converted to the standard normal distribution (z) with the formula z = (x-mean) / standard deviation
l = (log(x1) - mu_1) / sigma_1;
u = (log(x2) - mu_1) / sigma_1;

pd = makedist('Normal', 'mu', 0, 'sigma', 1);

p_l = cdf(pd, l);
p_u = cdf(pd, u);

array_rand_num = [];

nf = 1000;

for i = 1:1:nf
    x = rand;

    x_prime = p_l + (p_u - p_l) * x;

    z = icdf(pd, x_prime);

    p = exp(z * sigma_1 + mu_1);

    array_rand_num(i) = p;
end

figure(1)
nbins = 30;
histfit(array_rand_num, nbins, 'lognormal');
xticks(0:1:20);
pd = fitdist(array_rand_num', 'lognormal');
fitting_Ex = exp(pd.mu + pd.sigma^2 * 0.5)
fitting_Dx = exp(2 * pd.mu + pd.sigma^2) * (exp(pd.sigma^2) - 1)
%since the pdf is truncated, the fitting Ex and Dx are not exactly equal to
%the input ones

Best Answer

It is a little easier notationally to work directly with the standard Normal distribution than with the lognormal distribution and error functions, and the conversion from the random number generated from a truncated standard Normal distribution to one generated from the desired truncated Lognormal distribution should be clear.

Let us label the lower and upper bounds (converted to the standard Normal distribution appropriately) as $(l, u)$ respectively. Then define $p_l = \Phi(l)$ and $p_u = \Phi(u)$, the values of the cumulative distribution function at $l$ and $u$. Generate random numbers $z$ as follows:

  1. $x \leftarrow U(0,1)$
  2. $x' \leftarrow p_l + (p_u - p_l)x$
  3. $z \leftarrow \Phi^{-1}(x')$

The second assignment creates $x' \sim U(p_l, p_u)$, and running this through the inverse CDF of a standard Normal variate produces a random number that has a standard Normal distribution truncated at $(l, u)$.
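The three steps, plus the conversion back to the lognormal scale, can be sketched as follows (Python for a self-contained illustration; the stdlib `statistics.NormalDist` stands in for $\Phi$ and $\Phi^{-1}$, and the parameter values are the ones from the question's update):

```python
import random
from math import exp, log, sqrt
from statistics import NormalDist

def truncated_lognormal(mu, sigma, x1, x2, rng=random):
    """Draw one lognormal variate truncated to [x1, x2] by inverse-CDF sampling."""
    nd = NormalDist()  # standard normal, supplies Phi and its inverse
    # Bounds converted to the standard normal scale.
    l = (log(x1) - mu) / sigma
    u = (log(x2) - mu) / sigma
    p_l, p_u = nd.cdf(l), nd.cdf(u)
    x = rng.random()                  # step 1: x ~ U(0, 1)
    x_prime = p_l + (p_u - p_l) * x   # step 2: x' ~ U(p_l, p_u)
    z = nd.inv_cdf(x_prime)           # step 3: z ~ truncated standard normal
    return exp(mu + sigma * z)        # back to the lognormal scale

# Parameters from the question's update: E[X] = 6, Var[X] = 9, bounds [1.5, 15].
m, v = 6.0, 9.0
mu = log(m * m / sqrt(v + m * m))
sigma = sqrt(log(1 + v / (m * m)))
sample = [truncated_lognormal(mu, sigma, 1.5, 15.0) for _ in range(10_000)]
print(min(sample), max(sample))  # all draws fall inside [1.5, 15]
```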

Of course, if you have access to a function that will calculate the inverse CDF of a Lognormal distribution with specified parameters, you can work directly with that in step 3, saving yourself any effort of converting from the Lognormal to the standard Normal and back again.
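For example, even without a library routine, the Lognormal inverse CDF is simply $\exp(\mu + \sigma\,\Phi^{-1}(q))$, so step 3 can work on the lognormal scale directly (a sketch assuming the Python stdlib's `statistics.NormalDist`; the parameters and bounds are illustrative):

```python
import random
from math import exp, log
from statistics import NormalDist

def lognormal_inv_cdf(q, mu, sigma):
    # If Z ~ N(0, 1), then exp(mu + sigma*Z) is Lognormal(mu, sigma),
    # so the q-quantile is exp(mu + sigma * Phi^{-1}(q)).
    return exp(mu + sigma * NormalDist().inv_cdf(q))

# Illustrative parameters (mu, sigma of the underlying normal) and bounds.
mu, sigma, x1, x2 = 1.68, 0.47, 1.5, 15.0
nd = NormalDist()
p_l = nd.cdf((log(x1) - mu) / sigma)   # CDF value at the lower bound
p_u = nd.cdf((log(x2) - mu) / sigma)   # CDF value at the upper bound

# One truncated draw: uniform on (p_l, p_u), then the lognormal quantile.
draw = lognormal_inv_cdf(p_l + (p_u - p_l) * random.random(), mu, sigma)
print(draw)  # lies within the truncation interval [x1, x2]
```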
