If $X\sim \mathrm{lognormal}$, then $Y:=(X-d\mid X\geq d)$ has approximately a generalized Pareto distribution.


Let $X$ be a random variable with lognormal distribution. Show that, when $d$ is sufficiently large, $Y:=(X-d\mid X\geq d)$ is approximately a random variable with generalized Pareto distribution.

Hint: Use the fact that $\mathrm{erf}(x)\approx 1-\frac{1}{\sqrt{x}}e^{-\frac{x^2}{2}}$ for large values of $x$.

My attempt: We recall that the density function for the lognormal distribution is given by
$$
f(x)=\frac{1}{x\sigma\sqrt{2\pi}}e^{-\frac{(\log x-\mu)^2}{2\sigma^2}}\:\:\mbox{ for }x>0.
$$
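As a quick sanity check on the normalization, here is a minimal Python sketch (standard library only; the function names are illustrative) of the lognormal density with the usual $2\sigma^2$ in the exponent, together with its CDF $F(x)=\Phi\bigl((\log x-\mu)/\sigma\bigr)$ written via `math.erfc`. The density should match a numerical derivative of the CDF.

```python
import math

def lognormal_pdf(x, mu, sigma):
    """Density of X = exp(mu + sigma*Z), Z standard normal."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

def lognormal_cdf(x, mu, sigma):
    """F(x) = Phi((log x - mu)/sigma), using Phi(u) = erfc(-u/sqrt(2))/2."""
    if x <= 0:
        return 0.0
    return 0.5 * math.erfc(-(math.log(x) - mu) / (sigma * math.sqrt(2.0)))

# The density should be the derivative of the CDF: compare with a central difference.
mu, sigma, h = 0.3, 0.7, 1e-5
for x in (0.5, 1.0, 2.0, 5.0):
    slope = (lognormal_cdf(x + h, mu, sigma) - lognormal_cdf(x - h, mu, sigma)) / (2 * h)
    print(x, slope, lognormal_pdf(x, mu, sigma))
```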

The cumulative distribution function of a generalized Pareto random variable with shape $\gamma$ and scale $\theta$ is given by
$$
G(x)=1-\left(1+\frac{\gamma x}{\theta}\right)^{\frac{-1}{\gamma}}.
$$
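For numerical experiments, $G$ can be coded directly. The sketch below assumes $\gamma>0$ and $\theta>0$ (the case relevant here) and uses `math.log1p`/`math.expm1` so that it stays accurate when $\gamma x/\theta$ is small; as $\gamma\to 0$ it approaches the exponential CDF $1-e^{-x/\theta}$. The function name is hypothetical.

```python
import math

def gpd_cdf(x, gamma, theta):
    """G(x) = 1 - (1 + gamma*x/theta)**(-1/gamma) for x >= 0 (assumes gamma > 0, theta > 0)."""
    if x <= 0:
        return 0.0
    # 1 - exp(-log(1 + gamma*x/theta)/gamma), written with log1p/expm1 for accuracy
    return -math.expm1(-math.log1p(gamma * x / theta) / gamma)

print(gpd_cdf(2.0, 0.5, 2.0))   # equals 1 - 1.5**(-2) up to rounding
print(gpd_cdf(1.0, 1e-6, 2.0))  # close to the exponential limit 1 - exp(-0.5)
```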

The objective is to find parameters $\gamma$ and $\theta$ such that $\mathbb{P}(Y\leq y)\approx G(y)$; naturally, $\gamma$ and $\theta$ will be expressed in terms of $\sigma$, $\mu$ and $d$. My attempt is:
\begin{align}
\mathbb{P}(Y\leq y) & =1- \frac{1-\int_{0}^{d+y}\frac{1}{x\sigma\sqrt{2\pi}}e^{-\frac{(\log x-\mu)^2}{2\sigma^2}} dx}{1-\int_{0}^{d}\frac{1}{x\sigma\sqrt{2\pi}}e^{-\frac{(\log x-\mu)^2}{2\sigma^2}} dx}
\end{align}
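Before any asymptotics, this ratio can be evaluated exactly, since each integral is a lognormal CDF value and $\mathbb P(X>t)=\tfrac12\operatorname{erfc}\bigl((\log t-\mu)/(\sqrt2\sigma)\bigr)$. The sketch below (Python standard library; function names are hypothetical) computes $\mathbb P(Y\le y)$ this way and cross-checks it against a seeded Monte Carlo estimate. It is only a numerical check, not part of the derivation.

```python
import math
import random

def excess_cdf(y, mu, sigma, d):
    """Exact P(Y <= y) for Y = (X - d | X >= d), X lognormal with parameters mu, sigma."""
    if y <= 0:
        return 0.0
    s = sigma * math.sqrt(2.0)
    # P(X > t) = erfc((log t - mu)/(sigma*sqrt(2)))/2; the factor 1/2 cancels in the ratio
    tail_d = math.erfc((math.log(d) - mu) / s)
    tail_dy = math.erfc((math.log(d + y) - mu) / s)
    return 1.0 - tail_dy / tail_d

def excess_cdf_mc(y, mu, sigma, d, n=100_000, seed=0):
    """Monte Carlo estimate of the same probability, for comparison."""
    rng = random.Random(seed)
    hits = kept = 0
    for _ in range(n):
        x = math.exp(mu + sigma * rng.gauss(0.0, 1.0))
        if x >= d:  # condition on exceeding the threshold
            kept += 1
            hits += (x - d) <= y
    return hits / kept

print(excess_cdf(1.5, 0.0, 1.0, 2.0), excess_cdf_mc(1.5, 0.0, 1.0, 2.0))
```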

We consider the change of variable $t=\frac{\log x -\mu}{\sqrt{2}\sigma}$; then $dt=\frac{1}{\sqrt{2}\sigma x}dx$, so $dx=\sqrt{2}\sigma x\, dt$. Therefore, we have

\begin{align}
\mathbb{P}(Y\leq y) & =1- \frac{1-\frac{1}{\sqrt{\pi}}\int_{-\infty}^{\frac{\log(d+y) -\mu}{\sqrt{2}\sigma}}e^{-t^{2}} dt }{1-\frac{1}{\sqrt{\pi}}\int_{-\infty}^{\frac{\log(d) -\mu}{\sqrt{2}\sigma}}e^{-t^{2}} dt} \\
&= 1- \frac{1-\frac{1}{\sqrt{\pi}}\int_{-\infty}^{0}e^{-t^{2}} dt- \frac{1}{\sqrt{\pi}}\int_{0}^{\frac{\log(d+y) -\mu}{\sqrt{2}\sigma}}e^{-t^{2}} dt }{1-\frac{1}{\sqrt{\pi}}\int_{-\infty}^{0}e^{-t^{2}} dt-\frac{1}{\sqrt{\pi}}\int_{0}^{\frac{\log(d) -\mu}{\sqrt{2}\sigma}}e^{-t^{2}} dt} \\
&= 1- \frac{1-\frac{1}{2}- \frac{1}{\sqrt{\pi}}\int_{0}^{\frac{\log(d+y) -\mu}{\sqrt{2}\sigma}}e^{-t^{2}} dt }{1-\frac{1}{2}-\frac{1}{\sqrt{\pi}}\int_{0}^{\frac{\log(d) -\mu}{\sqrt{2}\sigma}}e^{-t^{2}} dt} \\
&= 1- \frac{\frac{1}{2}- \frac{1}{2}\mathrm{erf}\left(\frac{\log(d+y) -\mu}{\sqrt{2}\sigma}\right) }{\frac{1}{2}- \frac{1}{2}\mathrm{erf}\left(\frac{\log(d) -\mu}{\sqrt{2}\sigma}\right) } \\
&= 1- \frac{1- \mathrm{erf}\left(\frac{\log(d+y) -\mu}{\sqrt{2}\sigma}\right) }{1- \mathrm{erf}\left(\frac{\log(d) -\mu}{\sqrt{2}\sigma}\right) } \\
&\approx 1- \frac{\frac{1}{\sqrt{\frac{\log(d+y) -\mu}{\sqrt{2}\sigma}}}e^{-\frac{(\log(d+y)-\mu)^2}{4\sigma^2}} }{\frac{1}{\sqrt{\frac{\log(d) -\mu}{\sqrt{2}\sigma}}}e^{-\frac{(\log(d)-\mu)^2}{4\sigma^2}} } \leftarrow \mbox{by hint.}\\
&= 1- \sqrt{\frac{\log(d)-\mu}{\log(d+y)-\mu}}e^{-\frac{(\log(d+y)-\mu)^2}{4\sigma^2}+\frac{(\log(d)-\mu)^2}{4\sigma^2}}\\
&=1- \sqrt{\frac{\log(d)-\mu}{\log(d+y)-\mu}}e^{\frac{1}{4\sigma^2}(\log(d)-\log(d+y))(\log(y+d)+\log(d)-2\mu) }\\
&=1- \sqrt{\frac{\log(d)-\mu}{\log(d+y)-\mu}}e^{\frac{1}{4\sigma^2}\log\left(\frac{d+y}{d}\right)\left(2\mu-\log(dy+d^2)\right) }\\
\end{align}

I do not know how to continue; algebraically, I have not been able to calibrate the parameters to get what I need.

I ask for your help with this problem, any solution or suggestion will be well received.

Best Answer

Let $Z$ denote a standard normal random variable, and recall the well-known approximation $$ \mathbb P(Z\geq n)\approx \frac{e^{-n^2/2}}{n\sqrt{2\pi}},\qquad (1) $$ where I am using the symbol $a_n\approx b_n$ to mean that $\lim_{n\to\infty}\frac{a_n}{b_n}=1$ throughout this posting.
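Since $\mathbb P(Z\geq n)=\tfrac12\operatorname{erfc}(n/\sqrt2)$, approximation $(1)$ is equivalent to $1-\operatorname{erf}(x)\approx \frac{e^{-x^2}}{x\sqrt\pi}$ with $x=n/\sqrt2$. A short Python check (standard library only; the helper names are illustrative) of how quickly the ratio in $(1)$ tends to $1$:

```python
import math

def normal_tail(n):
    """P(Z >= n) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(n / math.sqrt(2.0))

def mills_approx(n):
    """Right-hand side of (1): exp(-n^2/2) / (n*sqrt(2*pi))."""
    return math.exp(-0.5 * n * n) / (n * math.sqrt(2.0 * math.pi))

for n in (2.0, 5.0, 10.0, 20.0):
    # The ratio approaches 1 from below as n grows (roughly like 1 - 1/n^2)
    print(n, normal_tail(n) / mills_approx(n))
```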

Warmup: standard log-normal case. Let $X=e^Z$ be a standard log-normal random variable. Then we have (for all $x\geq 0$) that $$ \mathbb P(X-e^n\geq x\mid X\geq e^n)=\frac{\mathbb P\bigl(Z\geq n+\log(1+xe^{-n})\bigr)}{\mathbb P(Z\geq n)}. $$ For $n$ large, using the Taylor expansion of $\log(1+\epsilon)$ yields $\log(1+xe^{-n})\approx xe^{-n}$. Thus $$ \mathbb P(X-e^n\geq x\mid X\geq e^n)\approx\frac{\mathbb P(Z\geq n+xe^{-n})}{\mathbb P(Z\geq n)}\approx \frac{n}{n+xe^{-n}}\exp\left[\frac{n^2-(n+xe^{-n})^2}{2}\right],\qquad (2) $$ where in the last approximation we have applied $(1)$ to the top and bottom of the fraction.

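Since $\tfrac12[n^2-(n+xe^{-n})^2]=-nxe^{-n}-\tfrac12x^2e^{-2n}\approx -nxe^{-n}$, the exponential factor on the right side of $(2)$ deviates from $1$ by a term of order $nxe^{-n}$, whereas the first factor $\frac{n}{n+xe^{-n}}=\bigl(1+n^{-1}xe^{-n}\bigr)^{-1}$ deviates from $1$ only by a term of order $n^{-1}xe^{-n}$. Hence the exponential factor dominates. With this warm-up in place, we proceed to the general case.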

General log-normal case.

Following the same steps for a general log-normal random variable $X=e^{\mu+\sigma Z}$, we see that the dominant term in the approximation gives $$ \mathbb P(X-e^n\geq x\mid X\geq e^n)\approx \exp\left[\frac{(n-\mu)^2-(n-\mu+xe^{-n})^2}{2\sigma^2}\right]\approx \exp\left[-\frac{(n-\mu)xe^{-n}}{\sigma^2}\right]. $$ Writing the right-hand side as $\bigl(e^{xe^{-n}}\bigr)^{-(n-\mu)/\sigma^2}$ and approximating $e^{xe^{-n}}\approx 1+xe^{-n}$ leads to the generalized Pareto form $$ \mathbb P(X-e^n\geq x\mid X\geq e^n)\approx (1+xe^{-n})^{-1/\gamma},\qquad \gamma=\frac{\sigma^2}{n-\mu}, $$ which matches $1-G(x)=(1+\gamma x/\theta)^{-1/\gamma}$ when $\gamma/\theta=e^{-n}$, i.e. $\theta=\gamma e^n$. Writing $d=e^n$ yields the final expression $$ \theta=\frac{d\sigma^2}{\log d-\mu},\qquad \gamma=\frac{\sigma^2}{\log d-\mu}. $$
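To see these parameters at work, here is a minimal numerical sketch (Python standard library; helper names are hypothetical) comparing the exact excess survival $\mathbb P(X-d\geq x\mid X\geq d)$, computed from $\operatorname{erfc}$, with the generalized Pareto survival $(1+\gamma x/\theta)^{-1/\gamma}$ using $\theta$ and $\gamma$ as above. The parameter pairs and thresholds are arbitrary illustrative choices, and $x$ is taken of order $\theta$ so that the tail probabilities are not trivially close to $1$.

```python
import math

def gpd_params(mu, sigma, d):
    """theta and gamma from the answer; requires log(d) > mu."""
    gamma = sigma ** 2 / (math.log(d) - mu)
    return d * gamma, gamma

def exact_excess_survival(x, mu, sigma, d):
    """P(X - d >= x | X >= d) for X = exp(mu + sigma*Z), evaluated with erfc."""
    s = sigma * math.sqrt(2.0)
    z0 = (math.log(d) - mu) / s
    z1 = (math.log(d) + math.log1p(x / d) - mu) / s  # log(d + x), computed stably
    return math.erfc(z1) / math.erfc(z0)

def gpd_survival(x, theta, gamma):
    """1 - G(x) = (1 + gamma*x/theta)**(-1/gamma)."""
    return (1.0 + gamma * x / theta) ** (-1.0 / gamma)

# Compare at a high threshold d = e^n; x is measured in units of theta.
for mu, sigma, n in ((0.0, 1.0, 20.0), (1.0, 2.0, 30.0)):
    d = math.exp(n)
    theta, gamma = gpd_params(mu, sigma, d)
    for k in (0.5, 1.0, 2.0):
        x = k * theta
        print(mu, sigma, n, k, exact_excess_survival(x, mu, sigma, d), gpd_survival(x, theta, gamma))
```

For these choices the two survival probabilities agree to within a few percent; the agreement improves as $\log d-\mu$ grows relative to $\sigma^2$, since then $\gamma\to 0$ and the neglected terms shrink.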
