Solved – Confidence Interval of a Lognormal Random Variable

confidence interval, lognormal distribution

Just learning some stats, so please forgive if this is simple but I couldn't find a good explanation.

Let $X \sim \mathcal{N}(\mu,\sigma^2)$ and $Y = e^X$. To find an approximate 95% confidence interval, note
\begin{align*}
P(a \leq Y \leq b) & = P(a \leq e^X \leq b) \\
& = P(\log a \leq X \leq \log b) \\
& = P\left(\frac{\log a - \mu}{\sigma} \leq Z \leq \frac{\log b - \mu}{\sigma}\right) \\
& = \frac{1}{\sqrt{2\pi}} \int_{\frac{\log a - \mu}{\sigma}}^{\frac{\log b - \mu}{\sigma}} e^{-z^2/2} \, dz \\
& \triangleq 0.95,
\end{align*}
for which we know
\begin{align*}
\frac{\log b - \mu}{\sigma} & \approx 2 \iff b = e^{\mu + 2\sigma}, \\
\frac{\log a - \mu}{\sigma} & \approx -2 \iff a = e^{\mu - 2\sigma}.
\end{align*}
Then, my understanding of a confidence interval (CI) would lead me to believe 95% of the values of $Y$ should lie within the interval
$$
[e^{\mu + \sigma^2/2} - e^{\mu - 2\sigma},\ e^{\mu + \sigma^2/2} + e^{\mu + 2\sigma}],
$$
where $e^{\mu + \sigma^2/2}$ is the mean of $Y$. Is this correct? Specifically, when we speak of a "95% confidence interval," do we mean that 95% of the values lie within the mean of the random variable, or another average like the median or mode?

Finally, to clear up a source of confusion about notation: for a normally distributed random variable $X \sim \mathcal{N}(\mu,\sigma^2)$, the variance $\sigma^2$ is the square of the standard deviation (SD) $\sigma$, and an approximate 95% confidence interval is $[\mu - 2\sigma, \mu + 2\sigma]$. Similarly, for a lognormally distributed random variable $Y = e^X$, the variance is $(e^{\sigma^2} - 1) e^{2\mu + \sigma^2}$, and I believe its standard deviation would again just be the square root of this (by definition), namely $\left(\sqrt{e^{\sigma^2} - 1}\right) e^{\mu + \sigma^2/2}$. But now we don't have that an approximate 95% confidence interval is $[\text{mean} - 2\,\text{SD}, \text{mean} + 2\,\text{SD}]$, since the pdf of $Y$ is not symmetric.

So, is the $\text{mean} \pm \text{SD}$ property for a confidence interval only valid for normal random variables?

Best Answer

Is this correct?

No.

i) This isn't a confidence interval you're calculating (since those are for parameters or functions of them), nor is it really a prediction interval, a tolerance interval, or any of the more common statistical intervals ... since for starters it's based on known population values, not on a sample.

ii) You already calculated the limits of an interval that includes 95% of the probability; it's $(a,b)$, not $(E[Y]-a, E[Y]+b)$.
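A quick numerical check of point ii) may help. The sketch below uses only the Python standard library and assumes illustrative values $\mu = 0$, $\sigma = 1$ (these are not from the question); the helper `lognormal_cdf` is a hypothetical name introduced here. It compares the probability inside $(a, b)$ with the probability inside the interval built by shifting the mean.

```python
from math import exp, log
from statistics import NormalDist

# Illustrative parameter values (an assumption, not taken from the question).
mu, sigma = 0.0, 1.0
std_normal = NormalDist()  # standard normal: mean 0, SD 1


def lognormal_cdf(y, mu, sigma):
    """P(Y <= y) for Y = exp(X) with X ~ N(mu, sigma^2)."""
    if y <= 0:
        return 0.0
    return std_normal.cdf((log(y) - mu) / sigma)


# Interval derived in the question: (a, b) = (e^(mu - 2 sigma), e^(mu + 2 sigma)).
a = exp(mu - 2 * sigma)
b = exp(mu + 2 * sigma)
coverage_ab = lognormal_cdf(b, mu, sigma) - lognormal_cdf(a, mu, sigma)

# Interval proposed in the question: [mean - a, mean + b].
mean_y = exp(mu + sigma**2 / 2)
coverage_shifted = (
    lognormal_cdf(mean_y + b, mu, sigma) - lognormal_cdf(mean_y - a, mu, sigma)
)

print(f"P(a <= Y <= b)               = {coverage_ab:.4f}")
print(f"P(mean - a <= Y <= mean + b) = {coverage_shifted:.4f}")
```

For these assumed values the first probability equals $\Phi(2)-\Phi(-2)$, while the second comes out far below 95%.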

do we mean that 95% of the values lie within the mean of the random variable

No. The mean is a single value. How can 95% of a continuous distribution lie "within" a single value?

But now we don't have that an approximate 95% confidence interval is $[\text{mean} - 2\,\text{SD}, \text{mean} + 2\,\text{SD}]$ since the pdf of $Y$ is not symmetric.

Just because the density isn't symmetric doesn't of itself mean that a symmetric interval can't include 95% of the probability.

As it happens, it doesn't include exactly 95% here, though the coverage is often fairly close to 95% for unimodal distributions. However, while the approximation works pretty well for $\pm 2\sigma$, it doesn't always carry over nearly as well to other numbers of SDs not close to 2.
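To see how the coverage of mean $\pm\, k$ SD behaves for a lognormal, here is a rough sketch, again assuming $\mu = 0$, $\sigma = 1$ purely for illustration. The function names `lognormal_cdf` and `coverage_within_k_sd` are hypothetical helpers, and the mean and SD formulas are the ones stated in the question.

```python
from math import exp, log, sqrt
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1


def lognormal_cdf(y, mu, sigma):
    """P(Y <= y) for Y = exp(X) with X ~ N(mu, sigma^2)."""
    if y <= 0:
        return 0.0
    return std_normal.cdf((log(y) - mu) / sigma)


def coverage_within_k_sd(k, mu, sigma):
    """P(mean - k*SD <= Y <= mean + k*SD) for the lognormal Y."""
    mean_y = exp(mu + sigma**2 / 2)
    sd_y = sqrt(exp(sigma**2) - 1) * mean_y
    upper = lognormal_cdf(mean_y + k * sd_y, mu, sigma)
    lower = lognormal_cdf(mean_y - k * sd_y, mu, sigma)
    return upper - lower


# Illustrative parameter values (an assumption, not taken from the question).
mu, sigma = 0.0, 1.0
for k in (1, 2, 3):
    normal_cov = std_normal.cdf(k) - std_normal.cdf(-k)
    lognormal_cov = coverage_within_k_sd(k, mu, sigma)
    print(f"k={k}: normal {normal_cov:.4f}, lognormal {lognormal_cov:.4f}")
```

With these assumed parameters, the $k = 2$ coverage is roughly 0.96, close to the normal value, whereas $k = 1$ gives roughly 0.91 against about 0.68 for the normal.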

So, is the $\text{mean} \pm \text{SD}$ property for a confidence interval only valid for normal random variables?

(Again, keeping in mind that it's not a confidence interval)

Well, actually, for normal random variables, 95% of the distribution is within 1.96 SDs of the mean and 95.4% is within 2 SDs of the mean.

Those numbers are calculated from the normal distribution function; $\Phi(1.96)-\Phi(-1.96)=0.9500$ and $\Phi(2)-\Phi(-2)=0.9545$.
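These values can be reproduced with Python's standard library; a minimal sketch (the variable names are just illustrative):

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1

# Probability within 1.96 and within 2 standard deviations of the mean.
within_1_96 = std_normal.cdf(1.96) - std_normal.cdf(-1.96)
within_2 = std_normal.cdf(2) - std_normal.cdf(-2)

# The multiplier that gives exactly 95% central coverage (about 1.96).
z_975 = std_normal.inv_cdf(0.975)

print(f"{within_1_96:.4f} {within_2:.4f} {z_975:.4f}")
```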
