[Math] How to understand a ‘shifted’ lognormal distribution random variable (RV) and its results

computational-mathematics, statistics

This is an applied math question.

I am doing some numerical work in Python, using scipy.stats. It is really the underlying math that matters, though, and I am questioning the results/implementation.

The general problem is that I am using lognormal (LN) RVs to obtain multiplicative results through iteration. So, for example, I have a starting 'known' LN RV which is sort of like a Dirac-delta function: $Y=e^X$, where $Y$ is lognormal and $X$ is thus normal. To be clear, $Y$ has both a mean and an SD (standard deviation), which can be calculated/observed empirically. Underlying it is a normal distribution for $X$ with parameters $\mu$ and $\sigma$, which can be derived from $Y$'s mean and SD (see: https://en.wikipedia.org/wiki/Log-normal_distribution ).
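For concreteness, here is a minimal sketch of that conversion, using the standard moment relations $\sigma^2 = \ln(1 + \mathrm{SD}^2/\mathrm{mean}^2)$ and $\mu = \ln(\mathrm{mean}) - \sigma^2/2$. The function names are made up for illustration and are not part of SciPy or of my wrapper.

```python
import math

def lognormal_params_from_moments(mean, sd):
    """Return (mu, sigma) of X = ln(Y), given the mean and SD of lognormal Y."""
    if mean <= 0 or sd <= 0:
        raise ValueError("mean and sd must be positive")
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - 0.5 * sigma2
    return mu, math.sqrt(sigma2)

def lognormal_moments_from_params(mu, sigma):
    """Inverse mapping: mean and SD of Y = exp(X), with X ~ N(mu, sigma^2)."""
    mean = math.exp(mu + 0.5 * sigma ** 2)
    sd = mean * math.sqrt(math.exp(sigma ** 2) - 1.0)
    return mean, sd

# Round trip for the example used in this question (mean 100,000, SD 10,000).
mu, sigma = lognormal_params_from_moments(100000.0, 10000.0)
mean, sd = lognormal_moments_from_params(mu, sigma)
print(mu, sigma, mean, sd)
```

The round trip should return the original mean and SD up to floating-point error.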

Since it is lognormal, I can multiply it by another LN distribution to get a new lognormal distribution. In practice, if we call the parameters of the first distribution $\mu_1$ and $\sigma_1$, and those of the second $\mu_2$ and $\sigma_2$, we can calculate the parameters of the product's underlying normal $X$ as:

$$\mu = \mu_1 + \mu_2$$
$$\sigma = (\sigma_1^2 + \sigma_2^2)^{0.5}$$

assuming, of course, independence.
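As a sketch of the iteration, the snippet below folds a few independent lognormal factors into one by adding the $\mu$'s and adding the $\sigma$'s in quadrature. The factor values are arbitrary illustrative numbers, and starting from $\mu=0$, $\sigma=0$ (a point mass at $Y=1$) is my assumption about what a 'Dirac-delta-like' start would look like.

```python
import math

def multiply_lognormals(mu1, sigma1, mu2, sigma2):
    """Parameters of ln(Y1 * Y2) for independent lognormals Y1 and Y2."""
    mu = mu1 + mu2
    sigma = math.sqrt(sigma1 ** 2 + sigma2 ** 2)
    return mu, sigma

def lognormal_mean(mu, sigma):
    """Mean of Y = exp(X), with X ~ N(mu, sigma^2)."""
    return math.exp(mu + 0.5 * sigma ** 2)

# Hypothetical (mu, sigma) pairs for the multiplicative factors.
factors = [(0.0, 0.1), (0.05, 0.2), (-0.02, 0.15)]

mu, sigma = 0.0, 0.0  # degenerate start: Y = 1 with certainty
for m, s in factors:
    mu, sigma = multiply_lognormals(mu, sigma, m, s)

# Under independence, the mean of the product equals the product of the means.
product_of_means = math.prod(lognormal_mean(m, s) for m, s in factors)
print(mu, sigma, lognormal_mean(mu, sigma), product_of_means)
```

The last two printed values should agree, which is a quick sanity check on the combined parameters.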

All works well. But SciPy offers an additional 'offset' parameter, which shifts the lognormal left or right by a fixed amount. Thus, if you have a wrapper around the SciPy calls that creates an object RV=Lognorm(100000, 10000, -50000), the pdf delivered does, indeed, have an SD = 10,000, but is centered at 50,000 (since the 100,000 mean is offset by -50,000).

What I struggle with is this.

If you, in fact, ask the package for the mean and SD of RV, it gives mean = 50,000 and SD = 10,000. Thus, it looks like it does not create an RV that is shifted left in its entirety by 50,000 (which would possibly allow positive probability of values less than zero), but rather one whose mean is adjusted downward by 50,000.

It looks to me like a bit of a software kludge that works. To my way of thinking, shifting to the left by 'n' units could/should preserve all central moments, but (1) it will allow negative values and (2) there should not exist a proper, 2-parameter lognormal that gives the same pdf. That is, pdf(100000, 10000) shifted left by 50000 is not pdf(50000, 10000), since the pdf has $\ln(x)$ in its definition and the shift should appear as $\ln(x-s)$, where $s$ is the shift amount.
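To make points (1) and (2) concrete, here is a small sketch that compares the textbook three-parameter (shifted) lognormal with the two-parameter lognormal that has the same mean and SD. It only encodes the mathematical definition $Y' = s + e^X$; it is not a claim about what the wrapper or SciPy does internally, and the helper names are hypothetical.

```python
import math

def params_from_moments(mean, sd):
    """(mu, sigma) of the underlying normal, from the lognormal's mean and SD."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - 0.5 * sigma2, math.sqrt(sigma2)

def lognormal_pdf(x, mu, sigma, shift=0.0):
    """pdf of shift + exp(X), X ~ N(mu, sigma^2); zero at or below the shift."""
    z = x - shift
    if z <= 0.0:
        return 0.0
    expo = -((math.log(z) - mu) ** 2) / (2.0 * sigma ** 2)
    return math.exp(expo) / (z * sigma * math.sqrt(2.0 * math.pi))

def skewness(sigma):
    """Skewness of a lognormal; a pure shift leaves it unchanged."""
    w = math.exp(sigma ** 2)
    return (w + 2.0) * math.sqrt(w - 1.0)

shift = -50000.0
# Case A: lognormal(mean 100,000, SD 10,000) shifted left by 50,000.
mu_a, sigma_a = params_from_moments(100000.0, 10000.0)
# Case B: plain two-parameter lognormal with mean 50,000 and SD 10,000.
mu_b, sigma_b = params_from_moments(50000.0, 10000.0)

# Both have mean 50,000 and SD 10,000, but the third moment differs.
skew_shifted = skewness(sigma_a)
skew_two_param = skewness(sigma_b)

# Case A has (tiny) positive density below zero; case B has none.
pdf_shifted_neg = lognormal_pdf(-10000.0, mu_a, sigma_a, shift)
pdf_two_param_neg = lognormal_pdf(-10000.0, mu_b, sigma_b)

print(skew_shifted, skew_two_param, pdf_shifted_neg, pdf_two_param_neg)
```

Since the two cases agree on mean and SD, one way to tell which one a package returns is to compare a higher moment (such as the skewness) or the lower end of the support. For reference, scipy.stats.lognorm is parameterized by `s` (the $\sigma$ of $X$), `loc` (the shift) and `scale` ($e^\mu$).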

Am I missing something here, or is this just a convention of Python which does not conform to the actual definition of the lognorm distribution? Or am I wrong on the definition/understanding of a three-parameter lognormal distribution?

Best Answer

After much experimentation (and a few bits of help from others), what I determined (or at least what appears to be the case) is that when SciPy is given a three-parameter lognormal, it really gives you the lognormal with $\text{mean} + \text{offset}$ as the actual mean (with the offset signed, so 100,000 + (-50,000) = 50,000 in the example above), and an unchanged standard deviation for the $Y$ distribution. Thus, the returned pdf is a lognormal itself.
