I have a data set that fits a log-normal distribution quite well. (From a theoretical point of view, it is some hard-to-tackle quotient distribution.)
However, the data is quite dirty, so parameter estimation is far from trivial.
Right now, my approach is the following:
- Shift the data so that the minimum is just above 0.
- Take the logarithm of the shifted data.
- Estimate the parameters robustly using the median and the MAD (see "Estimating parameters of a normal distribution: median instead of mean?" for details).
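For concreteness, here is a minimal sketch of these three steps in Java. The class name, the method names and the margin `eps` are only illustrative choices, and the factor 1.4826 is the usual constant that makes the MAD a consistent estimate of $\sigma$ for normal data.

```java
import java.util.Arrays;

/** Sketch: fit X = exp(N(mu, sigma)) + c via median and MAD in log space. */
final class ShiftedLogNormalFit {
    /** Scale factor so that MAD * 1.4826 estimates sigma for normal data. */
    static final double MAD_TO_SIGMA = 1.4826;

    /** Median of a sorted copy; the input array is left untouched. */
    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return (n % 2 == 1) ? sorted[n / 2]
                            : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
    }

    /**
     * Returns {mu, sigma, c}. The shift c is the observed minimum minus a
     * small margin eps > 0 (a crude heuristic, not a proper estimate of c).
     */
    static double[] fit(double[] data, double eps) {
        double min = Arrays.stream(data).min().getAsDouble();
        double c = min - eps;

        // Step 1 and 2: shift, then take logarithms.
        double[] logs = new double[data.length];
        for (int i = 0; i < data.length; i++) {
            logs[i] = Math.log(data[i] - c);
        }

        // Step 3: median as location, scaled MAD as scale.
        double mu = median(logs);
        double[] absDev = new double[logs.length];
        for (int i = 0; i < logs.length; i++) {
            absDev[i] = Math.abs(logs[i] - mu);
        }
        double sigma = MAD_TO_SIGMA * median(absDev);
        return new double[] {mu, sigma, c};
    }
}
```

Calling `ShiftedLogNormalFit.fit(data, eps)` returns the three parameters in the order `mu`, `sigma`, `c`; the quality of the fit depends heavily on how `eps` is chosen.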
The result is significantly better than before (maximum difference from the empirical CDF: 0.034 instead of 0.081, and 0.224 without using MAD). It's not perfect, in particular on the long tail, where I expect outliers. The additional location parameter helped a lot. However, using the minimum is a very crude heuristic. I obviously cannot expect to observe the true minimum; depending on the sample size, the observed minimum will always be larger than the true one by some small amount.
Do you know of any robust parameter estimation method (plus a reference, if possible) for the $e^{\mathcal{N}(\mu, \sigma)} + c$ distribution family?
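To make the family explicit: if $X = e^{Z} + c$ with $Z \sim \mathcal{N}(\mu, \sigma)$, where I take $\sigma$ to be the standard deviation, then the density and the median should be

$$f(x) = \frac{1}{(x - c)\,\sigma\sqrt{2\pi}} \exp\!\left(-\frac{\left(\ln(x - c) - \mu\right)^2}{2\sigma^2}\right) \quad \text{for } x > c, \qquad \operatorname{median}(X) = c + e^{\mu}.$$

So $c$ is the lower bound of the support, which is why the observed minimum can only overestimate it.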
Note that, for example, scipy.stats.lognorm also has such an additional third location parameter, just like the one I'm using, but I'm working in Java with my own code.
Update: I've just come across a thesis on this topic:
- Rodrigo J. Aristizabal, Estimating the Parameters of the Three-Parameter Lognormal Distribution
which includes pointers to some relevant literature, in particular to
- A. C. Cohen, Jr., Estimating Parameters of Logarithmic-Normal Distributions by Maximum Likelihood
but I find it hard to get a formula out of these publications that I could implement.
Best Answer
In case anyone is still interested, I have managed to implement Aristizabal's formulae in Java. This is more proof-of-concept than the requested "robust" code, but it is a starting point.
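To give an idea of the kind of computation involved, below is a generic sketch of the profile log-likelihood for the location parameter $c$. It is not Aristizabal's formulae verbatim, and the class and method names are made up for illustration. For a fixed $c$, the maximum-likelihood estimates of $\mu$ and $\sigma^2$ are simply the mean and the (biased) variance of $\ln(x_i - c)$, so the likelihood reduces to a function of $c$ alone.

```java
/** Sketch: profile log-likelihood of the shift c for X = exp(N(mu, sigma)) + c. */
final class LogNormalProfileLikelihood {
    /**
     * Log-likelihood at the given c with mu and sigma^2 replaced by their
     * maximum-likelihood estimates for that c. Returns negative infinity
     * if c is not strictly below every observation.
     */
    static double profileLogLikelihood(double[] data, double c) {
        int n = data.length;
        double sumLog = 0.0;
        for (double x : data) {
            if (x <= c) {
                return Double.NEGATIVE_INFINITY;
            }
            sumLog += Math.log(x - c);
        }
        double mu = sumLog / n;

        // Biased (divide by n) variance of the log-shifted data.
        double sumSq = 0.0;
        for (double x : data) {
            double d = Math.log(x - c) - mu;
            sumSq += d * d;
        }
        double sigma2 = sumSq / n;

        // -sum ln(x_i - c) - (n/2) ln(2 pi sigma^2) - n/2
        return -sumLog - 0.5 * n * Math.log(2.0 * Math.PI * sigma2) - 0.5 * n;
    }
}
```

One would evaluate this over a range of $c$ values below the observed minimum and look for a local maximum. A plain global search does not work, because the likelihood grows without bound as $c$ approaches the smallest observation. Note also that this is ordinary maximum likelihood, so it is not robust against outliers.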