Solved – Robust parameter estimation for shifted log normal distribution

distributionsestimationlognormal distribution

I have a data set which fits a logNormal distribution quite well. (From a theoretical point of view, it is some hard-to-tackle quotient distribution).

However, the data is quite dirty, so parameter estimation is far from trivial.

Right now, my approach is the following:

  1. Shift the distribution such that the minimum is almost 0.
  2. Logspace the data
  3. Use a robust Median and MAD parameter estimation (see Estimating parameters of a normal distribution: median instead of mean? for details)

The result is significantly better than before (Maximum difference from empirical CDF 0.034 instead of 0.081 and 0.224 without using MAD). It's not perfect in particular on the long tail where I expect outliers. The additional location parameter helped a lot. However, using the minimum is a very crude heuristic. I obviously cannot expect to observe the true minimum, but depending on the sample size the observed minimum will always be some small x larger.

Do you know any robust parameter estimation method (+ a reference if possible) for the $e^{\mathcal{N}(\mu, \sigma)} + c$ distribution family?

Note that e.g. scipy.stats.lognorm does also have such an additional third location parameter, just like the one I'm using, but I'm working in Java with my own code.

Update: I've just come across a thesis on this topic:

  • Estimating the Parameters of the Three-Parameter Lognormal Distribution
    Rodrigo J. Aristizabal

which includes pointers to some relevant literature, in particular to

  • Estimating Parameters of Logarithmic-Normal Distributions by Maximum Likelihood
    A. C. Cohen, Jr.

but I find it hard to get a formula out of these publications that I could implement.

Best Answer

In case anyone is still interested, I have managed to implement Aristizabal's formulae in Java. This is more proof-of-concept than the requested "robust" code, but it is a starting point.

/**
 * Computes the point estimate of the shift offset (gamma) from the given sample. The sample array will be sorted by this method.<p>
 * Cf. Aristizabal section 2.2 ff.
 * @param sample {@code double[]}, will be sorted
 * @return gamma point estimate
 */
public static double pointEstimateOfGammaFromSample(double[] sample) {
    Arrays.sort(sample);
    DoubleUnaryOperator func = x->calculatePivotalOfSortedSample(sample, x)-1.0;
    double upperLimit = sample[0];
    double lowerLimit = 0;
    double gamma = bisect(func, lowerLimit, upperLimit);
    return gamma;
}

/**
 * Cf. Aristizabal's equation (2.3.1)
 * @param sample {@code double[]}, should be sorted in ascending order
 * @param gamma shift offset
 * @return pivotal value of sample
 */
private static double calculatePivotalOfSortedSample(final double[] sample, double gamma) {
    final int n=sample.length;
    final int n3=n/3;
    final double mid = avg(sample, gamma, n3+1, n-n3);
    final double low = avg(sample, gamma, 1, n3);
    final double upp = avg(sample, gamma, n-n3+1, n);
    final double result = (mid-low)/(upp-mid);
    return result;
}

/**
 * Computes average of sample values from {@code sample[l-1]} to {@code sample[u-1]}.
 * @param sample {@code double[]}, should be sorted in ascending order
 * @param gamma shift offset
 * @param l lower limit
 * @param u upper limit
 * @return average
 */
private static double avg(double[] sample, double gamma, int l, int u) {
    double sum = 0.0;
    for (int i=l-1;i<u;sum+=Math.log(sample[i++]-gamma));
    final int n = u-l+1;
    return sum/n;
}

/**
 * Naive bisection implementation. Should always complete if the given values actually straddles the root.
 * Will call {@link #secant(DoubleUnaryOperator, double, double)} if they do not, in which case the
 * call may not complete.
 * @param func Function solve for root value
 * @param lowerLimit Some value for which the given function evaluates < 0
 * @param upperLimit Some value for which the given function evaluates > 0
 * @return x value, somewhere between the lower and upper limits, which evaluates close enough to zero
 */
private static double bisect(DoubleUnaryOperator func, double lowerLimit, double upperLimit) {
    final double eps = 0.000001;
    double low=lowerLimit;
    double valAtLow = func.applyAsDouble(low);
    double upp=upperLimit;
    double valAtUpp = func.applyAsDouble(upp);
    if (valAtLow*valAtLow>0) {
        // Switch to secant method
        return secant(func, lowerLimit, upperLimit);
    }
    System.out.printf("bisect %f@%f -- %f@%f%n", valAtLow, low, valAtUpp, upp);
    double mid;
    while(true) {
        mid = (upp+low)/2;
        if (Math.abs(upp-low)/low<eps)
            break;
        double val = func.applyAsDouble(mid);
        if (Math.abs(val)<eps)
            break;
        if (val<0)
            low=mid;
        else
            upp=mid;
    }
    return mid;
}

/**
 * Naive secant root solver implementation. May not complete if root not found.
 * @param f Function solve for root value
 * @param a Some value for which the given function evaluates
 * @param b Some value for which the given function evaluates
 * @return x value which evaluates close enough to zero
 */
static double secant(final DoubleUnaryOperator f, double a, double b) {
    double fa = f.applyAsDouble(a);
    if (fa==0)
        return a;
    double fb = f.applyAsDouble(b);
    if (fb==0)
        return b;
    System.out.printf("secant %f@%f -- %f@%f%n", fa, a, fb, b);
    if (fa*fb<0) {
        return bisect(f, a, b);
    }
    while ( abs(b-a) > abs(0.00001*a) ) {
          final double m = (a+b)/2;
          final double k = (fb-fa)/(b-a);
          final double fm = f.applyAsDouble(m);
          final double x = m-fm/k;
          if (Math.abs(fa)<Math.abs(fb)) {
              // f(a)<f(b); Choose x and a
              b=x;
              fb=f.applyAsDouble(b);
          } else {
              // f(a)>=f(b); Choose x and b
              a=x;
              fa=f.applyAsDouble(a);
          }
          if (fa==0)
              return a;
          if (fb==0)
              return b;
          if (fa*fb<0) {
              // Straddling root; switch to bisect method
              return bisect(f, a, b);
          }
      }
    return (a+b)/2;

}