Solved – Robust parameter estimation for shifted log normal distribution

distributionsestimationlognormal distribution

I have a data set which fits a logNormal distribution quite well. (From a theoretical point of view, it is some hard-to-tackle quotient distribution).

However, the data is quite dirty, so parameter estimation is far from trivial.

Right now, my approach is the following:

Shift the distribution such that the minimum is almost 0.
Logspace the data
Use a robust Median and MAD parameter estimation (see Estimating parameters of a normal distribution: median instead of mean? for details)

The result is significantly better than before (Maximum difference from empirical CDF 0.034 instead of 0.081 and 0.224 without using MAD). It's not perfect in particular on the long tail where I expect outliers. The additional location parameter helped a lot. However, using the minimum is a very crude heuristic. I obviously cannot expect to observe the true minimum, but depending on the sample size the observed minimum will always be some small x larger.

Do you know any robust parameter estimation method (+ a reference if possible) for the $e^{\mathcal{N}(\mu, \sigma)} + c$ distribution family?

Note that e.g. scipy.stats.lognorm does also have such an additional third location parameter, just like the one I'm using, but I'm working in Java with my own code.

Update: I've just come across a thesis on this topic:

Estimating the Parameters of the Three-Parameter Lognormal Distribution
Rodrigo J. Aristizabal

which includes pointers to some relevant literature, in particular to

Estimating Parameters of Logarithmic-Normal Distributions by Maximum Likelihood
A. C. Cohen, Jr.

but I find it hard to get a formula out of these publications that I could implement.

Best Answer

In case anyone is still interested, I have managed to implement Aristizabal's formulae in Java. This is more proof-of-concept than the requested "robust" code, but it is a starting point.

/**
 * Computes the point estimate of the shift offset (gamma) from the given sample. The sample array will be sorted by this method.<p>
 * Cf. Aristizabal section 2.2 ff.
 * @param sample {@code double[]}, will be sorted
 * @return gamma point estimate
 */
public static double pointEstimateOfGammaFromSample(double[] sample) {
    Arrays.sort(sample);
    DoubleUnaryOperator func = x->calculatePivotalOfSortedSample(sample, x)-1.0;
    double upperLimit = sample[0];
    double lowerLimit = 0;
    double gamma = bisect(func, lowerLimit, upperLimit);
    return gamma;
}

/**
 * Cf. Aristizabal's equation (2.3.1)
 * @param sample {@code double[]}, should be sorted in ascending order
 * @param gamma shift offset
 * @return pivotal value of sample
 */
private static double calculatePivotalOfSortedSample(final double[] sample, double gamma) {
    final int n=sample.length;
    final int n3=n/3;
    final double mid = avg(sample, gamma, n3+1, n-n3);
    final double low = avg(sample, gamma, 1, n3);
    final double upp = avg(sample, gamma, n-n3+1, n);
    final double result = (mid-low)/(upp-mid);
    return result;
}

/**
 * Computes average of sample values from {@code sample[l-1]} to {@code sample[u-1]}.
 * @param sample {@code double[]}, should be sorted in ascending order
 * @param gamma shift offset
 * @param l lower limit
 * @param u upper limit
 * @return average
 */
private static double avg(double[] sample, double gamma, int l, int u) {
    double sum = 0.0;
    for (int i=l-1;i<u;sum+=Math.log(sample[i++]-gamma));
    final int n = u-l+1;
    return sum/n;
}

/**
 * Naive bisection implementation. Should always complete if the given values actually straddles the root.
 * Will call {@link #secant(DoubleUnaryOperator, double, double)} if they do not, in which case the
 * call may not complete.
 * @param func Function solve for root value
 * @param lowerLimit Some value for which the given function evaluates < 0
 * @param upperLimit Some value for which the given function evaluates > 0
 * @return x value, somewhere between the lower and upper limits, which evaluates close enough to zero
 */
private static double bisect(DoubleUnaryOperator func, double lowerLimit, double upperLimit) {
    final double eps = 0.000001;
    double low=lowerLimit;
    double valAtLow = func.applyAsDouble(low);
    double upp=upperLimit;
    double valAtUpp = func.applyAsDouble(upp);
    if (valAtLow*valAtLow>0) {
        // Switch to secant method
        return secant(func, lowerLimit, upperLimit);
    }
    System.out.printf("bisect %f@%f -- %f@%f%n", valAtLow, low, valAtUpp, upp);
    double mid;
    while(true) {
        mid = (upp+low)/2;
        if (Math.abs(upp-low)/low<eps)
            break;
        double val = func.applyAsDouble(mid);
        if (Math.abs(val)<eps)
            break;
        if (val<0)
            low=mid;
        else
            upp=mid;
    }
    return mid;
}

/**
 * Naive secant root solver implementation. May not complete if root not found.
 * @param f Function solve for root value
 * @param a Some value for which the given function evaluates
 * @param b Some value for which the given function evaluates
 * @return x value which evaluates close enough to zero
 */
static double secant(final DoubleUnaryOperator f, double a, double b) {
    double fa = f.applyAsDouble(a);
    if (fa==0)
        return a;
    double fb = f.applyAsDouble(b);
    if (fb==0)
        return b;
    System.out.printf("secant %f@%f -- %f@%f%n", fa, a, fb, b);
    if (fa*fb<0) {
        return bisect(f, a, b);
    }
    while ( abs(b-a) > abs(0.00001*a) ) {
          final double m = (a+b)/2;
          final double k = (fb-fa)/(b-a);
          final double fm = f.applyAsDouble(m);
          final double x = m-fm/k;
          if (Math.abs(fa)<Math.abs(fb)) {
              // f(a)<f(b); Choose x and a
              b=x;
              fb=f.applyAsDouble(b);
          } else {
              // f(a)>=f(b); Choose x and b
              a=x;
              fa=f.applyAsDouble(a);
          }
          if (fa==0)
              return a;
          if (fb==0)
              return b;
          if (fa*fb<0) {
              // Straddling root; switch to bisect method
              return bisect(f, a, b);
          }
      }
    return (a+b)/2;

}

Related Solutions

Solved – Variance-gamma distribution: parameter estimation

1. In pp. 16 they mention that the package uses "BFGS" or "Nelder-Mead". The second one is the default option. Please, take a look at these links to see their differences. Optimisation is actually made on the likelihood function. This is, the fitted parameters in the output of vgFit are actually the maximum likelihood estimators.

2. It is difficult to tell which optimisation method is better in general. You can instead compare different methods and see if the results coincide using the command optim and the command vgFit. Next, I present a code for maximising the likelihood function, you can choose between 6 different optimisation methods.

library("VarianceGamma")

# Simulate 100 observations from a variance-gamma distribution with parameters (0,1,0,1)
data = rvg(100, vgC = 0, sigma = 1, theta = 0, nu = 1)

# -log-likelihood function using the Variance-Gamma package
ll = function(par){
if(par[2]>0&par[4]>0) return( - sum(log(dvg(data, vgC = par[1], sigma=par[2],   
   theta=par[3], nu = par[4]) )))
else return(Inf)}

# Direct maximisation/minimisation using the command optim
optim(c(0,1,0,1),ll,method = c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN", "Brent"))

# Maximisation using the command vgFit
vgFit(data)

The advantage of vgFit is that you do not need to specify the starting value for searching the optimum. It implements three different methods for doing so: "US", "SL" and "MoM". "SL" is the default method. I would not trust blindly on these methods, I would rather compare the results from vgFit and optim.

3. You can check the references in the manual. For example

Seneta, E. (2004). Fitting the variance-gamma model to financial data. J. Appl. Prob., 41A:177– 187.

Kotz, S, Kozubowski, T. J., and Podgórski, K. (2001). The Laplace Distribution and Generalizations. Birkhauser, Boston, 349 p.

D.B. Madan and E. Seneta (1990): The variance gamma (V.G.) model for share market returns, Journal of Business, 63, pp. 511–524.

Solved – Posterior distribution for multinomial parameter

Unfortunately, the data is a bit difficult to deal with, since it consists of mostly "soft evidence", so the parameter estimation doesn't seem to have an easy analytic solution such as a direct update of Dirichlet counts.

— why does that follow? Why not just scale the counts based on the amount of evidence supporting each one?

If you have samples of the posterior distribution, why can't you use the sufficient statistics to turn those samples into a maximum likelihood distribution?

Update after your recent edits:

The likelihood on $\theta$ given $o$ is true has density

\begin{align} T_1(x) \propto 0.9 x + 0.1(1-x), \end{align}

and false has density, say

\begin{align} T_2(x) \propto 0.2 x + 0.8(1-x). \end{align}

Then, the final density after $\eta_1$ observations of true and $\eta_2$ observations of false is proportional to

\begin{align} T_1(x)^{\eta_1}T_2(x)^{\eta_2} \end{align}

which is nevertheless an exponential family, although not a Beta distribution as you rightly point out.

Best Answer

Related Solutions

Solved – Variance-gamma distribution: parameter estimation

Solved – Posterior distribution for multinomial parameter

Related Question