Solved – Can anyone explain quantile maximum probability estimation (QMPE)?

Tags: distributions, fitting, quantiles

Heathcote, Brown & Mewhort (2002) present Quantile Maximum Probability Estimation (originally termed Quantile Maximum Likelihood Estimation, but later corrected) as a method for fitting distributional data, and find that it outperforms the more traditional Continuous Maximum Likelihood Estimation approach, at least in the case of fitting data to the ex-Gaussian (though see also this paper, which shows that the benefit generalizes to other distributions as well).

I'm trying to understand the actual steps of QMPE. I understand that one first specifies increasing and equidistant quantile probabilities, then uses these to obtain the corresponding quantile values (q) in the observed data. I also understand that these observed quantile values are then used to count the number of observations falling between each pair of adjacent quantiles (N). But this is where I'm stuck. Presumably one searches through the parameter space of whatever a priori model one assumes generated the data, looking for a parameter set that maximizes the joint probability of q and N. However, I don't know how, given a set of candidate parameters, this joint probability is computed.

Not having a strong math background, I think much better in code, so if someone could help me figure out what comes next, I'd be greatly appreciative. Here's the beginning of an attempt to fit some data to an ex-Gaussian:

    #generate some data to fit
    true_mu = 300
    true_sigma = 50
    true_tau = 100
    my_data = rnorm(100, true_mu, true_sigma) + rexp(100, 1/true_tau)

    #select some quantile probabilities; estimate quantiles and
    #inter-quantile counts from the observed data
    quantile_probs = seq(.1, .9, .1) #or does it have to be seq(0,1,.1)?
    q = quantile(my_data, probs = quantile_probs, type = 5)
    #Heathcote et al apparently use type=5 given their example
    N = rep(NA, length(q)-1)
    for(i in 1:(length(q)-1)){
        N[i] = length(my_data[(my_data > q[i]) & (my_data <= q[i+1])])
    }

    #specify some candidate parameter values to assess (normally done
    #as part of an iterative search using an optimizer like optim)
    candidate_mu = 350
    candidate_sigma = 25
    candidate_tau = 30
    #given these candidates, what next?

Best Answer

I'm late to the party, but it seems this question still needs an answer and, surprisingly, I don't see any other questions on this topic. So I'll go ahead and offer this.

To summarize, the QMPE approach is to formulate a likelihood function in terms of the quantiles of the observed data and not the observed data themselves. Then parameters are sought which maximize the likelihood, by differentiating the log-likelihood wrt the parameters and applying a standard optimization method.

It appears that the authors are proposing to ignore the exact values of the data and only work with quantiles. I'm not sure I agree with that, but in any event, the likelihood function as they state it is correct if one only has quantiles at hand, which is sometimes the case.

The likelihood function can be derived by considering the data falling between quantiles as interval-censored; you know how many data are in each interval, but you don't know where they are. (That isn't how it's motivated by Heathcote et al., but that's what makes the most sense to me.) For each inter-quantile interval, there is a term

(P(datum in interval | parameters))^(number of data in interval)

Now the number of data in each interval is just n[i] = N (p[i] - p[i - 1]), where N is the total number of data and p[i] is the proportion corresponding to the i-th quantile q[i], i.e. #{ data <= q[i] } / N = p[i], and P(datum in interval | parameters) is just cdf(q[i] | parameters) - cdf(q[i - 1] | parameters), where cdf is the cumulative distribution function of the assumed distribution. Then the log-likelihood is the summation of

N (p[i] - p[i - 1]) log(cdf(q[i] | parameters) - cdf(q[i - 1] | parameters))

over the inter-quantile intervals indexed by i. This is the same for any particular distribution; just plug in the appropriate cdf.
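
To make this concrete in R for the asker's ex-Gaussian example, here is a rough sketch (the names pexgauss and qmp_loglik are mine, not from the paper, and the ex-Gaussian cdf is written out from its usual closed form rather than taken from a package). One way to let every observation contribute is to include the two tail intervals, below the first quantile and above the last, using cdf values of 0 and 1 at the endpoints:

    #ex-Gaussian cdf: normal(mu, sigma) convolved with an exponential
    #with mean tau
    pexgauss = function(x, mu, sigma, tau){
        pnorm((x - mu)/sigma) -
            exp(sigma^2/(2*tau^2) - (x - mu)/tau) *
                pnorm((x - mu)/sigma - sigma/tau)
    }

    #log-likelihood of the observed quantiles q (with probabilities
    #quantile_probs) under one candidate parameter set
    qmp_loglik = function(mu, sigma, tau, q, quantile_probs, n_obs){
        p = c(0, quantile_probs, 1)                #interval boundary probabilities
        cdf = c(0, pexgauss(q, mu, sigma, tau), 1) #cdf at -Inf, q, +Inf
        counts = n_obs * diff(p)                   #number of data in each interval
        sum(counts * log(diff(cdf)))
    }

    #evaluate at the candidate values from the question
    qmp_loglik(candidate_mu, candidate_sigma, candidate_tau,
               q, quantile_probs, length(my_data))

The value on its own isn't meaningful; the point is to compare candidate parameter sets and find the one that makes it largest.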

To find the parameters which maximize the likelihood, given a list of quantiles, any method for maximizing a function can be employed. If cdf is differentiable, it's probably most convenient to make use of the gradient wrt the parameters. I don't know if anything can be said about the derivatives of cdf wrt the parameters in general, although maybe if some family is assumed, such as location-scale distributions, some general results can be worked out.

The factor N is constant wrt the parameters so it's convenient to omit it. It's often more convenient, when working with optimization software, to search for a minimum instead of a maximum, so one can look at the negative log-likelihood instead of the log-likelihood, and minimize that instead of maximizing.

About maximizing wrt the parameters, Heathcote et al. make use of a particular line-search method, but that is not essential; given the NLL and its gradient, many optimization algorithms are applicable.
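
For instance, continuing the sketch above (qmp_loglik is the hypothetical helper defined earlier), R's general-purpose optim() can be pointed at the negative log-likelihood directly; with the default derivative-free Nelder-Mead method, no gradient is needed:

    #negative log-likelihood as a function of a single parameter vector,
    #which is the form optim() expects
    qmp_nll = function(par, q, quantile_probs, n_obs){
        #guard against the search wandering into impossible values
        if(par[2] <= 0 | par[3] <= 0) return(Inf)
        -qmp_loglik(par[1], par[2], par[3], q, quantile_probs, n_obs)
    }

    fit = optim(par = c(candidate_mu, candidate_sigma, candidate_tau),
                fn = qmp_nll,
                q = q, quantile_probs = quantile_probs,
                n_obs = length(my_data))
    fit$par #estimates of mu, sigma, tau

If you do work out the gradient, you can pass it via optim's gr argument and use a gradient-based method such as "BFGS" instead.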

To get a starting point for the parameter search, I think it would be convenient to compute the ordinary complete-data maximum likelihood estimates under the assumption that all n[i] data in the i-th inter-quantile interval fall at the middle of that interval; such an estimate is easy to compute if the ML-estimation routine accepts weighted data, since then the data are just the interval midpoints with weights equal to the mass in each interval. Heathcote et al. instead find a starting point by matching moments.
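
For what it's worth, moment matching has a simple closed form for the ex-Gaussian, since its mean is mu + tau, its variance is sigma^2 + tau^2, and its skewness is 2 tau^3 / (sigma^2 + tau^2)^(3/2). A rough sketch (the guards against very small or very large skewness are my own ad hoc choices, not necessarily what Heathcote et al. do):

    #moment-matching starting values for the ex-Gaussian
    exgauss_start = function(x){
        m = mean(x); s = sd(x)
        skew = mean((x - m)^3)/s^3
        tau = s * (max(skew, .01)/2)^(1/3) #keep tau positive if skewness is tiny
        tau = min(tau, .95*s)              #keep sigma^2 = s^2 - tau^2 positive
        c(mu = m - tau, sigma = sqrt(s^2 - tau^2), tau = tau)
    }

    exgauss_start(my_data) #could replace the hand-picked candidates above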

It has been interesting reading about this topic. Hope this helps someone else understand what's going on.
