Solved – Can anyone explain quantile maximum probability estimation (QMPE)?

Tags: distributions, fitting, quantiles

Heathcote, Brown & Mewhort (2002) present Quantile Maximum Probability Estimation (originally termed Quantile Maximum Likelihood Estimation, but later corrected) as a method for fitting distributional data, and find that it outperforms the more traditional Continuous Maximum Likelihood Estimation approach, at least in the case of fitting data to the ex-Gaussian (though see also this paper, which shows that the benefit generalizes to other distributions as well).

I'm trying to understand the actual steps of QMPE. I understand that one first specifies increasing and equidistant quantile probabilities, then uses these to obtain the corresponding quantile values (q) in the observed data. I also understand that these observed quantile values are then used to count the number of observations falling between each pair of adjacent quantiles (N). But this is where I'm stuck. Presumably one searches through the parameter space of whatever a priori model one assumes generated the data, looking for a parameter set that maximizes the joint probability of q and N. However, I don't know how, given a set of candidate parameters, this joint probability is computed.

Not having a strong math background, I think much better in code, so if someone could help me figure out what comes next, I'd be greatly appreciative. Here's the beginning of an attempt to fit some data to an ex-Gaussian:

    #generate some data to fit
    true_mu = 300
    true_sigma = 50
    true_tau = 100
    my_data = rnorm(100, true_mu, true_sigma) + rexp(100, 1/true_tau)

    #select some quantile probabilities; estimate quantiles and
    #inter-quantile counts from the observed data
    quantile_probs = seq(.1, .9, .1) #or does it have to be seq(0,1,.1)?
    q = quantile(my_data, probs = quantile_probs, type = 5)
    #Heathcote et al apparently use type=5 given their example
    N = rep(NA, length(q)-1)
    for(i in 1:(length(q)-1)){
        N[i] = length(my_data[(my_data > q[i]) & (my_data <= q[i+1])])
    }

    #specify some candidate parameter values to assess (normally done
    #as part of an iterative search using an optimizer like optim)
    candidate_mu = 350
    candidate_sigma = 25
    candidate_tau = 30
    #given these candidates, what next?

Best Answer

I'm late to the party, but it seems this question still needs an answer and, surprisingly, I don't see any other questions on this topic. So I'll go ahead and offer this.

To summarize, the QMPE approach is to formulate a likelihood function in terms of the quantiles of the observed data and not the observed data themselves. Then parameters are sought which maximize the likelihood, by differentiating the log-likelihood wrt the parameters and applying a standard optimization method.

It appears that the authors are proposing to ignore the exact values of the data and only work with quantiles. I'm not sure I agree with that, but in any event, the likelihood function as they state it is correct if one only has quantiles at hand, which is sometimes the case.

The likelihood function can be derived by considering the data falling between quantiles as interval-censored; you know how many data are in each interval, but you don't know where they are. (That isn't how it's motivated by Heathcote et al., but that's what makes the most sense to me.) For each inter-quantile interval, there is a term

(P(datum in interval | parameters))^(number of data in interval)

Now the number of data in each interval is just n[i] = N (p[i] - p[i - 1]), where N is the total number of data and p[i] is the proportion corresponding to the i-th quantile q[i], i.e. #{ data <= q[i] } / N = p[i], and P(datum in interval | parameters) is just cdf(q[i] | parameters) - cdf(q[i - 1] | parameters), where cdf is the cumulative distribution function of the assumed distribution. Then the log-likelihood is the summation of

N (p[i] - p[i - 1]) log(cdf(q[i] | parameters) - cdf(q[i - 1] | parameters))

over the inter-quantile intervals indexed by i. This is the same for any particular distribution; just plug in the appropriate cdf.
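
To make this concrete in R for the asker's ex-Gaussian example, here is a rough sketch (the names pexgauss and qmp_loglik are mine, not from the paper, and the ex-Gaussian cdf is written out from its usual closed form rather than taken from a package). One way to let every observation contribute is to include the two tail intervals, below the first quantile and above the last, using cdf values of 0 and 1 at the endpoints:

    #ex-Gaussian cdf: normal(mu, sigma) convolved with an exponential
    #with mean tau
    pexgauss = function(x, mu, sigma, tau){
        pnorm((x - mu)/sigma) -
            exp(sigma^2/(2*tau^2) - (x - mu)/tau) *
                pnorm((x - mu)/sigma - sigma/tau)
    }

    #log-likelihood of the observed quantiles q (with probabilities
    #quantile_probs) under one candidate parameter set
    qmp_loglik = function(mu, sigma, tau, q, quantile_probs, n_obs){
        p = c(0, quantile_probs, 1)                #interval boundary probabilities
        cdf = c(0, pexgauss(q, mu, sigma, tau), 1) #cdf at -Inf, q, +Inf
        counts = n_obs * diff(p)                   #number of data in each interval
        sum(counts * log(diff(cdf)))
    }

    #evaluate at the candidate values from the question
    qmp_loglik(candidate_mu, candidate_sigma, candidate_tau,
               q, quantile_probs, length(my_data))

The value on its own isn't meaningful; the point is to compare candidate parameter sets and find the one that makes it largest.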

To find the parameters which maximize the likelihood, given a list of quantiles, any method for maximizing a function can be employed. If cdf is differentiable, it's probably most convenient to make use of the gradient wrt the parameters. I don't know if anything can be said about the derivatives of cdf wrt the parameters in general, although maybe if some family is assumed, such as location-scale distributions, some general results can be worked out.

The factor N is constant wrt the parameters so it's convenient to omit it. It's often more convenient, when working with optimization software, to search for a minimum instead of a maximum, so one can look at the negative log-likelihood instead of the log-likelihood, and minimize that instead of maximizing.

About maximizing wrt the parameters, Heathcote et al. make use of a particular line-search method, but that is not essential; given the NLL and its gradient, many optimization algorithms are applicable.
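
For instance, continuing the sketch above (qmp_loglik is the hypothetical helper defined earlier), R's general-purpose optim() can be pointed at the negative log-likelihood directly; with the default derivative-free Nelder-Mead method, no gradient is needed:

    #negative log-likelihood as a function of a single parameter vector,
    #which is the form optim() expects
    qmp_nll = function(par, q, quantile_probs, n_obs){
        #guard against the search wandering into impossible values
        if(par[2] <= 0 | par[3] <= 0) return(Inf)
        -qmp_loglik(par[1], par[2], par[3], q, quantile_probs, n_obs)
    }

    fit = optim(par = c(candidate_mu, candidate_sigma, candidate_tau),
                fn = qmp_nll,
                q = q, quantile_probs = quantile_probs,
                n_obs = length(my_data))
    fit$par #estimates of mu, sigma, tau

If you do work out the gradient, you can pass it via optim's gr argument and use a gradient-based method such as "BFGS" instead.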

To get a starting point for the parameter search, I think it would be convenient to compute the ordinary complete-data maximum likelihood estimates under the assumption that all n[i] data in the i-th inter-quantile interval fall at the middle of that interval; such an estimate is easy to compute if the ML-estimation routine accepts weighted data, since then the data are just the interval midpoints with weights equal to the mass in each interval. Heathcote et al. instead find a starting point by matching moments.
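
For what it's worth, moment matching has a simple closed form for the ex-Gaussian, since its mean is mu + tau, its variance is sigma^2 + tau^2, and its skewness is 2 tau^3 / (sigma^2 + tau^2)^(3/2). A rough sketch (the guards against very small or very large skewness are my own ad hoc choices, not necessarily what Heathcote et al. do):

    #moment-matching starting values for the ex-Gaussian
    exgauss_start = function(x){
        m = mean(x); s = sd(x)
        skew = mean((x - m)^3)/s^3
        tau = s * (max(skew, .01)/2)^(1/3) #keep tau positive if skewness is tiny
        tau = min(tau, .95*s)              #keep sigma^2 = s^2 - tau^2 positive
        c(mu = m - tau, sigma = sqrt(s^2 - tau^2), tau = tau)
    }

    exgauss_start(my_data) #could replace the hand-picked candidates above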

It has been interesting reading about this topic. Hope this helps someone else understand what's going on.
