Solved – Is this a reasonable approach to fitting distributions?

distributions, fitting

Take the task of fitting an a priori distribution such as the ex-Gaussian
to a collection of observed human response times (RTs). One standard method is to compute the summed log likelihood of the observed RTs given a set of candidate ex-Gaussian parameters, then search for the parameter set that maximizes this summed log likelihood.
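For reference, here is a minimal sketch of that likelihood route (base R has no ex-Gaussian density, so dexgauss below is a hand-rolled helper, and the commented-out starting values are purely illustrative):

    dexgauss = function( x , mu , sigma , tau , log = FALSE ){
        # log-density of an ex-Gaussian: Normal(mu,sigma) plus Exponential(mean tau)
        ld = -log(tau) + (mu - x)/tau + sigma^2/(2*tau^2) +
             pnorm( (x - mu)/sigma - sigma/tau , log.p = TRUE )
        if( log ) ld else exp(ld)
    }
    neg_ll = function( par , rt ) -sum( dexgauss( rt , par[1] , par[2] , par[3] , log = TRUE ) )
    # mle = optim( c( .4 , .05 , .2 ) , neg_ll , rt = obs_rt )

I wonder if this alternative approach might also be reasonable: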

  1. Select a set of equidistant quantile probabilities, e.g.:

    qps = seq( .1 , .9 , .1 )  # nine probabilities, defining eight inter-quantile bins
    
  2. For a given set of candidate ex-Gaussian parameters, estimate the
    quantile RT values corresponding to qps, e.g.:

    sim_dat = rnorm( 1e5 , mu , sigma ) + rexp( 1e5 , 1/tau )  # ex-Gaussian = Normal + Exponential
    qrt = quantile( sim_dat , prob = qps )  # Monte Carlo estimates of the candidate quantiles
    
  3. For each sequential interval between the thus-generated quantile RT
    values, count the number of observations falling into that interval,
    e.g.:

    obs_counts = rep( NA , length(qrt)-1 )
    for( i in 1:(length(qrt)-1) ){
        # count observations in the interval ( qrt[i] , qrt[i+1] ] ;
        # observations below qrt[1] or above qrt[length(qrt)] are ignored
        obs_counts[i] = sum( (obs_rt > qrt[i]) & (obs_rt <= qrt[i+1]) )
    }
    
  4. Compare these observed counts to the expected counts:

    exp_counts = diff(qps)[1] * length(obs_rt)  # each bin is expected to hold 10% of the observations
    chi_sq = sum( (( obs_counts - exp_counts )^2 )/exp_counts )  # Pearson chi-squared statistic
    
  5. Repeat steps 2-4, searching for candidate parameter values that
    minimize chi_sq (a minimal sketch of such a search follows this list).
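
For concreteness, here is a minimal sketch of what that search might look like with optim; everything beyond the question's own steps is an illustrative assumption (the starting values, the in-bounds guard, and the fixed seed inside the objective, which gives every evaluation the same random draws so that repeated calls at the same parameters return the same quantiles):

    chi_sq_obj = function( par , obs_rt , qps , n_sim = 1e5 ){
        mu = par[1] ; sigma = par[2] ; tau = par[3]
        if( sigma <= 0 || tau <= 0 ) return( Inf )  # keep the search in-bounds
        set.seed(1)  # common random numbers across evaluations (illustrative choice)
        sim_dat = rnorm( n_sim , mu , sigma ) + rexp( n_sim , 1/tau )
        qrt = quantile( sim_dat , prob = qps )
        obs_counts = as.vector( table( cut( obs_rt , breaks = qrt ) ) )  # counts per ( qrt[i] , qrt[i+1] ] bin
        exp_counts = diff(qps)[1] * length(obs_rt)
        sum( (( obs_counts - exp_counts )^2 )/exp_counts )
    }
    fit = optim( c( .4 , .05 , .2 ) , chi_sq_obj , obs_rt = obs_rt , qps = seq( .1 , .9 , .1 ) )

Here cut() reproduces the right-closed intervals of step 3 in one call.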

Is this approach a reasonable alternative to the more standard maximum likelihood estimation procedure? Does this approach already have a name?

Note that I use the ex-Gaussian purely for illustrative purposes; in practice I'm playing with using the above approach in a rather more complicated context (e.g. fitting a stochastic model that yields multiple distributions, each with a different expected observation count). The purpose of this question is to ascertain whether I've re-invented the wheel, and whether anyone can spot any problematic features of the method.

Best Answer

One problematic feature is that there may be a continuum of optimal solutions. In most settings the quantiles are continuous functions of the parameters, and when the distributions are continuous there will almost surely be gaps of positive length between adjacent data values. Suppose your objective function is optimized by a particular parameter value whose quantiles do not coincide exactly with any of the data: that is, they lie in the interiors of the intervals determined by the nearby data values. (This is an extremely likely event.) Then small changes in the parameter value will move the quantiles slightly, but keep them within the same intervals, leaving the chi-squared value unchanged because none of the counts changes. Thus the procedure doesn't even pick out a definite set of parameter values!
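
To see this concretely, here is a small sketch using a plain Gaussian (so qnorm gives exact quantiles, with no simulation noise); the data and parameter values are made up for illustration:

    set.seed(2)
    obs = rnorm( 100 )  # made-up "observed" data
    qps = seq( .1 , .9 , .1 )
    chi_sq = function( mu ){
        q = qnorm( qps , mu , 1 )  # exact candidate quantiles
        counts = as.vector( table( cut( obs , breaks = q ) ) )
        expected = diff(qps)[1] * length(obs)
        sum( (( counts - expected )^2 )/expected )
    }
    chi_sq( 0.05 )
    chi_sq( 0.05 + 1e-6 )  # almost surely identical: no quantile crosses a data value

The two calls will (almost surely) return exactly the same value, because shifting every quantile by 1e-6 moves none of them past a data point, so none of the counts changes.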

Another problematic feature is that this procedure apparently provides no way to obtain standard errors for the parameter estimates.

Another problem is that even the most basic sampling properties of this estimator, such as its bias, are unknown.
