Solved – Selecting optimization algorithms

optimization, r

I am in the process of developing a multi-state model (using the msm package in R), and have been encountering the error message

initial value in 'vmmin' is not finite

This error occurs regardless of whether I scale the parameters, and with almost every optimization algorithm except bobyqa (from the minqa package). Essentially, I experimented until something worked, but that does not necessarily mean the choices I made are methodologically sound. I've tried to read up on this topic, but every resource I find launches straight into heavy mathematical jargon.
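For reference, here is a stripped-down sketch of the kind of call I'm making, using the cav example data that ships with msm in place of my real data; the initial intensity values are placeholders, and the optimizer is switched via opt.method:

    library(msm)

    # Placeholder transition intensity matrix (initial values only, not my real ones)
    Q <- rbind(c(-0.5,   0.25,   0,     0.25),
               c( 0.166, -0.498, 0.166, 0.166),
               c( 0,      0.25, -0.5,   0.25),
               c( 0,      0,     0,     0))

    # With the default optimizer I get "initial value in 'vmmin' is not finite";
    # switching to the minqa optimizer runs through:
    fit <- msm(state ~ years, subject = PTNUM, data = cav,
               qmatrix = Q, deathexact = 4,
               opt.method = "bobyqa")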

My question is – how does one go about choosing an optimization algorithm? What do I need to take into consideration when making the selection, and to what extent does the choice of optimization algorithm affect how I interpret the results produced with it? Also, I would be grateful if someone could recommend a resource that explains optimization and scaffolds well.

Best Answer

There are a number of things to consider:

  • Global vs. local optimization
  • Do you have analytic gradients or Hessians?
  • Is the objective function even differentiable?
  • The size of the problem/data set
  • How much time you have
  • Whether the problem has special structure (e.g. least squares) that a particular algorithm can exploit

While there are algorithms that can find global extrema (simulated annealing, various genetic algorithms), they can take an inordinately long time for very little return.
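As a quick toy illustration (the test function and starting point are my own arbitrary choices, nothing to do with msm), you can compare a local quasi-Newton method with simulated annealing via optim's "SANN" method on a deliberately multimodal function; the stochastic method needs many more function evaluations and is still not guaranteed to do better:

    # Rastrigin function: many local minima, global minimum 0 at the origin
    rastrigin <- function(x) 10 * length(x) + sum(x^2 - 10 * cos(2 * pi * x))

    x0 <- c(3.1, -2.9)                                    # arbitrary starting point
    local_fit <- optim(x0, rastrigin, method = "BFGS")    # local quasi-Newton
    sann_fit  <- optim(x0, rastrigin, method = "SANN",    # simulated annealing
                       control = list(maxit = 20000))
    c(bfgs = local_fit$value, sann = sann_fit$value)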

If you have analytic gradients or Hessians, quasi-Newton methods become much more appealing. If you have Hessians, you can use trust-region methods, although with some exceptions (e.g. R's trustOptim) these tend to bog down on large problems because the Hessian grows rather quickly. If you have gradients, you can use BFGS-type methods. If you don't, you can either use gradient-free methods like Nelder-Mead, Subplex (my favorite), COBYLA, BOBYQA, etc., or use the quasi-Newton methods with a finite-difference approximation of the gradient (as is the default in R's optim), but only if your objective function is differentiable. If you are looking at a least-squares problem specifically, Levenberg-Marquardt is a possibility.
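To make the gradient part concrete, here is a minimal sketch on the Rosenbrock function (essentially the example from ?optim, not anything specific to your model): the same problem solved with an analytic gradient, with optim's default finite-difference gradient, and with a derivative-free method:

    # Rosenbrock banana function and its analytic gradient
    fr  <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
    grr <- function(x) c(-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]),
                          200 * (x[2] - x[1]^2))

    optim(c(-1.2, 1), fr, grr, method = "BFGS")         # quasi-Newton, analytic gradient
    optim(c(-1.2, 1), fr,      method = "BFGS")         # gradient by finite differences
    optim(c(-1.2, 1), fr,      method = "Nelder-Mead")  # derivative-free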

I believe it does boil down to the specifics of your problem, although I've personally had the most success with Subplex-type algorithms (a restarted Nelder-Mead applied to subspaces of the parameters).
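If you want to try Subplex in R, one implementation I'm aware of is sbplx() in the nloptr package (there is also a standalone subplex package); a minimal sketch, assuming nloptr's wrapper interface and reusing the Rosenbrock function from above:

    library(nloptr)

    # Rosenbrock again; sbplx() is nloptr's Subplex wrapper
    fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
    sbplx(x0 = c(-1.2, 1), fn = fr)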