Before going into the code, allow me to give you a quick primer on trust region methods. Let $f(x)$ be your objective function and $x_k$ be your current iterate. Iteration $k$ of a generic trust region method looks something like this:
- Pick a maximum step size, $\Delta_k$
- Build a model of $f(x)$ at $x = x_k$; call it $Q_k(x)$.
- Find the step, $s_k$, that minimizes $Q_k(x_k + s_k)$ subject to the constraint $||s_k|| \leq \Delta_k$
- If $s_k$ is "good enough", let $x_{k+1} = x_k + s_k$
- Otherwise, refine your model and try again
One of the ways you determine if $s_k$ is "good enough" is by comparing the decrease predicted by the model to the actual decrease in the objective function. This is what the portion of the code after 430 does. However, before that, there's a quick sanity check to see if the model predicts a decrease AT ALL. And that's what's happening at 430.
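To make that concrete, here's a minimal R sketch of the accept/reject logic (a toy illustration with my own names and a made-up objective, not the BOBYQA source):

```r
f   <- function(x) sum(x^2)        # toy objective
x_k <- c(1, 2)                     # current iterate
g   <- 2 * x_k                     # model gradient at x_k
H   <- diag(2, 2)                  # model Hessian at x_k
s_k <- c(-0.5, -1)                 # trial step, assumed to satisfy ||s_k|| <= Delta_k

# change in the objective predicted by the quadratic model
vquad <- sum(g * s_k) + 0.5 * drop(t(s_k) %*% H %*% s_k)

if (vquad >= 0) {
  # the model predicts no decrease at all; this is the situation the check at 430 guards against
  warning("trust region step failed to reduce the model")
} else {
  rho <- (f(x_k + s_k) - f(x_k)) / vquad   # actual decrease vs. predicted decrease
  if (rho > 0.1) x_k <- x_k + s_k          # "good enough": accept the step
  # otherwise shrink Delta_k, refine the model, and try again
}
```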
To understand the value of VQUAD, we first have to understand a few other variables. Fortunately, there are good comments right below the declaration of SUBROUTINE BOBYQB. The salient variables are:
- GOPT, the gradient of the model
- HQ, the Hessian of the model
- D, the trial step (I called this $s_k$ above)
Beginning a few lines above 410, you'll see DO 410 J=1,N. This begins a for-loop (and a nested for-loop) that evaluates the change predicted by the model using trial step D. It accumulates the predicted change in VQUAD.
. The first part of the for-loop evaluates the first-order terms and the nested for-loop evaluates the second-order terms. It would probably be easier to read if the loops were indented, like this:
DO 410 J=1,N
   VQUAD=VQUAD+D(J)*GOPT(J)
   DO 410 I=1,J
      IH=IH+1
      TEMP=D(I)*D(J)
      IF (I .EQ. J) TEMP=HALF*TEMP
410   VQUAD=VQUAD+HQ(IH)*TEMP
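For readers more at home in R, here's a rough transcription of that loop (my own sketch, not code from lme4 or the Fortran source). It assumes gopt and d are length-n vectors and that hq stores the Hessian's upper triangle packed column by column, which is the order the IH counter walks through:

```r
predicted_change <- function(d, gopt, hq) {
  n <- length(d)
  vquad <- 0
  ih <- 0
  for (j in seq_len(n)) {
    vquad <- vquad + d[j] * gopt[j]          # first-order term
    for (i in seq_len(j)) {
      ih <- ih + 1
      temp <- d[i] * d[j]
      if (i == j) temp <- 0.5 * temp         # halve the diagonal entries
      vquad <- vquad + hq[ih] * temp         # second-order term
    }
  }
  vquad   # the change Q_k(x_k + d) - Q_k(x_k) predicted by the model
}
```

Halving the diagonal terms is what makes the accumulated sum equal the standard quadratic model $g^T d + \frac{1}{2} d^T H d$.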
There's another for-loop after this that incorporates other parameters into the model. I have to admit, I don't fully understand this - my best guess is that it's particular to how they build the model.
At the end of all this, VQUAD holds the change in the objective function predicted by the model. So if VQUAD is non-negative, that's bad. Now, this particular solver can use an alternative step computation (probably a line search), which is where NTRITS comes into play. So the logic at 430 is saying, "If the last iteration used the alternative step computation AND the model does not predict a decrease AND IPRINT > 0, print the warning message." Note that the solver is going to terminate regardless of the value of IPRINT.
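In R-style pseudocode, with placeholder values standing in for the solver's state (the names here are mine, apart from the roles played by VQUAD, NTRITS, and IPRINT):

```r
# placeholders standing in for BOBYQA's state at 430
used_alternative_step <- TRUE   # derived from NTRITS in the Fortran
vquad  <- 0.0                   # predicted change; >= 0 means no decrease predicted
iprint <- 1                     # print level passed in by the caller

if (used_alternative_step && vquad >= 0) {
  if (iprint > 0) {
    message("...")   # the warning text you are seeing
  }
  # the solver then terminates, whether or not anything was printed
}
```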
Speaking of IPRINT, that value is passed to BOBYQA by the calling function. In this case, your R routine is the calling function. There's a verbose parameter to glmer - I would bet dimes to dollars that same value is passed to BOBYQA. Try setting verbose to 0 and you probably won't see the warning. But it won't change what's going on under the hood, of course.
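For example (the model and data below are simulated placeholders; the only part that matters here is the verbose argument):

```r
library(lme4)

# throwaway stand-in data for illustration
set.seed(1)
mydata <- data.frame(y     = rpois(200, 2),
                     x     = rnorm(200),
                     group = factor(rep(1:20, each = 10)))

# verbose = 0 suppresses the optimizer's progress/diagnostic output;
# it does not change how the model is fitted
fit <- glmer(y ~ x + (1 | group), data = mydata, family = poisson, verbose = 0)
```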
Zero-inflation is about the shape of the distribution. Therefore, you will have to specify a distribution for the non-zero part (Poisson, Negative Binomial, etc.) if you want a formal test. Then you can use a likelihood ratio test to see whether the zero-inflation parameters can be dropped from the model. This can be done in R.
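For instance, a sketch of that likelihood ratio comparison using the pscl package, on simulated placeholder data (treat the p-value as approximate, since the plain Poisson sits on the boundary of the zero-inflated parameter space):

```r
library(pscl)   # provides zeroinfl()

# simulated stand-in data with excess zeros
set.seed(1)
n   <- 500
x   <- rnorm(n)
y   <- rpois(n, exp(0.5 + 0.3 * x))
y[rbinom(n, 1, 0.3) == 1] <- 0          # force roughly 30% extra zeros
dat <- data.frame(y = y, x = x)

pois <- glm(y ~ x, data = dat, family = poisson)             # no zero-inflation
zip  <- zeroinfl(y ~ x | 1, data = dat, dist = "poisson")    # intercept-only zero component

# likelihood ratio test of whether the zero-inflation part can be dropped
lr <- 2 * (as.numeric(logLik(zip)) - as.numeric(logLik(pois)))
pchisq(lr, df = 1, lower.tail = FALSE)   # small p-value: keep the zero-inflation part
```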
In cruder terms, zero inflation is determined not only by the proportion of zeros but also by the total number of observations. Say, if you assume a zero-inflated Poisson model and your data contain 50% zeros, you still won't be able to say with certainty that the data are zero-inflated if the total number of points is only 4. On the other hand, 10% zeros in 1000 observations can result in a positive test for zero-inflation.
The zero-inflated property is associated with count data, so I haven't heard of a "zero-inflated normal". E.g. in this package:
cran.r-project.org/web/packages/pscl/pscl.pdf
they only consider Poisson, Negative Binomial and Geometric. What I would do is fit, say, a zero-inflated Poisson model where the zero and non-zero components contain only an intercept, and then check whether the intercept from the zero component has a significant p-value.
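A minimal version of that check with pscl's zeroinfl (reusing the simulated dat from the earlier sketch, or your own data frame):

```r
library(pscl)

fit0 <- zeroinfl(y ~ 1 | 1, data = dat, dist = "poisson")  # intercept-only in both components
summary(fit0)
# the "Zero-inflation model coefficients" block reports the zero-component
# intercept and its p-value, which is the quantity suggested above
```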
P.S. I also managed to find a reference to the (log) normal zero-inflated distribution, but I don't know if it's obtainable for free:
"Analysis of repeated measures data with clumping at zero", Stat Methods Med Res, August 2002
http://smm.sagepub.com/content/11/4/341.full.pdf+html
Best Answer
There are a number of things to consider:
- While there exist algorithms that are able to find global extrema (simulated annealing, various genetic algorithms), they can take an inordinately long time for very little return.
- If you have analytic gradients or Hessians, quasi-Newton methods become much more appealing. If you have Hessians, you can use trust region methods, although with some exceptions (e.g. R's trustOptim) these tend to bog down on large problems as the Hessian grows rather quickly. If you have gradients, you can use BFGS-type methods. If you don't, you can either use gradient-free methods like Nelder-Mead, Subplex (my favorite), COBYLA, BOBYQA, etc., or you can use the quasi-Newton methods with a finite-differencer to estimate the gradient (as is the default in R's optim; there's a small sketch of this below), but only if your objective function is differentiable.
- If you are looking at a least-squares problem specifically, Levenberg-Marquardt is a possibility.

I believe it does boil down to the specifics of your problem, although I've personally found the most success with Subplex-type algorithms (multi-start Nelder-Mead).
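As a small illustration of the gradient versus no-gradient trade-off with base R's optim (the Rosenbrock function is just a stand-in objective):

```r
rosen <- function(p) (1 - p[1])^2 + 100 * (p[2] - p[1]^2)^2
rosen_grad <- function(p) c(
  -2 * (1 - p[1]) - 400 * p[1] * (p[2] - p[1]^2),
  200 * (p[2] - p[1]^2)
)

optim(c(-1.2, 1), rosen, method = "Nelder-Mead")            # gradient-free (optim's default)
optim(c(-1.2, 1), rosen, method = "BFGS")                   # BFGS with finite-difference gradient
optim(c(-1.2, 1), rosen, gr = rosen_grad, method = "BFGS")  # BFGS with the analytic gradient,
                                                            # typically fewer function evaluations
```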