These situations are easier to understand once we keep in mind what minimisation (or maximisation) really is and how optimisation methods work.
Suppose we have a function $f$ which has a local minimum at $x_0$. Optimisation methods try to construct a sequence $x_i$ which converges to $x_0$. For each method it is typically proved that, in theory, the constructed sequence converges to a point of local minimum for some class of functions $f$.
Obtaining the next candidate at iteration $i$ can be a lengthy process, so it is usual for algorithms to limit the number of iterations. This corresponds to situation 4.
Since $x_0$ is a local minimum, for each $x$ close to $x_0$ we have $f(x)>f(x_0)$. So if $f(x_i)>f(x_{i-1})$, this is an indication that we have reached the minimum. This corresponds to situation 3.
Now if the function $f$ is differentiable at $x_0$, then necessarily $\nabla f(x_0)=0$. The Newton-Raphson method calculates the gradient at each step, so if $\nabla f(x_i)\approx 0$, then $x_i$ is probably a solution, which corresponds to situation 1.
Every convergent sequence of real vectors is a Cauchy sequence and vice versa, which roughly means that $x_i$ is close to $x_0$ if and only if $x_i$ is close to $x_{i+1}$, where $i$ is the iteration number. So if $|x_i-x_{i-1}|<\varepsilon$, and we know that in theory $x_i$ converges to $x_0$, then we should be close to the minimum point. This corresponds to situation 2.
Convergent sequences have the property that they contract: close to convergence, all the remaining elements of the sequence are contained in a small region. So if a sequence which in theory should converge starts to take large steps, this is an indication that it is probably not converging. This corresponds to situation 5.
Note: strict mathematical definitions were left out intentionally.
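To make the correspondence concrete, here is a minimal sketch in R of a gradient-descent loop whose stopping rules mirror the five situations above. The objective `f`, its gradient `grad_f`, the step size, the tolerances and the divergence threshold are all illustrative choices for this sketch, not any real library's API.

```r
f      <- function(x) (x - 3)^2     # toy objective with minimum at x0 = 3
grad_f <- function(x) 2 * (x - 3)   # its derivative

descend <- function(x, step = 0.1, eps = 1e-8, max_iter = 1000) {
  for (i in seq_len(max_iter)) {
    x_new <- x - step * grad_f(x)   # gradient-descent update
    if (abs(grad_f(x_new)) < eps)
      return(list(x = x_new, msg = "gradient ~ 0 (situation 1)"))
    if (abs(x_new - x) < eps)
      return(list(x = x_new, msg = "step below tolerance (situation 2)"))
    if (f(x_new) > f(x))
      return(list(x = x_new, msg = "objective increased (situation 3)"))
    if (abs(x_new - x) > 1e6)
      return(list(x = x_new, msg = "steps blowing up, likely divergence (situation 5)"))
    x <- x_new
  }
  list(x = x, msg = "iteration limit reached (situation 4)")
}

descend(0)   # converges towards 3 and reports which stopping rule fired
```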
By reading the documentation of the R package geoR, we can find that the Box-Cox transform with the extra parameter $\lambda_2$ is defined as
$$
Y' = \begin{cases}
\log(Y+\lambda_2) & \text{if } \lambda = 0 \\[4pt]
\dfrac{(Y+\lambda_2)^\lambda - 1}{\lambda} & \text{otherwise}
\end{cases}
$$
so if $\lambda_2=0$ this is the usual Box-Cox transform, and the boxcoxfit function will estimate the two parameters $\lambda, \lambda_2$ by maximum likelihood. From the examples on the help page, it seems the boxcoxfit function can take as its first argument either a vector of data values or a model object, in the latter case presumably using the residuals from the fit. From your code it seems you have given just a data vector, thus finding the Box-Cox transformation parameters based on the marginal distribution of the response variable. That is usually less useful than using the residuals from the model fit, so you should reanalyze doing that, as sketched below!
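Here is a hedged sketch of both approaches. It assumes geoR is installed, that `y` holds the response and `X` a matrix of covariates (both hypothetical names); the argument names follow the geoR help page, but check `?boxcoxfit` before running.

```r
library(geoR)

## The two-parameter Box-Cox transform written out, matching the formula above
boxcox2 <- function(y, lambda, lambda2) {
  if (lambda == 0) log(y + lambda2)
  else ((y + lambda2)^lambda - 1) / lambda
}

## Marginal fit: lambda and lambda2 estimated from the distribution of y alone
fit_marginal <- boxcoxfit(y, lambda2 = TRUE)

## Fit using the regression structure: supplying covariates via xmat lets the
## likelihood account for the model, rather than the marginal distribution of y
fit_model <- boxcoxfit(y, xmat = X, lambda2 = TRUE)
```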
You then ask why $\lambda_2=\text{3.116280e+04}$ is so large, and whether that is reasonable. Well, we cannot judge that, since you didn't tell us about the marginal distribution of your $Y$ variable! But you did tell us that $Y=\text{annual foreign sales of companies (in US\$ thousands)}$, and I would guess that contains many large values, so in that context maybe the estimated $\lambda_2$ is not so large.
Best Answer
If you are looking to maximize a function $f$ instead of minimizing it, you can call nlminb on $-f$, since maximizing $f$ is equivalent to minimizing $-f$.
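For example, here is a minimal illustration with a toy function (the quadratic and its starting value are made up for this sketch):

```r
## Toy function with a maximum of 5 at x = 2
f <- function(x) -(x - 2)^2 + 5

## nlminb only minimizes, so hand it the negation of f
fit <- nlminb(start = 0, objective = function(x) -f(x))

fit$par          # approximately 2, the maximizer of f
-fit$objective   # approximately 5, the maximum of f (undo the sign flip)
```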
Other possible optimization functions are nlm and optim. See this answer for a comparison of these functions.