I got bitten by this recently. My intention was the same: make an artificial model and test it. The main reason is the one given by @whuber and @marco: such a model is not identified. To see that, remember that NLS minimizes the function:
$$\sum_{i=1}^n(y_i-a-br^{x_i-m}-cx_i)^2$$
Say it is minimized by the set of parameters $(a,b,m,r,c)$. It is not hard to see that the set of parameters $(a,br^{-m},0,r,c)$ gives the same value of the function to be minimized, since $br^{-m}\,r^{x_i-0}=br^{x_i-m}$. Hence the model is not identified, i.e. there is no unique solution.
It is also not hard to see why the gradient is singular. Denote
$$f(a,b,r,m,c,x)=a+br^{x-m}+cx$$
Then
$$\frac{\partial f}{\partial b}=r^{x-m}$$
$$\frac{\partial f}{\partial m}=-b\ln(r)\,r^{x-m}$$
and we get that for all $x$
$$b\ln(r)\frac{\partial f}{\partial b}+\frac{\partial f}{\partial m}=0.$$
Hence the matrix
\begin{align}
\begin{pmatrix}
\nabla f(x_1)\\\\
\vdots\\\\
\nabla f(x_n)
\end{pmatrix}
\end{align}
(whose rows are the gradients of $f$ with respect to the parameters, evaluated at $x_1,\dots,x_n$) will not be of full rank, and this is why `nls` gives the singular gradient message.
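Here is a minimal sketch reproducing the failure on simulated data; the data, starting values, and the parameter name `cc` (standing in for $c$, to avoid shadowing R's built-in `c()`) are all mine, not from the original question:

```r
## Simulate data from the over-parameterized model and try nls on it.
set.seed(1)
x <- 1:50
y <- 2 + 3 * 0.9^(x - 5) + 0.1 * x + rnorm(50, sd = 0.1)

## The gradient columns for b and m are exactly proportional, so nls
## should stop with "singular gradient matrix at initial parameter estimates",
## even when started at the true values.
try(nls(y ~ a + b * r^(x - m) + cc * x,
        start = list(a = 2, b = 3, r = 0.9, m = 5, cc = 0.1)))
```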
I spent over a week looking for bugs elsewhere in my code until I noticed that the main bug was in the model :)
Non-linear least squares solves $\min_\beta \sum_i (y_i-f(x_i;\beta))^2$. The objective is quadratic in $\beta$ if $f$ is linear in $\beta$. Your $f$ is not linear in $\beta$, so the NLS objective function is not quadratic in $\beta$. Of course, you don't need the objective to be quadratic to guarantee convergence to a unique minimum; rather, you need $\sum_i (y_i-f(x_i;\beta))^2$ to be convex in $\beta$. Presumably, with your $f$, the NLS objective function is not convex; it doesn't look, to me, like the kind of $f$ that generates a convex objective function. That's pretty much the explanation: the objective can have many local minima instead of a single one.
If I were fitting the function that you are, I would use an entirely different approach rather than blindly running NLS. If you look carefully at your function, $f(x_i;\beta)=a x_i^2 e^{-bx_i}+c$, it is almost linear in the parameters. If you fixed $b$ at some value, say $0.1$, then you could fit $a$ and $c$ by OLS:
\begin{align}
y_i &= a x_i^2 e^{-0.1x_i}+c \\
&= a z_i+c
\end{align}
The variable $z_i$ is defined as $z_i=x_i^2 e^{-0.1x_i}$. This means that, once you have picked $b$, the optimal values are $\hat{a}=\widehat{\mathrm{Cov}}(y,z)/\widehat{\mathrm{Var}}(z)$ and $\hat{c}=\overline{y}-\hat{a}\,\overline{z}$.
So what, right? At the very least, this is how you should pick starting values for $a$ and $c$. But, really, it reduces the search for the optimal parameters to a one-dimensional search over $b$. With a modern computer, one-dimensional searches are fast and easy. If you have some idea of what reasonable values for $b$ are, you can define an interval $[b_{low},b_{high}]$ and grid-search for the $b$ that gives the lowest sum of squared errors, then use that $b$ and its associated optimal $a$ and $c$ as starting values for NLS, as in the sketch below.
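Here is a minimal sketch of that recipe; the simulated data, the grid, and the interval for $b$ are illustrative choices of mine:

```r
## Simulated example data; replace with your own.
set.seed(42)
x <- seq(0.5, 30, length.out = 100)
y <- 5 * x^2 * exp(-0.3 * x) + 2 + rnorm(100, sd = 0.5)

## For a fixed b, a and c come from OLS on z = x^2 * exp(-b * x).
rss_given_b <- function(b) {
  z <- x^2 * exp(-b * x)
  sum(resid(lm(y ~ z))^2)
}

## One-dimensional grid search over a plausible interval [b_low, b_high].
b_grid <- seq(0.01, 1, by = 0.01)
b_hat  <- b_grid[which.min(sapply(b_grid, rss_given_b))]

## Recover the OLS-optimal a and c for b_hat, then start nls there.
## (c0 stands in for c, which would shadow R's built-in c().)
z_hat <- x^2 * exp(-b_hat * x)
ols   <- coef(lm(y ~ z_hat))                 # intercept = c, slope = a
fit   <- nls(y ~ a * x^2 * exp(-b * x) + c0,
             start = list(a = unname(ols[2]), b = b_hat, c0 = unname(ols[1])))
```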
Or, you could do something more sophisticated. Suppose you are searching over $b$, using the optimal $a(b)$ and $c(b)$ from OLS at each step. Then the NLS objective function is $\sum_i \left(y_i - f(x_i;a(b),b,c(b))\right)^2$. The envelope theorem makes its derivative very easy to calculate: because $a(b)$ and $c(b)$ are OLS-optimal, the terms involving $a'(b)$ and $c'(b)$ vanish, leaving only the partial derivative with respect to $b$:
\begin{align}
\frac{d}{db} \sum_i \left(y_i - f(x_i;\beta)\right)^2 &= -\sum_i 2\left(y_i - f(x_i;\beta)\right)\frac{\partial f}{\partial b}(x_i;\beta)\\
&= \sum_i 2\left(y_i - f(x_i;\beta)\right)a x_i^3 e^{-bx_i},
\end{align}
since $\partial f/\partial b = -a x_i^3 e^{-bx_i}$.
So, you can easily write a function that computes the NLS objective for any given $b$, and another that computes its derivative. These two ingredients are enough to get an optimizer going on your function (see the sketch below). Then, after you find the optimal $b$, just run NLS starting from that $b$ and its associated optimal $a$ and $c$; it will converge in one iteration.
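A minimal sketch of that, on the same simulated data as in the previous sketch (all names are mine):

```r
## Same simulated data as above, repeated so this sketch runs on its own.
set.seed(42)
x <- seq(0.5, 30, length.out = 100)
y <- 5 * x^2 * exp(-0.3 * x) + 2 + rnorm(100, sd = 0.5)

profile_sse <- function(b) {
  z   <- x^2 * exp(-b * x)
  fit <- lm(y ~ z)              # profiles out a and c for this b
  a   <- unname(coef(fit)[2])
  r   <- resid(fit)
  ## df/db = -a * x^3 * exp(-b * x), so by the envelope theorem
  ## dSSE/db = 2 * a * sum(r * x^3 * exp(-b * x))
  list(sse = sum(r^2), grad = 2 * a * sum(r * x^3 * exp(-b * x)))
}

## One-dimensional gradient-based search over b, e.g. with BFGS.
opt <- optim(par = 0.1,
             fn  = function(b) profile_sse(b)$sse,
             gr  = function(b) profile_sse(b)$grad,
             method = "BFGS")
opt$par  # hand this b (with its OLS-optimal a and c) to nls as starting values
```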
Best Answer
You can get successful convergence by using the Golub-Pereyra algorithm for partially linear least-squares models, which is a more sophisticated way of doing what @whuber suggested. In R, that is `nls` with `algorithm = "plinear"`.
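A sketch of that call, on simulated stand-in data since the question's actual data aren't shown here:

```r
## Golub-Pereyra (variable projection) via nls's "plinear" algorithm.
## Simulated stand-in data; replace with the real data.
set.seed(7)
x <- seq(0.5, 30, length.out = 100)
y <- 5 * x^2 * exp(-0.3 * x) + 2 + rnorm(100, sd = 0.5)

## The RHS is a matrix whose columns enter linearly; only the
## non-linear parameter b needs a starting value.
fit <- nls(y ~ cbind(1, x^2 * exp(-b * x)),
           algorithm = "plinear",
           start = list(b = 0.1))
summary(fit)  # .lin1 is c (the intercept column), .lin2 is a
```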
That looks somewhat reasonable, but the summary output shows that none of the parameters is significant (note that `.lin1` is the parameter $c$ and `.lin2` is $a$). If you use starting values close to this solution, the default Gauss-Newton algorithm also converges.
You have strong collinearity, which means there are more parameters in your model than your data can support.
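One way to see this, assuming a fitted `nls` object `fit` as in the sketch above, is to look at the estimated correlations between the parameter estimates:

```r
## Correlation matrix of the parameter estimates; entries close to +/-1
## mean the data cannot tell the parameters apart.
cov2cor(vcov(fit))
```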
You should try to collect more and better data if you need to fit this model (but that might not be possible).