Solved – Using the nls() function in R for an exponential function

Tags: nls, r, time series

I know that this issue has already been discussed here, but I ran into a problem I can't solve. I have a list of persons, each represented by a time series of 4-8 points. I want to approximate them all with the function $y=a\cdot x^2\cdot \exp(-bx)+c$.
Thus for each person I want to find his own $a$, $b$ and $c$.
For most of them the following code works very well:

res=nls(p2[,2] ~ c+a*I(p2[,1]^2)*exp(b*p2[,1]),start=list(a=0.005,b=-0.005,c=5))

However, for some persons these starting values don't work: R returns "Missing value or an infinity produced when evaluating the model" or "singular gradient matrix at initial parameter estimates". For some of these people the following starting values worked:

res=nls(p2[,2] ~ c+a*I(p2[,1]^2)*exp(b*p2[,1]),start=list(a=0.1,b=-0.02,c=5))

Could anybody give a clear suggestion on how to choose starting values for all the people I consider?
I tried to use tryCatch to try different starting values and find ones that work, but another problem appeared:
this code
nls(p2[,2] ~ c+a*I(p2[,1]^2)*exp(b*p2[,1]),start=list(a=5,b=0,c=5))
led to:

        a         b         c 
 -0.00166  -0.00269 140.87366 

while
nls(p2[,2] ~ c+a*I(p2[,1]^2)*exp(b*p2[,1]),start=list(a=0.1,b=-0.02,c=5))
led to:

      a       b       c 
 0.2024 -0.0251 47.7811 

So by choosing different starting values we get different answers. How can this happen? I thought that, since the NLS objective function is quadratic, it can't have more than one extremum…
Do you have any suggestions about how I should proceed in this situation?

Best Answer

Non-linear least squares solves $\min_\beta \sum (y_i-f(x_i;\beta))^2$. This is quadratic in $\beta$ if $f$ is linear in $\beta$. Your $f$ is not linear in $\beta$, so the NLS objective function is not quadratic in $\beta$. Of course, you don't need the objective to be quadratic to guarantee convergence to a unique minimum; rather, you need $\sum (y_i-f(x_i;\beta))^2$ to be convex in $\beta$. Presumably, with your $f$, the NLS objective function is not convex. It doesn't look, to me, like the kind of $f$ which generates a convex objective function. That's pretty much the explanation. You can have many local minima or just one.

If I were fitting the function that you are, I would use an entirely different approach. I would not just blindly use NLS. If you look carefully at your function, $f(x_i;\beta)=a\,x_i^2\exp(-bx_i)+c$, it is almost linear in the parameters. If you fixed $b$ at some value, say 0.1, then you could fit $a$ and $c$ by OLS: \begin{align} y_i &= a\,x_i^2\exp(-0.1x_i)+c \\ &= a\,z_i+c \end{align} The variable $z_i$ is defined as $z_i=x_i^2\exp(-0.1x_i)$. This means that, once you have picked $b$, the optimal value of $a$ is $\widehat{\mathrm{Cov}}(y,z)/\widehat{\mathrm{Var}}(z)$ and the optimal value of $c$ is $\overline{y}-a\,\overline{z}$.
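
In R this closed-form OLS step is only a few lines. A minimal sketch, assuming (as in your code) that p2 has $x$ in column 1 and $y$ in column 2, and using the $\exp(-bx)$ parameterisation from the formula above with $b$ fixed at 0.1:

    b.fixed <- 0.1
    x <- p2[, 1]
    y <- p2[, 2]
    z <- x^2 * exp(-b.fixed * x)        # constructed regressor z_i
    a.hat <- cov(y, z) / var(z)         # optimal a given this b
    c.hat <- mean(y) - a.hat * mean(z)  # optimal c given this b

Equivalently, coef(lm(y ~ z)) returns the same intercept and slope.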

So what, right? At the very least, this is how you should pick starting values for $a$ and $c$. But, really, this reduces the search for optimal parameters to a one-dimensional search over $b$. With a modern computer, one-dimensional searches are fast and easy. If you have some idea of what reasonable values for $b$ are, then you can just define an interval $[b_{low},b_{high}]$ and grid search for the $b$ which gives the lowest sum of squared errors. Then use that $b$ and its associated optimal $a$ and $c$ as starting values for NLS.
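
A minimal sketch of that grid search, reusing x and y from the snippet above; the interval [0.01, 1] and the grid of 200 points are placeholders for whatever range of $b$ you consider plausible. Note that it uses the $\exp(-bx)$ parameterisation, so the fitted $b$ corresponds to the negative of the $b$ in your original call:

    sse.for.b <- function(b) {
      z  <- x^2 * exp(-b * x)
      a  <- cov(y, z) / var(z)
      cc <- mean(y) - a * mean(z)
      sum((y - (a * z + cc))^2)   # SSE at the OLS-optimal a and c for this b
    }
    b.grid  <- seq(0.01, 1, length.out = 200)   # your [b_low, b_high]
    b.start <- b.grid[which.min(sapply(b.grid, sse.for.b))]
    z <- x^2 * exp(-b.start * x)
    a.start <- cov(y, z) / var(z)
    c.start <- mean(y) - a.start * mean(z)
    res <- nls(y ~ c + a * x^2 * exp(-b * x),
               start = list(a = a.start, b = b.start, c = c.start))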

Or, you could do something more sophisticated. Suppose you are searching over $b$, using the optimal $a(b)$ and $c(b)$ from OLS. Then the NLS objective function is $\sum \left(y_i - f(x_i;a(b),b,c(b))\right)^2$. The envelope theorem makes the derivative of this very easy to calculate: because $a(b)$ and $c(b)$ are optimal given $b$, the terms involving $da/db$ and $dc/db$ drop out, and only the partial derivative with respect to $b$ remains: \begin{align} \frac{d}{db} \sum \left(y_i - f(x_i;\beta)\right)^2 &= -\sum 2\left(y_i - f(x_i;\beta)\right)\frac{\partial}{\partial b}f(x_i;\beta)\\ &= \sum 2\left(y_i - f(x_i;\beta)\right)a\,x_i^3\exp(-bx_i) \end{align}

So, you can easily write a function to calculate the NLS objective function for any given $b$, and you can easily write a function to calculate its derivative for any $b$. These two ingredients are enough to get an optimizer going on your function. Then, after you find the optimal $b$, just run NLS with that $b$ and its associated optimal $a$ and $c$. It should converge almost immediately.
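
A minimal sketch of that gradient-based one-dimensional search, reusing x, y and sse.for.b() from the grid-search snippet; the starting value of 0.1 is just a placeholder:

    sse.grad <- function(b) {
      z  <- x^2 * exp(-b * x)
      a  <- cov(y, z) / var(z)
      cc <- mean(y) - a * mean(z)
      r  <- y - (a * z + cc)              # residuals at the OLS-optimal a and c
      sum(2 * r * a * x^3 * exp(-b * x))  # the envelope-theorem derivative above
    }
    opt   <- optim(par = 0.1, fn = sse.for.b, gr = sse.grad, method = "BFGS")
    b.opt <- opt$par
    # recover a(b.opt) and c(b.opt) as before and hand all three to nls() as starting values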