Solved – Nonlinear regression: best transformation when getting very different parameter estimates

data-transformation, nonlinear-regression, r, regression

Disclaimer: Statistics is not my strong suit, so I apologize if my question is nonsense. I'm a beginner, but I really want to understand this.

My question is: why do I get such widely different parameter estimates when using different transformations of my data in a nonlinear regression?

I'm trying to do a nonlinear regression and to estimate the uncertainty of the fit (a confidence interval) using a linear approximation. My understanding is that the more linear the shape of the nonlinear function, the more accurate the linear-approximation confidence interval will be. I therefore want to transform the data to make it as linear as possible. The errors in $y$ can be assumed to be log-normal. My data is monotonic and, in most cases, assumed to follow a power function

$$ y = a(x-x_0)^b $$

where $y$ is river discharge, $x$ is an arbitrary water level in the river and $x_0$ is the water level where discharge $y$ is 0. Taking logs makes this nice and linear:
$$ \log(y) = \log(a) + b \log(x-x_0), $$
so the intercept of a log-scale fit estimates $\log(a)$.

I need to estimate the parameters $a$, $b$ and $x_0$ simultaneously, so I use nonlinear regression. I also have some data that follows quadratic functions, so I would like to set up (and understand) a general nonlinear method.
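
For example, if $x_0$ were known, the fit would reduce to a plain linear regression on the log scale. Here is a quick sketch with made-up data (taking $x_0 = 0$, $a = 2$, $b = 1.5$):

# Sketch: with x0 known (fixed at 0 here), the power law is a straight
# line on the log-log scale, so lm() recovers log(a) and b directly.
set.seed(1)
x <- 1:20
y <- 2 * x^1.5 * exp(rnorm(20, sd = 0.1))   # log-normal errors
coef(lm(log(y) ~ log(x)))                   # intercept ~ log(2), slope ~ 1.5

With $x_0$ unknown, though, I need the nonlinear approach.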

I use R and nlsLM() from minpack.lm to carry out the nonlinear regression.
Here is some example code:

library(minpack.lm)

xdata <- c(19, 21, 24, 25, 29, 34, 35, 40, 40, 46, 48, 48, 52, 56, 57, 65, 65, 68)
ydata <- c(10, 11, 14, 20, 24, 50, 42, 96, 89, 134, 135, 161, 171, 218, 261, 371, 347, 393)
df <- data.frame(x = xdata, y = ydata)

# weights applied in the case of no transformation
# (relative error assumed to be the same for all y data)
W <- 1/ydata

# NLS regression with weights, no transformation
nlsmodel1 <- nlsLM(y ~ a*(x - x0)^b, data = df, weights = W,
                   start = list(a = 0.1, b = 2.5, x0 = 0))

# log-transformed
nlsmodel2 <- nlsLM(log(y) ~ a + b*log(x - x0), data = df,
                   start = list(a = 0.1, b = 2.5, x0 = 0))
> coef(nlsmodel1)
          a           b          x0 
0.005158377 2.719693093 4.896772931 
> coef(nlsmodel2)
        a         b        x0 
-8.683758  3.445699 -4.139127 

> exp(-8.683758)
[1] 0.0001693136
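
For reference, since the whole point of the transformation is the linear-approximation confidence interval, this is how I compute the (Wald) interval from the fit's standard errors; a sketch for nlsmodel1:

# Linear-approximation (Wald) 95% CI for nlsmodel1 (sketch)
est   <- coef(nlsmodel1)
se    <- summary(nlsmodel1)$coefficients[, "Std. Error"]
tcrit <- qt(0.975, df.residual(nlsmodel1))
cbind(lower = est - tcrit*se, upper = est + tcrit*se)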

I understand that the weights are very important and can account for some of the difference here, but surely not this much? My judgement of the two parameter sets is that nlsmodel1 performs "better", and that the $b$ coefficient from nlsmodel2 is too high. nlsmodel2 does a poor job in the upper end of the data, with large residuals there. But why are they so different? I feel like I'm doing something very silly here and am unable to see the error. I have tried some other transformations, for example transforming only the LHS as log(y), but the problem remains.
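
To make that judgement concrete, here is a sketch that puts both fits on the original scale (exponentiating the intercept of nlsmodel2 to recover $a$) and compares residuals:

# Compare residuals of both models on the original y scale (sketch)
pred1 <- predict(nlsmodel1)
p2    <- coef(nlsmodel2)
pred2 <- exp(p2["a"]) * (df$x - p2["x0"])^p2["b"]   # back-transform log fit
round(cbind(y = df$y, res1 = df$y - pred1, res2 = df$y - pred2), 1)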

I appreciate any tips that can help me improve, and not least understand, the transformed fit.

Cheers

Related post #1 and post #2

Best Answer

One note about weights: It is better to weight by the predicted value of Y (for given X) rather than the observed value of Y. The observed values can vary for random reasons, and this can (in some cases) throw the weights way off, and lead to suboptimal results.

I don't know enough about R to know if it can handle weighting by the reciprocal of the predicted Y value squared (rather than the reciprocal of the observed Y value squared). You could fit with the free demo of GraphPad Prism (which weights by predicted Y) and see how different the results are. It depends, of course, on how scattered your data are, i.e. how far apart the predicted and observed Y values tend to be.
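
That said, the idea is easy to emulate in any tool that accepts fixed weights, R included: fit once, recompute the weights from the predicted values, and refit until the estimates settle. A minimal sketch with nlsLM, assuming the df data frame from the question:

library(minpack.lm)

# Sketch: iteratively reweighted fit, with weights recomputed from the
# model's own predictions (1/fitted^2 encodes constant relative error).
fit <- nlsLM(y ~ a*(x - x0)^b, data = df,
             start = list(a = 0.1, b = 2.5, x0 = 0))
for (i in 1:10) {
  w   <- 1/fitted(fit)^2                  # weight by predicted Y, squared
  fit <- nlsLM(y ~ a*(x - x0)^b, data = df,
               start = as.list(coef(fit)), weights = w)
}
coef(fit)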
