It seems to me you are actually looking for a regression model with a re-descending loss function (one in which far-away points are weighted less than close ones). Such loss functions, for example the Tukey biweight, lead to highly non-convex optimization problems: there are, potentially, a finite but factorially growing number of local solutions. Whichever one you end up with will depend on your "starting weights", that is, your initial guess about which observations belong to the model you are looking for.
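To make the re-descending idea concrete, here is a minimal sketch of the Tukey biweight weight function, written out by hand rather than taken from any package (the helper `tukey_weight` and the cutoff `cc` are names of my choosing; `cc` plays the role of the tuning constant discussed further below):

```r
## Tukey biweight weight function: observations whose scaled residual u
## satisfies |u| >= cc get weight 0, so points far from the fit stop
## influencing it entirely.
tukey_weight <- function(u, cc = 4.685) {
  ifelse(abs(u) < cc, (1 - (u / cc)^2)^2, 0)
}
curve(tukey_weight(x), from = -8, to = 8,
      xlab = "scaled residual", ylab = "weight")
```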
R has some facilities to fit such models. An example is the `lmrob` function in the `robustbase` package. For illustration I use a univariate example below, but in principle the logic extends to higher dimensions.
```r
library(robustbase)

## construct the data, one group at a time
set.seed(123)  # added for reproducibility
x1 <- cbind(1, rnorm(100, 0, 5))
x2 <- cbind(1, rnorm(100, 0, 5))
b1 <- c(-3, 2)
b2 <- c(4, -2)
y1 <- x1 %*% b1 + rnorm(100)
y2 <- x2 %*% b2 + rnorm(100)

## mix the two groups
x <- rbind(x1, x2)
y <- c(y1, y2)

## initial guess: an OLS fit through d + 1 = 3 points drawn from the first group
sam1 <- sample(1:100, 3)
bet1 <- lm(y[sam1] ~ x[sam1, ] - 1)
ini1 <- list(coefficients = as.numeric(coef(bet1)), scale = sd(resid(bet1)))

## robust fit started from that guess
ctrl <- lmrob.control()  # default control settings; tweaked in the EDIT below
mod1 <- lmrob(y ~ x - 1, init = ini1)

plot(x[, 2], y, pch = 16)
points(x[sam1, 2], y[sam1], col = "red", pch = 16)
abline(mod1$coef, col = "red")
```
![the red points mark the initial guess of $d+1$ points from the same relationship. The red line is the final fitted regression function](https://i.stack.imgur.com/Ibquk.png)
Now, as I mentioned, fitting these types of models is a highly non-convex problem, and the final solution will depend on the quality of your starting points. In the example above, my "guess" that the initial points (marked in red in the plot above) belonged to the same relationship was correct. In general, especially in higher dimensions, this is a highly questionable assumption. Notwithstanding this, if your initial guess is correct, the biweight loss function yields an appreciable gain over the naive solution (the OLS line passing through the 3 initial points).
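One way to hedge against a bad initial guess (a sketch, assuming you can afford to refit many times) is a simple random-restart loop: draw many candidate subsets, fit from each, and keep the fit with the smallest robust scale estimate. The number of restarts below is arbitrary, and the smallest-scale criterion is just one heuristic for this mixture setting.

```r
## Random restarts: fit lmrob from many random 3-point initial subsets and
## keep the fit with the smallest robust scale; non-converging fits are
## skipped via try().
best <- NULL
for (i in 1:50) {
  sam <- sample(seq_len(nrow(x)), 3)
  ols <- lm(y[sam] ~ x[sam, ] - 1)
  ini <- list(coefficients = as.numeric(coef(ols)), scale = sd(resid(ols)))
  fit <- try(lmrob(y ~ x - 1, init = ini), silent = TRUE)
  if (!inherits(fit, "try-error") &&
      (is.null(best) || fit$scale < best$scale)) best <- fit
}
if (!is.null(best)) abline(best$coef, col = "blue")  # compare with the red line
```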
EDIT: I forgot to mention that there is a parameter that determines the width of the region over which the biweight gives non-zero weights. This is the `tuning.psi` parameter of the `lmrob.control` object.
There is a tradeoff between efficiency and the width of the biweight: the wider the biweight, the more points you effectively use, and the more efficiency you get. A limiting case is a biweight that never re-descends (maximal width), which would give you the OLS solution. Adjusting the width is done by:
```r
ctrl$tuning.psi <- 1.35
mod1 <- lmrob(y ~ x - 1, init = ini1, control = ctrl)
```
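For reference, the default value of `tuning.psi` in `lmrob.control` is about 4.685 for the biweight, which corresponds to roughly 95% asymptotic efficiency at the Gaussian model; a value like 1.35 trades most of that efficiency for a much narrower acceptance region.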
If you set `ctrl$tuning.psi` too small, you will run into convergence problems. These can be solved by increasing the `max.it` value of the control object:

```r
ctrl$max.it <- 500
```
There is a whole theory on optimal values of the tuning constant, but it only applies if the sub-population you are targeting contains more than half of the observations. I gather this is not the case you are concerned with; if so, I think it is best to play with `tuning.psi` to get a handle on it.
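As a rough way to do that, here is a small sketch that loops over a few candidate values of `tuning.psi` (the grid itself is arbitrary) and prints the resulting coefficients:

```r
## Try a few widths for the biweight and see how the fitted coefficients
## react; 4.685 is the package default, smaller values re-descend sooner.
for (cc in c(1.5, 2.5, 4.685)) {
  ctrl <- lmrob.control(tuning.psi = cc, max.it = 500)
  fit  <- try(lmrob(y ~ x - 1, init = ini1, control = ctrl), silent = TRUE)
  if (!inherits(fit, "try-error")) print(round(c(psi = cc, coef(fit)), 3))
}
```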
I doubt that there is a clear and consistent distinction across statistically minded sciences and fields between regression and curve-fitting.
Regression without qualification implies linear regression and least-squares estimation. That doesn't rule out other or broader senses: indeed once you allow logit, Poisson, negative binomial regression, etc., etc. it gets harder to see what modelling is not regression in some sense.
Curve-fitting does literally suggest a curve that can be drawn on a plane, or at least in a low-dimensional space. Regression is not so bounded and can predict surfaces in spaces of several dimensions.
Curve-fitting may or may not use linear regression and/or least squares. It might refer to fitting a polynomial (power series) or a set of sine and cosine terms, or in some other way actually qualify as linear regression in the key sense of fitting a functional form that is linear in the parameters. Indeed, curve-fitting using nonlinear regression is regression too.
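To illustrate that key sense (a toy example; the data and variable names are of my choosing): both fits below would colloquially be called curve fitting, yet both are ordinary linear regressions, because each model is linear in its coefficients.

```r
## Two "curves" fit by lm(): the predictors are nonlinear functions of xx,
## but the models are linear in the parameters.
set.seed(1)
xx <- seq(0, 10, length.out = 200)
yy <- 2 + 0.5 * xx - 0.1 * xx^2 + sin(xx) + rnorm(200, sd = 0.3)
fit_poly <- lm(yy ~ xx + I(xx^2))             # polynomial curve
fit_trig <- lm(yy ~ sin(xx) + cos(xx) + xx)   # sine and cosine terms
```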
The term curve-fitting could be used in a disparaging, derogatory, deprecatory or dismissive sense ("that's just curve fitting!") or (almost the complete opposite) it might refer to fitting a specific curve carefully chosen with specific physical (biological, economic, whatever) rationale or tailored to match particular kinds of initial or limiting behaviour (e.g. being always positive, bounded in one or both directions, monotone, with an inflexion, with a single turning point, oscillatory, etc.).
One of several fuzzy issues here is that the same functional form can be at best empirical in some circumstances and excellent theory in others. Newton taught that trajectories of projectiles can be parabolic, and so naturally fitted by quadratics, whereas a quadratic fitted to age dependency in the social sciences is often just a fudge that matches some curvature in the data. Exponential decay is a really good approximation for radioactive isotopes and a sometimes not too crazy guess for the way that land values decline with distance from a centre.
Your example gets no explicit guesses from me. Much of the point here is that with a very small set of data and precisely no information on what the variables are or how they are expected to behave it could be irresponsible or foolish to suggest a model form. Perhaps the data should rise sharply from (0, 0) and then approach (1, 1), or perhaps something else. You tell us!
Note. Neither regression nor curve-fitting is limited to single predictors or single parameters (coefficients).
For those who may be interested, I have found a paper that proposes a solution to my problem: "A min-max algorithm for non-linear regression models", by A. Tishler and I. Zang.
I have tested it myself, and I get the results I need.