Solved – Least squares with non-linear constraints

least squares, nonlinear, optimization

I have the following problem:

I want to solve a least-squares problem with nonlinear constraints. The starting model has the following form:

$$w^*=\text{arg min}_w \sum_{i=1}^{N} \Bigl(y_i-\sum_{j=1}^3 w_jx_{ij}\Bigr)^2$$
where $w \in \mathbb{R}^3.$

I minimized the error in the above model using the least-squares method.
My model is now more complicated; I will describe the new complication. Let $T>0$ be a threshold given as input.
Every residual that does not lie in the interval $[-T,T]$ contributes an error of $T^2$ instead of its squared value.

The model becomes:

$$w^*=\text{arg min}_w\sum_{i=1}^N S_i(y_i-\sum_{j=1}^3 w_jx_{ij})^2+\sum_{i=1}^N T^2 (1-S_i)$$

where for every $i \in \{1,\dots,N\}$ we have the constraint

$$S_i\Bigl(y_i-\sum_{j=1}^3 w_jx_{ij}\Bigr)^2\le T^2,$$

with $S_i \in \{0,1\}$.

The $S_i$ are not known in advance and must be determined as part of the optimization.
How can I solve this optimization problem?

Best Answer

As I've said in the comment, this is not a constrained optimisation problem, i.e. there are no constraints.

Define

$$\psi(x)=\begin{cases}T^2, & |x|>T\\ x^2, & |x|\le T\end{cases}$$

Then the optimisation problem can be rewritten as

$$w^*=\text{arg min}_w\sum_{i=1}^N\psi\left(y_i-\sum_{j=1}^3 w_jx_{ij}\right)$$
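One way to see why the constraints disappear: fix $w$ and write $r_i = y_i-\sum_{j=1}^3 w_jx_{ij}$. If $r_i^2 > T^2$, the constraint forces $S_i=0$ and the $i$-th term equals $T^2$; if $r_i^2 \le T^2$, the best feasible choice is $S_i=1$, giving $r_i^2$. In both cases

$$\min_{S_i\in\{0,1\},\; S_ir_i^2\le T^2}\Bigl[S_ir_i^2+T^2(1-S_i)\Bigr]=\min(r_i^2,T^2)=\psi(r_i),$$

so the binary variables $S_i$ can be eliminated term by term.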

Minimisation problems of this type were first considered by P. J. Huber in 1964, so this particular problem is a robust statistics problem. A more complicated example is least trimmed squares, where a fixed proportion of the largest (in absolute value) residuals is discarded; it is implemented in the R package robustbase as the function ltsReg, and the same package provides MM-type robust regression via lmrob.
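As a minimal illustration (assuming the robustbase package is installed; the data below is just made up for the example), a robust fit of a contaminated data set could look like this:

 library(robustbase)
 ## simulate a small data set and contaminate the predictor
 set.seed(1)
 x <- rnorm(50)
 y <- 1 + 0.5*x + rnorm(50)/2
 x[1:5] <- 10
 ltsReg(y ~ x)   # least trimmed squares
 lmrob(y ~ x)    # MM-type robust regression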

For your particular problem you can use optim. Here is an example (in R). First set up the functions and some dummy data:

 psi <- function(x, T) {
   ## residuals outside [-T, T] are set to T, so after squaring
   ## they contribute exactly T^2
   x[abs(x) > T] <- T
   x^2
 }

 optfun <- function(w, T) {
   ## objective: sum of capped squared residuals
   sum(psi(y - X %*% w, T = T))
 }
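A quick sanity check of psi (my own toy call, easy to verify by hand): residuals inside $[-T,T]$ are simply squared, anything outside contributes $T^2$:

 psi(c(-3, 0.5, 3), T = 2)
 ## [1] 4.00 0.25 4.00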

 ## Create sample data set
 set.seed(13)
 X <- cbind(1, rnorm(100))
 y <- 1 + 0.5*X[,2] + rnorm(100)/2

 ## Contaminate the data
 smpl <- sample(1:100, 10)
 X[smpl,2] <- 10
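For comparison (a line I am adding, not part of the original answer), an ordinary least-squares fit on this contaminated data gives a sense of how the leverage points affect the slope; compare its estimate with the robust fits below:

 lm(y ~ X[,2])   # plain OLS, influenced by the contaminated points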

Now run the optimisation:

> optim(c(0,0),optfun,T=2)
$par
[1] 0.95419635 0.03152925

$value
[1] 48.39998

$counts
function gradient 
      77       NA 

$convergence
[1] 0

$message
NULL

See that the slope is far from its true value of 0.5. Reduce the value of $T$:

> optim(c(0,0),optfun,T=1)
$par
[1] 1.0072015 0.5557612

$value
[1] 31.67965

$counts
function gradient 
      63       NA 

$convergence
[1] 0

$message
NULL

Now we get estimates close to the true values. In this simple example I ignored all the possible optimisation issues, e.g. the choice of starting values and of the optimisation method. It is not hard to extend this code to produce standard errors of the coefficient estimates; see the package numDeriv.
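As a rough sketch of that extension (my addition; note that $\psi$ is not smooth at $\pm T$, so the numerical Hessian and the usual asymptotic approximations should be treated with care), one can ask optim for a numerical Hessian at the optimum, or compute one with numDeriv:

 fit <- optim(c(0, 0), optfun, T = 1, hessian = TRUE)
 fit$hessian                                      # numerical Hessian from optim
 library(numDeriv)
 hessian(function(w) optfun(w, T = 1), fit$par)   # same thing via numDeriv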
