Both packages, alabama and Rsolnp, contain "[i]mplementations of the augmented Lagrange multiplier method for general nonlinear optimization" --- as the optimization task view says --- and are quite reliable and robust. They can handle equality and inequality constraints defined as (nonlinear) functions as well.
I have worked with both packages. Sometimes, constraints are a bit easier to formulate with Rsolnp, whereas alabama appears to be a bit faster at times.
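For illustration, here is a minimal sketch of the two calling conventions on a toy problem (the objective and constraints are invented purely for demonstration):

library(alabama)
library(Rsolnp)
## Toy problem: minimise x^2 + y^2 subject to x + y = 1 and x >= 0.2
fn <- function(p) sum(p^2)
## alabama::auglag encodes equalities as heq(p) == 0 and
## inequalities as hin(p) >= 0
auglag(par = c(1, 1), fn = fn,
       heq = function(p) p[1] + p[2] - 1,
       hin = function(p) p[1] - 0.2)
## Rsolnp::solnp takes equalities via eqfun/eqB and inequalities as
## bounds on ineqfun (a finite upper bound is used here to be safe)
solnp(pars = c(1, 1), fun = fn,
      eqfun = function(p) p[1] + p[2], eqB = 1,
      ineqfun = function(p) p[1], ineqLB = 0.2, ineqUB = 10)

Both calls should return the solution x = y = 0.5, at which the inequality constraint is inactive.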
There is also the package Rdonlp2, which relies on an external software library that is well known in the optimization community. Unfortunately, its license status is a bit uncertain at the moment.
As I've said in the comment, this is not a constrained optimisation problem, i.e. there are no constraints.
Define
$$\psi(x)=\begin{cases}T^2, & |x|>T\\ x^2, & |x|\le T\end{cases}$$
Then the optimisation problem can be rewritten as
$$w^*=\text{argmin}_w\sum_{i=1}^n\psi\left(y_i-\sum_{j=1}^p x_{ij}w_j\right),$$
where $p$ is the number of predictors.
Such minimisation problems were first considered by P. J. Huber in 1964, so this particular problem is a robust statistics problem. A more complicated example is least trimmed squares, where a portion of the largest (in absolute value) errors is discarded. The latter method is implemented in the R package robustbase in the function ltsReg.
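For example, a least trimmed squares fit would look like this (a sketch assuming a hypothetical data frame dat with response y and predictor x):

library(robustbase)
fit <- ltsReg(y ~ x, data = dat)  ## least trimmed squares regression
coef(fit)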
For your particular problem you can use optim. Here is an example (in R). First set up the functions and some dummy data:
psi <- function(x, T) {
    ## Cap residuals at the threshold T, then square:
    ## elementwise this yields min(x^2, T^2)
    x[abs(x) > T] <- T
    x^2
}
optfun <- function(w, T) {
    ## Objective: truncated squared loss of the regression residuals
    ## (y and X are taken from the calling environment)
    sum(psi(y - X %*% w, T = T))
}
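A quick sanity check of psi (residuals beyond the threshold are capped before squaring, so the loss is min(x^2, T^2)):

> psi(c(-3, 0.5, 3), T = 2)
[1] 4.00 0.25 4.00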
##Create sample data set
set.seed(13)
X <- cbind(1, rnorm(100))
y <- 1 + 0.5*X[, 2] + rnorm(100)/2
##Contaminate the data: move 10 points to high leverage
smpl <- sample(1:100, 10)
X[smpl, 2] <- 10
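For comparison (not part of the original example), an ordinary least squares fit on the contaminated data shows how strongly the ten leverage points drag the slope away from its true value of 0.5:

coef(lm(y ~ X[, 2]))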
Now run the optimisation:
> optim(c(0,0),optfun,T=2)
$par
[1] 0.95419635 0.03152925
$value
[1] 48.39998
$counts
function gradient
77 NA
$convergence
[1] 0
$message
NULL
Note that the slope estimate is far from its true value of 0.5. Reduce the value of $T$:
> optim(c(0,0),optfun,T=1)
$par
[1] 1.0072015 0.5557612
$value
[1] 31.67965
$counts
function gradient
63 NA
$convergence
[1] 0
$message
NULL
Now we recover estimates close to the true values. In this simple example I ignored all the possible optimisation issues, such as the choice of starting values and the choice of optimisation method. It is not hard to extend this code to produce standard errors of the coefficient estimates; see the package numDeriv.
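A rough sketch of that extension, under the simplifying assumption that the inverse Hessian of the objective at the optimum approximates the covariance of the estimates (a proper M-estimator sandwich variance would be more careful):

library(numDeriv)
fit <- optim(c(0, 0), optfun, T = 1)
H <- hessian(optfun, fit$par, T = 1)  ## numerical Hessian at the optimum
se <- sqrt(diag(solve(H)))            ## crude standard errors
se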
Best Answer
So I found a solution that works. This might be relevant for others who find the question in the future.
Here cat1 and cat2 are the sums which are required from the matrices.
The code that works: